Model parameters: d_model 224 ffw_size 896 kv_size 32 n_heads 7 n_layers 4
Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 4 --hidden-size 224 --num-attention-heads 7 --kv-channels 32 --ffn-hidden-size 896 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 44_416_143 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --loss-scale 12 --clip-grad 1.0 --kill-switch-path kill-switch-14m91b100m --bf16 --checkpoint-activations --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 44_416_143 --lr-warmup-samples 444_161 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_14m91b100m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_14m91b100m --load checkpoints_14m91b100m --train-weighted-split-paths-path train100m.txt --valid-weighted-split-paths-path val.txt --data-impl mmap --deepspeed --deepspeed_config ds_configs/3327073.json --zero-stage 0
START 3327073: Fri 17 Mar 2023 12:13:42 AM EET
0:
0:
0: ======================= ROCm System Management Interface =======================
0: ================================= Concise Info =================================
0: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
0: 0    44.0c  88.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 1    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 2    39.0c  100.0W  800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 3    43.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 4    43.0c  87.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 5    49.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: 6    40.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
0: 7    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
0: ================================================================================
0: ============================= End of ROCm SMI Log ==============================
6:
6:
6: ======================= ROCm System Management Interface =======================
6: ================================= Concise Info =================================
6: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
6: 0    41.0c  96.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
6: 1    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6: 2    40.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
6: 3    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6: 4    45.0c  82.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
6: 5    50.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6: 6    43.0c  93.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
6: 7    41.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6: ================================================================================
6: ============================= End of ROCm SMI Log ==============================
4:
4:
4: ======================= ROCm System Management Interface =======================
4: ================================= Concise Info =================================
4: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
4: 0    45.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
4: 1    43.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4: 2    43.0c  91.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
4: 3    43.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4: 4    44.0c  86.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
4: 5    47.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4: 6    47.0c  96.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
4: 7    42.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4: ================================================================================
4: ============================= End of ROCm SMI Log ==============================
3:
3:
3: ======================= ROCm System Management Interface =======================
3: ================================= Concise Info =================================
3: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
3: 0    43.0c  99.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3: 1    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
3: 2    41.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3: 3    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
3: 4    41.0c  91.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3: 5    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
3: 6    42.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3: 7    42.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
3: ================================================================================
3: ============================= End of ROCm SMI Log ==============================
2:
2:
2: ======================= ROCm System Management Interface =======================
2: ================================= Concise Info =================================
2: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
2: 0    45.0c  93.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
2: 1    48.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2: 2    44.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
2: 3    45.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2: 4    41.0c  83.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
2: 5    49.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2: 6    39.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
2: 7    48.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2: ================================================================================
2: ============================= End of ROCm SMI Log ==============================
5:
5:
5: ======================= ROCm System Management Interface =======================
5: ================================= Concise Info =================================
5: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
5: 0    48.0c  87.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 1    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 2    41.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 3    47.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 4    42.0c  88.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 5    48.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: 6    40.0c  94.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5: 7    46.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
5: ================================================================================
5: ============================= End of ROCm SMI Log ==============================
1:
1:
1: ======================= ROCm System Management Interface =======================
1: ================================= Concise Info =================================
1: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
1: 0    47.0c  89.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1: 1    50.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
1: 2    45.0c  91.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1: 3    42.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
1: 4    39.0c  95.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1: 5    41.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
1: 6    41.0c  88.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1: 7    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
1: ================================================================================
1: ============================= End of ROCm SMI Log ==============================
7:
7:
7: ======================= ROCm System Management Interface =======================
7: ================================= Concise Info =================================
7: GPU  Temp   AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
7: 0    44.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7: 1    43.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
7: 2    42.0c  90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7: 3    41.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
7: 4    43.0c  92.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7: 5    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
7: 6    40.0c  95.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7: 7    44.0c  N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
7: ================================================================================
7: ============================= End of ROCm SMI Log ==============================
7: Launching on nid005291 (7/8), master nid005284 port 9999, GPUs 8, CUDA: True
0: Launching on nid005284 (0/8), master nid005284 port 9999, GPUs 8, CUDA: True
5: Launching on nid005289 (5/8), master nid005284 port 9999, GPUs 8, CUDA: True
3: Launching on nid005287 (3/8), master nid005284 port 9999, GPUs 8, CUDA: True
4: Launching on nid005288 (4/8), master nid005284 port 9999, GPUs 8, CUDA: True
2: Launching on nid005286 (2/8), master nid005284 port 9999, GPUs 8, CUDA: True
1: Launching on nid005285 (1/8), master nid005284 port 9999, GPUs 8, CUDA: True
6: Launching on nid005290 (6/8), master nid005284 port 9999, GPUs 8, CUDA: True
0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
0: accumulate and all-reduce gradients in fp32 for bfloat16 data type.
0: using torch.bfloat16 for parameters ...
0: ------------------------ arguments ------------------------
0: abort_on_unmet_fused_kernel_constraints ......... False
0: accumulate_allreduce_grads_in_fp32 .............. True
0: adam_beta1 ...................................... 0.9
0: adam_beta2 ...................................... 0.999
0: adam_eps ........................................ 1e-08
0: adlr_autoresume ................................. False
0: adlr_autoresume_interval ........................ 1000
0: apply_query_key_layer_scaling ................... True
0: apply_residual_connection_post_layernorm ........ False
0: attention_dropout ............................... 0.1
0: attention_softmax_in_fp32 ....................... False
0: bert_binary_head ................................ True
0: bert_load ....................................... None
0: bf16 ............................................ True
0: bias_dropout_fusion ............................. True
0: bias_gelu_fusion ................................ True
0: biencoder_projection_dim ........................ 0
0: biencoder_shared_query_context_model ............ False
0: block_data_path ................................. None
0: checkpoint_activations .......................... True
0: checkpoint_in_cpu ............................... False
0: checkpoint_num_layers ........................... 1
0: clip_grad ....................................... 1.0
0: codecarbon_dir .................................. None
0: consumed_train_samples .......................... 0
0: consumed_train_tokens ........................... 0
0: consumed_valid_samples .......................... 0
0: contigious_checkpointing ........................ False
0: cpu_optimizer ................................... False
0: cpu_torch_adam .................................. False
0: curriculum_learning ............................. False
0: data_impl ....................................... mmap
0: data_parallel_size .............................. 64
0: data_path ....................................... None
0: dataloader_type ................................. single
0: DDP_impl ........................................ local
0: decoder_seq_length .............................. None
0: deepscale ....................................... False
0: deepscale_config ................................ None
0: deepspeed ....................................... True
0: deepspeed_activation_checkpointing .............. False
0: deepspeed_config ................................ ds_configs/3327073.json
0: deepspeed_mpi ................................... False
0: distribute_checkpointed_activations ............. False
0: distributed_backend ............................. nccl
0: embed_layernorm ................................. False
0: embedding_path .................................. None
0: encoder_seq_length .............................. 2048
0: eod_mask_loss ................................... False
0: eval_interval ................................... 1000
0: eval_iters ...................................... 1
0: eval_only ....................................... None
0: evidence_data_path .............................. None
0: exit_duration_in_mins ........................... None
0: exit_interval ................................... None
0: ffn_hidden_size ................................. 896
0: finetune ........................................ False
0: fp16 ............................................ False
0: fp16_lm_cross_entropy ........................... False
0: fp32_residual_connection ........................ False
0: gigaflos_no_embeds .............................. 0
0: global_batch_size ............................... 256
0: glu_activation .................................. None
0: hidden_dropout .................................. 0.1
0: hidden_size ..................................... 224
0: hysteresis ...................................... 2
0: ict_head_size ................................... None
0: ict_load ........................................ None
0: img_dim ......................................... 224
0: indexer_batch_size .............................. 128
0: indexer_log_interval ............................ 1000
0: inference ....................................... False
0: init_method_std ................................. 0.02
0: init_method_xavier_uniform ...................... False
0: initial_loss_scale .............................. 4294967296
0: kill_switch_path ................................ kill-switch-14m91b100m
0: kv_channels ..................................... 32
0: layer_norm_fusion ............................... True
0: layernorm_epsilon ............................... 1e-05
0: lazy_mpu_init ................................... None
0: load ............................................ checkpoints_14m91b100m
0: local_rank ...................................... None
0: log_batch_size_to_tensorboard ................... True
0: log_interval .................................... 10
0: log_learning_rate_to_tensorboard ................ True
0: log_level ....................................... None
0: log_level_replica ............................... None
0: log_loss_scale_to_tensorboard ................... True
0: log_num_zeros_in_grad ........................... False
0: log_params_norm ................................. False
0: log_path ........................................ None
0: log_timers_to_tensorboard ....................... True
0: log_validation_ppl_to_tensorboard ............... True
0: loss_on_targets_only ............................ False
0: loss_scale ...................................... 12.0
0: loss_scale_window ............................... 1000
0: lr .............................................. 0.0002
0: lr_decay_iters .................................. None
0: lr_decay_samples ................................ 44416143
0: lr_decay_style .................................. cosine
0: lr_decay_tokens ................................. None
0: lr_warmup_fraction .............................. None
0: lr_warmup_iters ................................. 0
0: lr_warmup_samples ............................... 444161
0: make_vocab_size_divisible_by .................... 128
0: mask_prob ....................................... 0.15
0: masked_softmax_fusion ........................... True
0: max_position_embeddings ......................... 2048
0: mean_noise_span_length .......................... None
0: memory_centric_tiled_linear ..................... False
0: merge_file ...................................... gpt2/merges.txt
0: micro_batch_size ................................ 4
0: min_loss_scale .................................. 1.0
0: min_lr .......................................... 2e-05
0: mmap_warmup ..................................... False
0: no_load_optim ................................... None
0: no_load_rng ..................................... None
0: no_save_optim ................................... None
0: no_save_rng ..................................... None
0: noise_density ................................... None
0: num_attention_heads ............................. 7
0: num_channels .................................... 3
0: num_classes ..................................... 1000
0: num_layers ...................................... 4
0: num_layers_per_virtual_pipeline_stage ........... None
0: num_workers ..................................... 2
0: onnx_safe ....................................... None
0: openai_gelu ..................................... False
0: optimizer ....................................... adam
0: optimizer_fusion ................................ True
0: override_lr_scheduler ........................... False
0: pad_vocab_size_to ............................... None
0: params_dtype .................................... torch.bfloat16
0: partition_activations ........................... False
0: patch_dim ....................................... 16
0: pipeline_model_parallel_size .................... 1
0: position_embedding_type ......................... PositionEmbeddingType.absolute
0: pp_partition_method ............................. None
0: profile_backward ................................ False
0: query_in_block_prob ............................. 0.1
0: rampup_batch_size ............................... None
0: rank ............................................ 0
0: remote_device ................................... none
0: reset_attention_mask ............................ False
0: reset_position_ids .............................. False
0: reset_progress .................................. None
0: retriever_report_topk_accuracies ................ []
0: retriever_score_scaling ......................... False
0: retriever_seq_length ............................ 256
0: reweight_loss_based_on_position_frequency ....... False
0: sample_rate ..................................... 1.0
0: save ............................................ checkpoints_14m91b100m
0: save_interval ................................... 1000
0: scatter_gather_tensors_in_pipeline .............. True
0: scattered_embeddings ............................ False
0: seed ............................................ 1234
0: seq_length ...................................... 2048
0: sgd_momentum .................................... 0.9
0: short_seq_prob .................................. 0.1
0: skip_train_iteration_range ...................... None
0: split ........................................... None
0: split_transformers .............................. False
0: sync_tp_duplicated_parameters ................... False
0: synchronize_each_layer .......................... False
0: tensor_model_parallel_size ...................... 1
0: tensorboard_dir ................................. tensorboard_14m91b100m
0: tensorboard_log_interval ........................ 1
0: tensorboard_queue_size .......................... 5
0: test_weighted_split_paths ....................... None
0: test_weighted_split_paths_path .................. None
0: tile_factor ..................................... 1
0: titles_data_path ................................ None
0: tokenizer_name_or_path .......................... None
0: tokenizer_type .................................. GPT2BPETokenizer
0: train_iters ..................................... None
0: train_samples ................................... 44416143
0: train_tokens .................................... None
0: train_weighted_split_names ...................... ['train']
0: train_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document']]
0: train_weighted_split_paths_path ................. None
0: train_weighted_split_splits ..................... [['0:1']]
0: train_weighted_split_weights .................... [['1.0']]
0: universal_checkpoint ............................ False
0: use_bnb_optimizer ............................... False
0: use_checkpoint_lr_scheduler ..................... False
0: use_contiguous_buffers_in_ddp ................... True
0: use_cpu_initialization .......................... None
0: use_one_sent_docs ............................... False
0: use_pin_memory .................................. False
0: valid_num_workers ............................... 2
0: valid_weighted_split_names ...................... ['validation']
0: valid_weighted_split_paths ...................... [['/scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document']]
0: valid_weighted_split_paths_path ................. None
0: valid_weighted_split_splits ..................... [['0:1']]
0: valid_weighted_split_weights .................... [['1.0']]
0: virtual_pipeline_model_parallel_size ............ None
0: vocab_extra_ids ................................. 0
0: vocab_file ...................................... gpt2/vocab.json
0: weight_decay .................................... 0.1
0: world_size ...................................... 64
0: zero_allgather_bucket_size ...................... 0.0
0: zero_contigious_gradients ....................... False
0: zero_reduce_bucket_size ......................... 0.0
0: zero_reduce_scatter ............................. False
0: zero_stage ...................................... 0
0: -------------------- end of arguments ---------------------
0: setting number of micro-batches to constant 1
0: > building GPT2BPETokenizer tokenizer ...
0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
0: DeepSpeed general environment info:
0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch']
0: torch version .................... 1.13.0+rocm5.2
0: torch cuda version ............... None
0: torch hip version ................ 5.2.21151-afdc89f8
0: nvcc version ..................... None
0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed']
0: deepspeed info ................... 0.7.5, unknown, unknown
0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1
0: **** Git info for Megatron: git_hash=unknown git_branch=unknown ****
0: > initializing torch distributed ...
0: [2023-03-17 00:14:27,918] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
7: > setting tensorboard ...
0: > initializing tensor model parallel with size 1
0: > initializing pipeline model parallel with size 1
0: > setting random seeds to 1234 ...
0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
0: > compiling dataset index builder ...
0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data'
0: make: Nothing to be done for 'default'.
0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data'
0: >>> done with dataset index builder. Compilation time: 0.093 seconds
0: > compiling and loading fused kernels ...
0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 87 0: [1/1] c++ scaled_upper_triang_masked_softmax_hip.o scaled_upper_triang_masked_softmax_hip.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_upper_triang_masked_softmax_cuda.so 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 63 0: [1/1] c++ scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o scaled_masked_softmax_cuda.so 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 67 0: [1/1] c++ layer_norm_cuda.o layer_norm_hip_kernel.cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so 0: >>> 
done with compiling and loading fused kernels. Compilation time: 26.368 seconds 0: time to initialize megatron (seconds): -30.743 0: [after megatron is initialized] datetime: 2023-03-17 00:14:57 0: building GPT model ... 0: [2023-03-17 00:14:57,392] [INFO] [utils.py:827:see_memory_usage] Before Building Model 0: [2023-03-17 00:14:57,393] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB 0: [2023-03-17 00:14:57,393] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.65 GB, percent = 6.1% 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pi 0: pe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, 
model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=46, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} 0: [2023-03-17 00:14:59,399] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer 0: stage=0 layers=11 0: 0: _to_float16 0: 1: EmbeddingPipe 0: 2: 0: 3: ParallelTransformerLayerPipe 0: 4: ParallelTransformerLayerPipe 0: 5: ParallelTransformerLayerPipe 0: 6: ParallelTransformerLayerPipe 0: 7: undo 0: 8: MixedFusedLayerNorm 0: 9: EmbeddingPipe 0: 10: float16_to_fp32 0: loss: CrossEntropy 0: [2023-03-17 00:14:59,578] [INFO] [utils.py:827:see_memory_usage] After Building Model 0: [2023-03-17 00:14:59,579] [INFO]
[utils.py:828:see_memory_usage] MA 0.03 GB Max_MA 0.03 GB CA 0.05 GB Max_CA 0 GB 0: [2023-03-17 00:14:59,579] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.66 GB, percent = 6.1% 0: setting training iterations to 173500 0: > learning rate decay style: cosine 0: DeepSpeed is enabled. 0: [2023-03-17 00:14:59,580] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown 7: ninja: no work to do. 0: [2023-03-17 00:15:12,487] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False 0: [2023-03-17 00:15:12,487] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer 7: Time to load utils op: 0.15580368041992188 seconds 0: [2023-03-17 00:15:12,487] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer 0: [2023-03-17 00:15:12,488] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam 0: [2023-03-17 00:15:12,488] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer 0: [2023-03-17 00:15:12,602] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer 0: [2023-03-17 00:15:12,603] [INFO] [utils.py:828:see_memory_usage] MA 0.03 GB Max_MA 0.03 GB CA 0.05 GB Max_CA 0 GB 0: [2023-03-17 00:15:12,603] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.33 GB, percent = 6.2% 0: ninja: no work to do. 
0: Time to load utils op: 0.1997058391571045 seconds 0: Time to load utils op: 0.20461750030517578 seconds 0: Time to load utils op: 0.2046053409576416 seconds 0: Time to load utils op: 0.20452642440795898 seconds 0: Time to load utils op: 0.20468997955322266 seconds 0: Time to load utils op: 0.20487403869628906 seconds 0: Time to load utils op: 0.20497655868530273 seconds 1: Time to load utils op: 0.21026873588562012 seconds 1: Time to load utils op: 0.21007108688354492 seconds 1: Time to load utils op: 0.209946870803833 seconds 1: Time to load utils op: 0.2098710536956787 seconds 1: Time to load utils op: 0.20993661880493164 seconds 1: Time to load utils op: 0.21027565002441406 seconds 1: Time to load utils op: 0.2099752426147461 seconds 1: Time to load utils op: 0.2104940414428711 seconds 7: Time to load utils op: 0.2028036117553711 seconds 7: Time to load utils op: 0.20409750938415527 seconds 7: Time to load utils op: 0.20432424545288086 seconds 7: Time to load utils op: 0.20414996147155762 seconds 7: Time to load utils op: 0.20277118682861328 seconds 7: Time to load utils op: 0.20377016067504883 seconds 7: Time to load utils op: 0.2030324935913086 seconds 0: Time to load utils op: 0.10203027725219727 seconds 6: Time to load utils op: 0.21092581748962402 seconds 6: Time to load utils op: 0.2111673355102539 seconds 6: Time to load utils op: 0.2108922004699707 seconds 6: Time to load utils op: 0.21139049530029297 seconds 6: Time to load utils op: 0.21113014221191406 seconds 6: Time to load utils op: 0.21109676361083984 seconds 6: Time to load utils op: 0.21097469329833984 seconds 2: Time to load utils op: 0.21218514442443848 seconds 2: Time to load utils op: 0.2121903896331787 seconds 2: Time to load utils op: 0.2121884822845459 seconds 2: Time to load utils op: 0.21219706535339355 seconds 2: Time to load utils op: 0.21221351623535156 seconds 2: Time to load utils op: 0.2122204303741455 seconds 2: Time to load utils op: 0.2122175693511963 seconds 2: Time to load utils op: 0.21224164962768555 seconds
4: Time to load utils op: 0.21051406860351562 seconds 4: Time to load utils op: 0.21049284934997559 seconds 4: Time to load utils op: 0.21054482460021973 seconds 4: Time to load utils op: 0.21051573753356934 seconds 4: Time to load utils op: 0.21056532859802246 seconds 4: Time to load utils op: 0.2105703353881836 seconds 4: Time to load utils op: 0.2105727195739746 seconds 4: Time to load utils op: 0.21056914329528809 seconds 3: Time to load utils op: 0.21273040771484375 seconds 3: Time to load utils op: 0.21273422241210938 seconds 3: Time to load utils op: 0.2127518653869629 seconds 3: Time to load utils op: 0.21274352073669434 seconds 3: Time to load utils op: 0.212754487991333 seconds 3: Time to load utils op: 0.21273398399353027 seconds 3: Time to load utils op: 0.21271395683288574 seconds 3: Time to load utils op: 0.21273493766784668 seconds 5: Time to load utils op: 0.21194124221801758 seconds 5: Time to load utils op: 0.21196866035461426 seconds 5: Time to load utils op: 0.21192073822021484 seconds 5: Time to load utils op: 0.2119762897491455 seconds 5: Time to load utils op: 0.21195721626281738 seconds 5: Time to load utils op: 0.21196460723876953 seconds 5: Time to load utils op: 0.21199321746826172 seconds 5: Time to load utils op: 0.21201229095458984 seconds 7: Time to load utils op: 0.0006439685821533203 seconds 0: Time to load utils op: 0.0009887218475341797 seconds 0: Time to load utils op: 0.0012314319610595703 seconds 0: Time to load utils op: 0.0012180805206298828 seconds 0: Time to load utils op: 0.001207590103149414 seconds 0: Time to load utils op: 0.0013117790222167969 seconds 0: Time to load utils op: 0.0013298988342285156 seconds 0: Time to load utils op: 0.0013546943664550781 seconds 6: Time to load utils op: 0.4037022590637207 seconds 7: Time to load utils op: 0.00031375885009765625 seconds 7: Time to load utils op: 0.00033092498779296875 seconds 7: Time to load utils op: 0.0004131793975830078 seconds
7: Time to load utils op: 0.0003597736358642578 seconds 7: Time to load utils op: 0.0003650188446044922 seconds 7: Time to load utils op: 0.0003361701965332031 seconds 7: Time to load utils op: 0.000335693359375 seconds 1: Time to load utils op: 0.0010993480682373047 seconds 1: Time to load utils op: 0.001295328140258789 seconds 1: Time to load utils op: 0.00141143798828125 seconds 1: Time to load utils op: 0.001399993896484375 seconds 1: Time to load utils op: 0.0013892650604248047 seconds 1: Time to load utils op: 0.0014548301696777344 seconds 1: Time to load utils op: 0.0014433860778808594 seconds 1: Time to load utils op: 0.0014710426330566406 seconds 6: Time to load utils op: 0.0004851818084716797 seconds 3: Time to load utils op: 0.0007803440093994141 seconds 6: Time to load utils op: 0.00043082237243652344 seconds 6: Time to load utils op: 0.00041985511779785156 seconds 6: Time to load utils op: 0.00043129920959472656 seconds 6: Time to load utils op: 0.0004489421844482422 seconds 4: Time to load utils op: 0.0008258819580078125 seconds 4: Time to load utils op: 0.0008180141448974609 seconds 4: Time to load utils op: 0.0008747577667236328 seconds 6: Time to load utils op: 0.0005128383636474609 seconds 3: Time to load utils op: 0.0009925365447998047 seconds 6: Time to load utils op: 0.0005543231964111328 seconds 6: Time to load utils op: 0.0005381107330322266 seconds 4: Time to load utils op: 0.0010120868682861328 seconds 4: Time to load utils op: 0.0009694099426269531 seconds 4: Time to load utils op: 0.001004934310913086 seconds 4: Time to load utils op: 0.0010063648223876953 seconds 4: Time to load utils op: 0.0010619163513183594 seconds 2: Time to load utils op: 0.0009734630584716797 seconds 3: Time to load utils op: 0.001394033432006836 seconds 3: Time to load utils op: 0.0013887882232666016 seconds 3: Time to load utils op: 0.0013828277587890625 seconds 3: Time to load utils op: 0.0013904571533203125 seconds 3: Time to load utils op: 0.0014066696166992188 seconds
3: Time to load utils op: 0.0014338493347167969 seconds 2: Time to load utils op: 0.0012183189392089844 seconds 2: Time to load utils op: 0.0012412071228027344 seconds 2: Time to load utils op: 0.0012874603271484375 seconds 2: Time to load utils op: 0.00136566162109375 seconds 2: Time to load utils op: 0.001337289810180664 seconds 2: Time to load utils op: 0.0013659000396728516 seconds 2: Time to load utils op: 0.0014295578002929688 seconds 5: Time to load utils op: 0.0005731582641601562 seconds 5: Time to load utils op: 0.0010390281677246094 seconds 5: Time to load utils op: 0.0011785030364990234 seconds 5: Time to load utils op: 0.001283884048461914 seconds 5: Time to load utils op: 0.0012004375457763672 seconds 5: Time to load utils op: 0.0012087821960449219 seconds 5: Time to load utils op: 0.001207590103149414 seconds 5: Time to load utils op: 0.001264810562133789 seconds 0: [2023-03-17 00:15:12,839] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 0: [2023-03-17 00:15:12,840] [INFO] [utils.py:828:see_memory_usage] MA 0.03 GB Max_MA 0.03 GB CA 0.05 GB Max_CA 0 GB 0: [2023-03-17 00:15:12,840] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:12,965] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 0: [2023-03-17 00:15:12,966] [INFO] [utils.py:828:see_memory_usage] MA 0.07 GB Max_MA 0.07 GB CA 0.12 GB Max_CA 0 GB 0: [2023-03-17 00:15:12,966] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:13,067] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 0: [2023-03-17 00:15:13,068] [INFO] [utils.py:828:see_memory_usage] MA 0.07 GB Max_MA 0.07 GB CA 0.12 GB Max_CA 0 GB 0: [2023-03-17 00:15:13,068] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:13,169] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 0: [2023-03-17
00:15:13,170] [INFO] [utils.py:828:see_memory_usage] MA 0.08 GB Max_MA 0.08 GB CA 0.12 GB Max_CA 0 GB 0: [2023-03-17 00:15:13,170] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:13,268] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 0: [2023-03-17 00:15:13,269] [INFO] [utils.py:828:see_memory_usage] MA 0.08 GB Max_MA 0.08 GB CA 0.12 GB Max_CA 0 GB 0: [2023-03-17 00:15:13,269] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:13,369] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 0: [2023-03-17 00:15:13,370] [INFO] [utils.py:828:see_memory_usage] MA 0.08 GB Max_MA 0.08 GB CA 0.12 GB Max_CA 0 GB 0: [2023-03-17 00:15:13,370] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:13,469] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer 0: [2023-03-17 00:15:13,469] [INFO] [utils.py:828:see_memory_usage] MA 0.08 GB Max_MA 0.08 GB CA 0.12 GB Max_CA 0 GB 0: [2023-03-17 00:15:13,469] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:13,574] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer 0: [2023-03-17 00:15:13,574] [INFO] [utils.py:828:see_memory_usage] MA 0.08 GB Max_MA 0.08 GB CA 0.12 GB Max_CA 0 GB 0: [2023-03-17 00:15:13,574] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:13,673] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer 0: [2023-03-17 00:15:13,674] [INFO] [utils.py:828:see_memory_usage] MA 0.08 GB Max_MA 0.08 GB CA 0.12 GB Max_CA 0 GB 0: [2023-03-17 00:15:13,674] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 31.47 GB, percent = 6.3% 0: [2023-03-17 00:15:13,674] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam 0: [2023-03-17 
00:15:13,674] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler 0: [2023-03-17 00:15:13,674] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = 0: [2023-03-17 00:15:13,674] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1007:print] DeepSpeedEngine configuration: 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] activation_checkpointing_config { 0: "partition_activations": false, 0: "contiguous_memory_optimization": false, 0: "cpu_checkpointing": false, 0: "number_checkpoints": null, 0: "synchronize_checkpoint_boundary": false, 0: "profile": false 0: } 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] amp_enabled .................. False 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] amp_params ................... False 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] autotuning_config ............ 
{ 0: "enabled": false, 0: "start_step": null, 0: "end_step": null, 0: "metric_path": null, 0: "arg_mappings": null, 0: "metric": "throughput", 0: "model_info": null, 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", 0: "overwrite": true, 0: "fast": true, 0: "start_profile_step": 3, 0: "end_profile_step": 5, 0: "tuner_type": "gridsearch", 0: "tuner_early_stopping": 5, 0: "tuner_num_trials": 50, 0: "model_info_path": null, 0: "mp_size": 1, 0: "max_train_batch_size": null, 0: "min_train_batch_size": 1, 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, 0: "min_train_micro_batch_size_per_gpu": 1, 0: "num_tuning_micro_batch_sizes": 3 0: } 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] bfloat16_enabled ............. True 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False 0: [2023-03-17 00:15:13,675] [INFO] [config.py:1011:print] comms_config ................. 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] communication_data_type ...... None 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] curriculum_enabled ........... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] curriculum_params ............ False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] dataloader_drop_last ......... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] disable_allgather ............ False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] dump_state ................... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] eigenvalue_layer_name ........
bert.encoder.layer 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] elasticity_enabled ........... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] flops_profiler_config ........ { 0: "enabled": false, 0: "profile_step": 1, 0: "module_depth": -1, 0: "top_modules": 1, 0: "detailed": true, 0: "output_file": null 0: } 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] fp16_auto_cast ............... None 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] fp16_enabled ................. False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] global_rank .................. 0 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] gradient_clipping ............ 1.0 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] load_universal_checkpoint .... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] loss_scale ................... 1.0 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] memory_breakdown ............. False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] monitor_config ............... 
0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] nebula_config ................ { 0: "enabled": false, 0: "persistent_storage_path": null, 0: "persistent_time_interval": 100, 0: "num_of_version_in_retention": 2, 0: "enable_nebula_load": true, 0: "load_path": null 0: } 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] optimizer_name ............... None 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] optimizer_params ............. None 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] pld_enabled .................. False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] pld_params ................... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] prescale_gradients ........... False 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] scheduler_name ............... None 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] scheduler_params ............. None 0: [2023-03-17 00:15:13,676] [INFO] [config.py:1011:print] sparse_attention ............. None 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... False 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] steps_per_print .............. 2000 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] train_batch_size ............. 256 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] use_node_local_storage ....... False 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] world_size ................... 
64 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] zero_enabled ................. False 0: [2023-03-17 00:15:13,677] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 0: [2023-03-17 00:15:13,677] [INFO] [config.py:996:print_user_config] json = { 0: "train_micro_batch_size_per_gpu": 4, 0: "train_batch_size": 256, 0: "gradient_clipping": 1.0, 0: "zero_optimization": { 0: "stage": 0 0: }, 0: "bf16": { 0: "enabled": true 0: }, 0: "steps_per_print": 2.000000e+03, 0: "wall_clock_breakdown": false 0: } 0: Time to load utils op: 0.00040721893310546875 seconds 0: [2023-03-17 00:15:13,677] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 0: [2023-03-17 00:15:13,730] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=11 [0, 11) STAGE_PARAMS=14147392 (14.147M) TOTAL_PARAMS=14147392 (14.147M) UNIQUE_PARAMS=14147392 (14.147M) 0: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: WARNING: could not find the metadata file checkpoints_14m91b100m 0: will not load any checkpoints and will start from random 0: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
6: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 6: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
1: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 1: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 2: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 2: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 3: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 5: [2023-03-17 00:15:13,737] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_14m91b100m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
7: time (ms) | load-checkpoint: 7.99 0: estimated model parameters: 0.014147392 0: estimated model parameters without embeddings: 0.002420544 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2023-03-17 00:15:14 0: > building train, validation, and test datasets ... 0: > datasets target sizes (minimum size): 0: train: 44416143 0: validation: 44544 0: test: 256 0: > building train, validation, and test datasets for GPT ... 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 
0: > finished creating indexed dataset in 0.009510 seconds 0: number of documents: 208931 0: > dataset split: 0: train: 0: document indices in [0, 208931) total of 208931 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_44416143ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_44416143ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_subsampled/gpt2tok_c4_en_100M_text_document_train_indexmap_44416143ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.047 seconds 0: total number of samples: 44461248 0: total number of epochs: 911 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 0: > finished creating indexed dataset in 0.042076 seconds 0: number of documents: 364608 0: > dataset split: 0: validation: 0: document indices in [0, 364608) total of 364608 documents 0: > loading doc-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_44544ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_44544ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/c4_validation/gpt2tok_c4validation_rerun_text_document_validation_indexmap_44544ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.010 seconds 0: total number of samples: 84978 0: total number of epochs: 1 0: > finished creating GPT datasets ... 0: [after dataloaders are built] datetime: 2023-03-17 00:15:27 0: done with setup ... 0: training ... 
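The iteration records that follow can be sanity-checked against the launch flags at the top of this log. The sketch below is not part of the run; it assumes the learning rate rises linearly with consumed samples during warmup (an assumption about the schedule, although it is consistent with the logged values), and the names `RECORD`, `warmup_lr` and `check_record` are hypothetical helpers introduced here for illustration.

```python
import re

# Values copied from the launch command at the top of this log.
PEAK_LR = 2e-4            # --lr
WARMUP_SAMPLES = 444_161  # --lr-warmup-samples
SEQ_LENGTH = 2048         # --seq-length
GLOBAL_BATCH = 256        # --global-batch-size
TRAIN_SAMPLES = 44_416_143  # --train-samples

# Matches the leading fields of one "iteration N/ M | ..." record.
RECORD = re.compile(
    r"iteration\s+(\d+)/\s*(\d+)\s*\|\s*consumed samples:\s*(\d+)\s*\|"
    r"\s*consumed tokens:\s*(\d+)\s*\|.*?learning rate:\s*([0-9.Ee+-]+)"
)


def warmup_lr(consumed_samples):
    """Assumed linear warmup in samples; only meaningful during warmup."""
    if consumed_samples > WARMUP_SAMPLES:
        raise ValueError("past warmup; the cosine decay is not modelled here")
    return PEAK_LR * consumed_samples / WARMUP_SAMPLES


def check_record(line):
    """Return consistency flags for one iteration record, or None if no match."""
    m = RECORD.search(line)
    if m is None:
        return None
    iteration, total, samples, tokens = (int(g) for g in m.groups()[:4])
    return {
        "iteration": iteration,
        # Total iterations should be train samples divided by the global batch.
        "total_ok": total == TRAIN_SAMPLES // GLOBAL_BATCH,
        "samples_ok": samples == iteration * GLOBAL_BATCH,
        "tokens_ok": tokens == samples * SEQ_LENGTH,
        # Compare at the log's own precision (three decimals, E notation).
        "lr_ok": f"{warmup_lr(samples):.3E}" == m.group(5),
    }


if __name__ == "__main__":
    sample = (
        "7: iteration 10/ 173500 | consumed samples: 2560 | "
        "consumed tokens: 5242880 | elapsed time per iteration (s): 1.44 | "
        "learning rate: 1.153E-06 | global batch size: 256 |"
    )
    print(check_record(sample))
```

Feeding each record of the log through `check_record` flags any line where the sample count, token count, or learning rate departs from what the flags imply; it does not say anything about the loss or throughput columns.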
0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: 7: time (ms) | model-and-optimizer-setup: 16801.67 | train/valid/test-data-iterators-setup: 13383.86 0: [000-000] 0.0141B / 0.0024B 0: [before the start of training step] datetime: 2023-03-17 00:15:27 0: [2023-03-17 00:15:28,179] [INFO] [checkpointing.py:553:forward] Activation Checkpointing Information 0: [2023-03-17 00:15:28,179] [INFO] [checkpointing.py:554:forward] ----Partition Activations False, CPU CHECKPOINTING False 0: [2023-03-17 00:15:28,179] [INFO] [checkpointing.py:557:forward] ----contiguous Memory Checkpointing False with None total layers 0: [2023-03-17 00:15:28,179] [INFO] [checkpointing.py:560:forward] ----Synchronization False 0: [2023-03-17 00:15:28,179] [INFO] [checkpointing.py:561:forward] ----Profiling time in checkpointing False 0: [Rank 0] (after 10 iterations) memory (MB) | allocated: 1686.69384765625 | max allocated: 4072.091796875 | reserved: 4858.0 | max reserved: 4858.0 7: iteration 10/ 173500 | consumed samples: 2560 | consumed tokens: 5242880 | elapsed time per iteration (s): 1.44 | learning rate: 1.153E-06 | global batch size: 256 | lm loss: 1.086005E+01 | grad norm: 2.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 177.684 | TFLOPs: 0.66 | 7: iteration 20/ 173500 | consumed samples: 5120 | consumed tokens: 10485760 | elapsed time per iteration (s): 0.10 | learning rate: 2.305E-06 | global batch size: 256 | lm loss: 1.084982E+01 | grad norm: 2.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2624.276 | TFLOPs: 9.76 | 7: iteration 30/ 173500 | consumed samples: 7680 | consumed tokens: 15728640 | elapsed time per iteration (s): 0.10 | learning rate: 3.458E-06 | global batch size: 256 | lm loss: 1.081222E+01 | grad norm: 2.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2590.033 | 
TFLOPs: 9.63 | 7: iteration 40/ 173500 | consumed samples: 10240 | consumed tokens: 20971520 | elapsed time per iteration (s): 0.10 | learning rate: 4.611E-06 | global batch size: 256 | lm loss: 1.075066E+01 | grad norm: 2.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2599.748 | TFLOPs: 9.67 | 7: iteration 50/ 173500 | consumed samples: 12800 | consumed tokens: 26214400 | elapsed time per iteration (s): 0.10 | learning rate: 5.764E-06 | global batch size: 256 | lm loss: 1.066718E+01 | grad norm: 2.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2605.505 | TFLOPs: 9.69 | 7: iteration 60/ 173500 | consumed samples: 15360 | consumed tokens: 31457280 | elapsed time per iteration (s): 0.09 | learning rate: 6.916E-06 | global batch size: 256 | lm loss: 1.057804E+01 | grad norm: 1.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2857.216 | TFLOPs: 10.63 | 7: iteration 70/ 173500 | consumed samples: 17920 | consumed tokens: 36700160 | elapsed time per iteration (s): 0.08 | learning rate: 8.069E-06 | global batch size: 256 | lm loss: 1.050277E+01 | grad norm: 1.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.420 | TFLOPs: 11.47 | 7: iteration 80/ 173500 | consumed samples: 20480 | consumed tokens: 41943040 | elapsed time per iteration (s): 0.08 | learning rate: 9.222E-06 | global batch size: 256 | lm loss: 1.043906E+01 | grad norm: 1.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.237 | TFLOPs: 11.64 | 7: iteration 90/ 173500 | consumed samples: 23040 | consumed tokens: 47185920 | elapsed time per iteration (s): 0.09 | learning rate: 1.037E-05 | global batch size: 256 | lm loss: 1.038391E+01 | grad norm: 1.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 2932.174 | TFLOPs: 10.91 | 7: iteration 100/ 173500 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (s): 0.09 | learning rate: 1.153E-05 | global batch size: 256 | lm loss: 1.033188E+01 | grad norm: 1.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2912.897 | TFLOPs: 10.83 | 7: iteration 110/ 173500 | consumed samples: 28160 | consumed tokens: 57671680 | elapsed time per iteration (s): 0.08 | learning rate: 1.268E-05 | global batch size: 256 | lm loss: 1.027893E+01 | grad norm: 1.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.056 | TFLOPs: 11.62 | 7: iteration 120/ 173500 | consumed samples: 30720 | consumed tokens: 62914560 | elapsed time per iteration (s): 0.09 | learning rate: 1.383E-05 | global batch size: 256 | lm loss: 1.022292E+01 | grad norm: 1.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2930.402 | TFLOPs: 10.90 | 7: iteration 130/ 173500 | consumed samples: 33280 | consumed tokens: 68157440 | elapsed time per iteration (s): 0.09 | learning rate: 1.499E-05 | global batch size: 256 | lm loss: 1.016328E+01 | grad norm: 1.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2864.698 | TFLOPs: 10.66 | 7: iteration 140/ 173500 | consumed samples: 35840 | consumed tokens: 73400320 | elapsed time per iteration (s): 0.09 | learning rate: 1.614E-05 | global batch size: 256 | lm loss: 1.010082E+01 | grad norm: 1.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2878.691 | TFLOPs: 10.71 | 7: iteration 150/ 173500 | consumed samples: 38400 | consumed tokens: 78643200 | elapsed time per iteration (s): 0.08 | learning rate: 1.729E-05 | global batch size: 256 | lm loss: 1.003800E+01 | grad norm: 1.196 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.612 | TFLOPs: 11.45 | 7: iteration 160/ 173500 | consumed samples: 40960 | consumed tokens: 83886080 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-05 | global batch size: 256 | lm loss: 9.970332E+00 | grad norm: 1.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.516 | TFLOPs: 11.63 | 7: iteration 170/ 173500 | consumed samples: 43520 | consumed tokens: 89128960 | elapsed time per iteration (s): 0.09 | learning rate: 1.960E-05 | global batch size: 256 | lm loss: 9.900977E+00 | grad norm: 1.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2957.076 | TFLOPs: 11.00 | 7: iteration 180/ 173500 | consumed samples: 46080 | consumed tokens: 94371840 | elapsed time per iteration (s): 0.09 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 9.832587E+00 | grad norm: 1.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2819.754 | TFLOPs: 10.49 | 7: iteration 190/ 173500 | consumed samples: 48640 | consumed tokens: 99614720 | elapsed time per iteration (s): 0.09 | learning rate: 2.190E-05 | global batch size: 256 | lm loss: 9.762077E+00 | grad norm: 1.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2811.228 | TFLOPs: 10.46 | 7: iteration 200/ 173500 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (s): 0.09 | learning rate: 2.305E-05 | global batch size: 256 | lm loss: 9.688070E+00 | grad norm: 1.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2856.165 | TFLOPs: 10.62 | 7: iteration 210/ 173500 | consumed samples: 53760 | consumed tokens: 110100480 | elapsed time per iteration (s): 0.09 | learning rate: 2.421E-05 | global 
batch size: 256 | lm loss: 9.610928E+00 | grad norm: 1.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.526 | TFLOPs: 10.86 | 7: iteration 220/ 173500 | consumed samples: 56320 | consumed tokens: 115343360 | elapsed time per iteration (s): 0.08 | learning rate: 2.536E-05 | global batch size: 256 | lm loss: 9.528062E+00 | grad norm: 1.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.082 | TFLOPs: 11.44 | 7: iteration 230/ 173500 | consumed samples: 58880 | consumed tokens: 120586240 | elapsed time per iteration (s): 0.09 | learning rate: 2.651E-05 | global batch size: 256 | lm loss: 9.451054E+00 | grad norm: 1.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2767.725 | TFLOPs: 10.29 | 7: iteration 240/ 173500 | consumed samples: 61440 | consumed tokens: 125829120 | elapsed time per iteration (s): 0.09 | learning rate: 2.767E-05 | global batch size: 256 | lm loss: 9.374092E+00 | grad norm: 1.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2893.962 | TFLOPs: 10.76 | 7: iteration 250/ 173500 | consumed samples: 64000 | consumed tokens: 131072000 | elapsed time per iteration (s): 0.08 | learning rate: 2.882E-05 | global batch size: 256 | lm loss: 9.291859E+00 | grad norm: 1.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.647 | TFLOPs: 11.23 | 7: iteration 260/ 173500 | consumed samples: 66560 | consumed tokens: 136314880 | elapsed time per iteration (s): 0.09 | learning rate: 2.997E-05 | global batch size: 256 | lm loss: 9.211075E+00 | grad norm: 1.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2891.986 | TFLOPs: 10.76 | 7: iteration 270/ 173500 | consumed samples: 69120 | consumed tokens: 141557760 | elapsed 
time per iteration (s): 0.09 | learning rate: 3.112E-05 | global batch size: 256 | lm loss: 9.140829E+00 | grad norm: 1.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2736.765 | TFLOPs: 10.18 | 7: iteration 280/ 173500 | consumed samples: 71680 | consumed tokens: 146800640 | elapsed time per iteration (s): 0.09 | learning rate: 3.228E-05 | global batch size: 256 | lm loss: 9.059852E+00 | grad norm: 1.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2807.094 | TFLOPs: 10.44 | 7: iteration 290/ 173500 | consumed samples: 74240 | consumed tokens: 152043520 | elapsed time per iteration (s): 0.10 | learning rate: 3.343E-05 | global batch size: 256 | lm loss: 8.981351E+00 | grad norm: 1.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2558.941 | TFLOPs: 9.52 | 7: iteration 300/ 173500 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (s): 0.09 | learning rate: 3.458E-05 | global batch size: 256 | lm loss: 8.903331E+00 | grad norm: 1.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.075 | TFLOPs: 10.07 | 7: iteration 310/ 173500 | consumed samples: 79360 | consumed tokens: 162529280 | elapsed time per iteration (s): 0.09 | learning rate: 3.573E-05 | global batch size: 256 | lm loss: 8.826328E+00 | grad norm: 1.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2863.247 | TFLOPs: 10.65 | 7: iteration 320/ 173500 | consumed samples: 81920 | consumed tokens: 167772160 | elapsed time per iteration (s): 0.08 | learning rate: 3.689E-05 | global batch size: 256 | lm loss: 8.752460E+00 | grad norm: 1.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.542 | TFLOPs: 11.97 | 7: iteration 330/ 173500 | 
consumed samples: 84480 | consumed tokens: 173015040 | elapsed time per iteration (s): 0.08 | learning rate: 3.804E-05 | global batch size: 256 | lm loss: 8.669928E+00 | grad norm: 1.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.447 | TFLOPs: 11.58 | 7: iteration 340/ 173500 | consumed samples: 87040 | consumed tokens: 178257920 | elapsed time per iteration (s): 0.09 | learning rate: 3.919E-05 | global batch size: 256 | lm loss: 8.601341E+00 | grad norm: 1.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.750 | TFLOPs: 10.39 | 7: iteration 350/ 173500 | consumed samples: 89600 | consumed tokens: 183500800 | elapsed time per iteration (s): 0.08 | learning rate: 4.035E-05 | global batch size: 256 | lm loss: 8.521651E+00 | grad norm: 1.185 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3020.557 | TFLOPs: 11.24 | 7: iteration 360/ 173500 | consumed samples: 92160 | consumed tokens: 188743680 | elapsed time per iteration (s): 0.08 | learning rate: 4.150E-05 | global batch size: 256 | lm loss: 8.441368E+00 | grad norm: 1.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.545 | TFLOPs: 11.66 | 7: iteration 370/ 173500 | consumed samples: 94720 | consumed tokens: 193986560 | elapsed time per iteration (s): 0.09 | learning rate: 4.265E-05 | global batch size: 256 | lm loss: 8.372766E+00 | grad norm: 1.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2728.179 | TFLOPs: 10.15 | 7: iteration 380/ 173500 | consumed samples: 97280 | consumed tokens: 199229440 | elapsed time per iteration (s): 0.09 | learning rate: 4.380E-05 | global batch size: 256 | lm loss: 8.309316E+00 | grad norm: 1.166 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 2750.431 | TFLOPs: 10.23 | 7: iteration 390/ 173500 | consumed samples: 99840 | consumed tokens: 204472320 | elapsed time per iteration (s): 0.08 | learning rate: 4.496E-05 | global batch size: 256 | lm loss: 8.228992E+00 | grad norm: 1.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.725 | TFLOPs: 11.36 | 7: iteration 400/ 173500 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (s): 0.10 | learning rate: 4.611E-05 | global batch size: 256 | lm loss: 8.162898E+00 | grad norm: 1.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2611.188 | TFLOPs: 9.71 | 7: iteration 410/ 173500 | consumed samples: 104960 | consumed tokens: 214958080 | elapsed time per iteration (s): 0.09 | learning rate: 4.726E-05 | global batch size: 256 | lm loss: 8.097923E+00 | grad norm: 1.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2925.346 | TFLOPs: 10.88 | 7: iteration 420/ 173500 | consumed samples: 107520 | consumed tokens: 220200960 | elapsed time per iteration (s): 0.09 | learning rate: 4.841E-05 | global batch size: 256 | lm loss: 8.021178E+00 | grad norm: 1.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.550 | TFLOPs: 10.78 | 7: iteration 430/ 173500 | consumed samples: 110080 | consumed tokens: 225443840 | elapsed time per iteration (s): 0.10 | learning rate: 4.957E-05 | global batch size: 256 | lm loss: 7.965755E+00 | grad norm: 1.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2623.511 | TFLOPs: 9.76 | 7: iteration 440/ 173500 | consumed samples: 112640 | consumed tokens: 230686720 | elapsed time per iteration (s): 0.09 | learning rate: 5.072E-05 | global batch size: 256 | lm loss: 7.892110E+00 | grad norm: 1.034 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.275 | TFLOPs: 10.33 | 7: iteration 450/ 173500 | consumed samples: 115200 | consumed tokens: 235929600 | elapsed time per iteration (s): 0.09 | learning rate: 5.187E-05 | global batch size: 256 | lm loss: 7.829012E+00 | grad norm: 1.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2951.502 | TFLOPs: 10.98 | 7: iteration 460/ 173500 | consumed samples: 117760 | consumed tokens: 241172480 | elapsed time per iteration (s): 0.08 | learning rate: 5.303E-05 | global batch size: 256 | lm loss: 7.765779E+00 | grad norm: 0.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3012.395 | TFLOPs: 11.20 | 7: iteration 470/ 173500 | consumed samples: 120320 | consumed tokens: 246415360 | elapsed time per iteration (s): 0.09 | learning rate: 5.418E-05 | global batch size: 256 | lm loss: 7.711938E+00 | grad norm: 0.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2784.914 | TFLOPs: 10.36 | 7: iteration 480/ 173500 | consumed samples: 122880 | consumed tokens: 251658240 | elapsed time per iteration (s): 0.09 | learning rate: 5.533E-05 | global batch size: 256 | lm loss: 7.664756E+00 | grad norm: 0.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.731 | TFLOPs: 11.11 | 7: iteration 490/ 173500 | consumed samples: 125440 | consumed tokens: 256901120 | elapsed time per iteration (s): 0.09 | learning rate: 5.648E-05 | global batch size: 256 | lm loss: 7.618103E+00 | grad norm: 0.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2752.409 | TFLOPs: 10.24 | 7: iteration 500/ 173500 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (s): 0.10 | learning rate: 5.764E-05 | global batch 
size: 256 | lm loss: 7.556934E+00 | grad norm: 0.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.839 | TFLOPs: 9.92 | 7: iteration 510/ 173500 | consumed samples: 130560 | consumed tokens: 267386880 | elapsed time per iteration (s): 0.09 | learning rate: 5.879E-05 | global batch size: 256 | lm loss: 7.499502E+00 | grad norm: 0.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2918.458 | TFLOPs: 10.86 | 7: iteration 520/ 173500 | consumed samples: 133120 | consumed tokens: 272629760 | elapsed time per iteration (s): 0.09 | learning rate: 5.994E-05 | global batch size: 256 | lm loss: 7.466741E+00 | grad norm: 0.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.205 | TFLOPs: 11.08 | 7: iteration 530/ 173500 | consumed samples: 135680 | consumed tokens: 277872640 | elapsed time per iteration (s): 0.09 | learning rate: 6.109E-05 | global batch size: 256 | lm loss: 7.432997E+00 | grad norm: 0.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.783 | TFLOPs: 10.61 | 7: iteration 540/ 173500 | consumed samples: 138240 | consumed tokens: 283115520 | elapsed time per iteration (s): 0.08 | learning rate: 6.225E-05 | global batch size: 256 | lm loss: 7.387447E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.154 | TFLOPs: 11.25 | 7: iteration 550/ 173500 | consumed samples: 140800 | consumed tokens: 288358400 | elapsed time per iteration (s): 0.09 | learning rate: 6.340E-05 | global batch size: 256 | lm loss: 7.354859E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.456 | TFLOPs: 10.86 | 7: iteration 560/ 173500 | consumed samples: 143360 | consumed tokens: 293601280 | elapsed 
time per iteration (s): 0.08 | learning rate: 6.455E-05 | global batch size: 256 | lm loss: 7.310633E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.922 | TFLOPs: 11.42 | 7: iteration 570/ 173500 | consumed samples: 145920 | consumed tokens: 298844160 | elapsed time per iteration (s): 0.09 | learning rate: 6.571E-05 | global batch size: 256 | lm loss: 7.271725E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2884.327 | TFLOPs: 10.73 | 7: iteration 580/ 173500 | consumed samples: 148480 | consumed tokens: 304087040 | elapsed time per iteration (s): 0.10 | learning rate: 6.686E-05 | global batch size: 256 | lm loss: 7.223974E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2641.985 | TFLOPs: 9.83 | 7: iteration 590/ 173500 | consumed samples: 151040 | consumed tokens: 309329920 | elapsed time per iteration (s): 0.11 | learning rate: 6.801E-05 | global batch size: 256 | lm loss: 7.209837E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2416.119 | TFLOPs: 8.99 | 7: iteration 600/ 173500 | consumed samples: 153600 | consumed tokens: 314572800 | elapsed time per iteration (s): 0.09 | learning rate: 6.916E-05 | global batch size: 256 | lm loss: 7.177348E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2739.120 | TFLOPs: 10.19 | 7: iteration 610/ 173500 | consumed samples: 156160 | consumed tokens: 319815680 | elapsed time per iteration (s): 0.08 | learning rate: 7.032E-05 | global batch size: 256 | lm loss: 7.138157E+00 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.398 | TFLOPs: 11.39 | 7: iteration 620/ 
173500 | consumed samples: 158720 | consumed tokens: 325058560 | elapsed time per iteration (s): 0.09 | learning rate: 7.147E-05 | global batch size: 256 | lm loss: 7.101998E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2909.231 | TFLOPs: 10.82 | 7: iteration 630/ 173500 | consumed samples: 161280 | consumed tokens: 330301440 | elapsed time per iteration (s): 0.10 | learning rate: 7.262E-05 | global batch size: 256 | lm loss: 7.085678E+00 | grad norm: 0.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2483.211 | TFLOPs: 9.24 | 7: iteration 640/ 173500 | consumed samples: 163840 | consumed tokens: 335544320 | elapsed time per iteration (s): 0.08 | learning rate: 7.378E-05 | global batch size: 256 | lm loss: 7.056647E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.924 | TFLOPs: 11.92 | 7: iteration 650/ 173500 | consumed samples: 166400 | consumed tokens: 340787200 | elapsed time per iteration (s): 0.09 | learning rate: 7.493E-05 | global batch size: 256 | lm loss: 7.031139E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2899.839 | TFLOPs: 10.79 | 7: iteration 660/ 173500 | consumed samples: 168960 | consumed tokens: 346030080 | elapsed time per iteration (s): 0.09 | learning rate: 7.608E-05 | global batch size: 256 | lm loss: 7.000718E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.488 | TFLOPs: 10.17 | 7: iteration 670/ 173500 | consumed samples: 171520 | consumed tokens: 351272960 | elapsed time per iteration (s): 0.09 | learning rate: 7.723E-05 | global batch size: 256 | lm loss: 6.971330E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 2888.129 | TFLOPs: 10.74 | 7: iteration 680/ 173500 | consumed samples: 174080 | consumed tokens: 356515840 | elapsed time per iteration (s): 0.08 | learning rate: 7.839E-05 | global batch size: 256 | lm loss: 6.960162E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.334 | TFLOPs: 11.27 | 7: iteration 690/ 173500 | consumed samples: 176640 | consumed tokens: 361758720 | elapsed time per iteration (s): 0.09 | learning rate: 7.954E-05 | global batch size: 256 | lm loss: 6.927275E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.359 | TFLOPs: 10.13 | 7: iteration 700/ 173500 | consumed samples: 179200 | consumed tokens: 367001600 | elapsed time per iteration (s): 0.09 | learning rate: 8.069E-05 | global batch size: 256 | lm loss: 6.909566E+00 | grad norm: 0.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.711 | TFLOPs: 10.74 | 7: iteration 710/ 173500 | consumed samples: 181760 | consumed tokens: 372244480 | elapsed time per iteration (s): 0.09 | learning rate: 8.184E-05 | global batch size: 256 | lm loss: 6.880545E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2753.018 | TFLOPs: 10.24 | 7: iteration 720/ 173500 | consumed samples: 184320 | consumed tokens: 377487360 | elapsed time per iteration (s): 0.09 | learning rate: 8.300E-05 | global batch size: 256 | lm loss: 6.858407E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2910.375 | TFLOPs: 10.83 | 7: iteration 730/ 173500 | consumed samples: 186880 | consumed tokens: 382730240 | elapsed time per iteration (s): 0.08 | learning rate: 8.415E-05 | global batch size: 256 | lm loss: 6.828732E+00 | grad norm: 0.361 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.818 | TFLOPs: 11.36 | 7: iteration 740/ 173500 | consumed samples: 189440 | consumed tokens: 387973120 | elapsed time per iteration (s): 0.11 | learning rate: 8.530E-05 | global batch size: 256 | lm loss: 6.834333E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.579 | TFLOPs: 8.72 | 7: iteration 750/ 173500 | consumed samples: 192000 | consumed tokens: 393216000 | elapsed time per iteration (s): 0.08 | learning rate: 8.646E-05 | global batch size: 256 | lm loss: 6.811624E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.285 | TFLOPs: 11.87 | 7: iteration 760/ 173500 | consumed samples: 194560 | consumed tokens: 398458880 | elapsed time per iteration (s): 0.08 | learning rate: 8.761E-05 | global batch size: 256 | lm loss: 6.785896E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.706 | TFLOPs: 11.93 | 7: iteration 770/ 173500 | consumed samples: 197120 | consumed tokens: 403701760 | elapsed time per iteration (s): 0.08 | learning rate: 8.876E-05 | global batch size: 256 | lm loss: 6.754237E+00 | grad norm: 0.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.631 | TFLOPs: 11.79 | 7: iteration 780/ 173500 | consumed samples: 199680 | consumed tokens: 408944640 | elapsed time per iteration (s): 0.09 | learning rate: 8.991E-05 | global batch size: 256 | lm loss: 6.754701E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2909.901 | TFLOPs: 10.82 | 7: iteration 790/ 173500 | consumed samples: 202240 | consumed tokens: 414187520 | elapsed time per iteration (s): 0.08 | learning rate: 9.107E-05 | 
global batch size: 256 | lm loss: 6.734300E+00 | grad norm: 0.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.897 | TFLOPs: 11.94 | 7: iteration 800/ 173500 | consumed samples: 204800 | consumed tokens: 419430400 | elapsed time per iteration (s): 0.08 | learning rate: 9.222E-05 | global batch size: 256 | lm loss: 6.711430E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.590 | TFLOPs: 11.26 | 7: iteration 810/ 173500 | consumed samples: 207360 | consumed tokens: 424673280 | elapsed time per iteration (s): 0.09 | learning rate: 9.337E-05 | global batch size: 256 | lm loss: 6.706747E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2800.741 | TFLOPs: 10.42 | 7: iteration 820/ 173500 | consumed samples: 209920 | consumed tokens: 429916160 | elapsed time per iteration (s): 0.09 | learning rate: 9.452E-05 | global batch size: 256 | lm loss: 6.705366E+00 | grad norm: 0.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2893.127 | TFLOPs: 10.76 | 7: iteration 830/ 173500 | consumed samples: 212480 | consumed tokens: 435159040 | elapsed time per iteration (s): 0.08 | learning rate: 9.568E-05 | global batch size: 256 | lm loss: 6.670564E+00 | grad norm: 0.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.438 | TFLOPs: 11.47 | 7: iteration 840/ 173500 | consumed samples: 215040 | consumed tokens: 440401920 | elapsed time per iteration (s): 0.09 | learning rate: 9.683E-05 | global batch size: 256 | lm loss: 6.654486E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2747.597 | TFLOPs: 10.22 | 7: iteration 850/ 173500 | consumed samples: 217600 | consumed tokens: 
445644800 | elapsed time per iteration (s): 0.08 | learning rate: 9.798E-05 | global batch size: 256 | lm loss: 6.643107E+00 | grad norm: 0.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.728 | TFLOPs: 11.75 | 7: iteration 860/ 173500 | consumed samples: 220160 | consumed tokens: 450887680 | elapsed time per iteration (s): 0.09 | learning rate: 9.914E-05 | global batch size: 256 | lm loss: 6.628136E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2810.828 | TFLOPs: 10.46 | 7: iteration 870/ 173500 | consumed samples: 222720 | consumed tokens: 456130560 | elapsed time per iteration (s): 0.08 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 6.606814E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.785 | TFLOPs: 11.46 | 7: iteration 880/ 173500 | consumed samples: 225280 | consumed tokens: 461373440 | elapsed time per iteration (s): 0.09 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 6.606078E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2961.776 | TFLOPs: 11.02 | 7: iteration 890/ 173500 | consumed samples: 227840 | consumed tokens: 466616320 | elapsed time per iteration (s): 0.08 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 6.587260E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.328 | TFLOPs: 11.28 | 7: iteration 900/ 173500 | consumed samples: 230400 | consumed tokens: 471859200 | elapsed time per iteration (s): 0.10 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 6.577593E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.991 | TFLOPs: 9.70 | 
7: iteration 910/ 173500 | consumed samples: 232960 | consumed tokens: 477102080 | elapsed time per iteration (s): 0.08 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 6.564377E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.039 | TFLOPs: 11.27 | 7: iteration 920/ 173500 | consumed samples: 235520 | consumed tokens: 482344960 | elapsed time per iteration (s): 0.08 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 6.567670E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.661 | TFLOPs: 11.34 | 7: iteration 930/ 173500 | consumed samples: 238080 | consumed tokens: 487587840 | elapsed time per iteration (s): 0.09 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 6.541547E+00 | grad norm: 0.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2841.317 | TFLOPs: 10.57 | 7: iteration 940/ 173500 | consumed samples: 240640 | consumed tokens: 492830720 | elapsed time per iteration (s): 0.08 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 6.533603E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.723 | TFLOPs: 11.52 | 7: iteration 950/ 173500 | consumed samples: 243200 | consumed tokens: 498073600 | elapsed time per iteration (s): 0.09 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 6.514489E+00 | grad norm: 0.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.825 | TFLOPs: 11.17 | 7: iteration 960/ 173500 | consumed samples: 245760 | consumed tokens: 503316480 | elapsed time per iteration (s): 0.09 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 6.509242E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 2959.158 | TFLOPs: 11.01 | 7: iteration 970/ 173500 | consumed samples: 248320 | consumed tokens: 508559360 | elapsed time per iteration (s): 0.10 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 6.488614E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2551.277 | TFLOPs: 9.49 | 7: iteration 980/ 173500 | consumed samples: 250880 | consumed tokens: 513802240 | elapsed time per iteration (s): 0.08 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 6.478421E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.663 | TFLOPs: 11.50 | 7: iteration 990/ 173500 | consumed samples: 253440 | consumed tokens: 519045120 | elapsed time per iteration (s): 0.09 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 6.481158E+00 | grad norm: 0.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2961.833 | TFLOPs: 11.02 | 7: iteration 1000/ 173500 | consumed samples: 256000 | consumed tokens: 524288000 | elapsed time per iteration (s): 0.09 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 6.473004E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.972 | TFLOPs: 11.11 | 7: ----------------------------------------------------------------------------------------------- 7: validation loss at iteration 1000 | lm loss value: 6.429934E+00 | lm loss PPL: 6.201327E+02 | 7: ----------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 1000 to checkpoints_14m91b100m 0: [2023-03-17 00:17:10,328] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is begin to save! 
0: [2023-03-17 00:17:10,451] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:17:10,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:17:10,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:17:10,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:17:10,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:17:10,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:17:10,488] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:17:10,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:17:10,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:17:10,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:17:10,495] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:17:10,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 00:17:10,496] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step1000/mp_rank_00_model_states.pt 0: [2023-03-17 00:17:10,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:17:10,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:17:10,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:17:10,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:17:10,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:17:10,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:17:10,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2023-03-17 00:17:10,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:17:10,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:17:10,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2023-03-17 00:17:10,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:17:10,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:17:10,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2023-03-17 00:17:10,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:17:10,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:17:10,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2023-03-17 00:17:10,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:17:10,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2023-03-17 00:17:10,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:17:10,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:17:10,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2023-03-17 00:17:10,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:17:10,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:17:10,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2023-03-17 00:17:10,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:17:10,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:17:10,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2023-03-17 00:17:10,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 4: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
3: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 6: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
4: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2023-03-17 00:17:10,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
7: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2023-03-17 00:17:10,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:17:10,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:17:10,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 00:17:10,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:17:10,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:17:10,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2023-03-17 00:17:10,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:17:10,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:17:10,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
5: [2023-03-17 00:17:10,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:17:10,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:17:10,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:17:10,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 00:17:10,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 4: [2023-03-17 00:17:10,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2023-03-17 00:17:10,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:17:10,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 00:17:10,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 00:17:10,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 00:17:10,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
5: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2023-03-17 00:17:10,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:17:10,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: successfully saved checkpoint at iteration 1000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 206.61 7: iteration 1010/ 173500 | consumed samples: 258560 | consumed tokens: 529530880 | elapsed time per iteration (s): 0.12 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 6.450604E+00 | grad norm: 0.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2186.352 | TFLOPs: 8.13 | 7: iteration 1020/ 173500 | consumed samples: 261120 | consumed tokens: 534773760 | elapsed time per iteration (s): 0.09 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 6.451440E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2806.631 | TFLOPs: 10.44 | 7: iteration 1030/ 173500 | consumed samples: 263680 | consumed tokens: 540016640 | elapsed time per iteration (s): 0.09 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 6.435435E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.967 | 
TFLOPs: 10.56 | 7: iteration 1040/ 173500 | consumed samples: 266240 | consumed tokens: 545259520 | elapsed time per iteration (s): 0.09 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 6.426527E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2939.778 | TFLOPs: 10.93 | 7: iteration 1050/ 173500 | consumed samples: 268800 | consumed tokens: 550502400 | elapsed time per iteration (s): 0.09 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 6.413961E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2760.548 | TFLOPs: 10.27 | 7: iteration 1060/ 173500 | consumed samples: 271360 | consumed tokens: 555745280 | elapsed time per iteration (s): 0.08 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 6.405787E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.869 | TFLOPs: 11.56 | 7: iteration 1070/ 173500 | consumed samples: 273920 | consumed tokens: 560988160 | elapsed time per iteration (s): 0.10 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 6.397103E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.349 | TFLOPs: 9.84 | 7: iteration 1080/ 173500 | consumed samples: 276480 | consumed tokens: 566231040 | elapsed time per iteration (s): 0.09 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 6.391813E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2776.141 | TFLOPs: 10.33 | 7: iteration 1090/ 173500 | consumed samples: 279040 | consumed tokens: 571473920 | elapsed time per iteration (s): 0.09 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 6.373508E+00 | grad norm: 1.091 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2893.659 | TFLOPs: 10.76 | 7: iteration 1100/ 173500 | consumed samples: 281600 | consumed tokens: 576716800 | elapsed time per iteration (s): 0.10 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 6.375050E+00 | grad norm: 0.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.587 | TFLOPs: 10.00 | 7: iteration 1110/ 173500 | consumed samples: 284160 | consumed tokens: 581959680 | elapsed time per iteration (s): 0.10 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 6.366903E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2634.514 | TFLOPs: 9.80 | 7: iteration 1120/ 173500 | consumed samples: 286720 | consumed tokens: 587202560 | elapsed time per iteration (s): 0.11 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 6.342822E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2321.699 | TFLOPs: 8.64 | 7: iteration 1130/ 173500 | consumed samples: 289280 | consumed tokens: 592445440 | elapsed time per iteration (s): 0.11 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 6.345239E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2288.218 | TFLOPs: 8.51 | 7: iteration 1140/ 173500 | consumed samples: 291840 | consumed tokens: 597688320 | elapsed time per iteration (s): 0.10 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 6.325261E+00 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2542.226 | TFLOPs: 9.46 | 7: iteration 1150/ 173500 | consumed samples: 294400 | consumed tokens: 602931200 | elapsed time per iteration (s): 0.09 | learning rate: 1.326E-04 | global batch size: 256 | lm 
loss: 6.331089E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2968.781 | TFLOPs: 11.04 | 7: iteration 1160/ 173500 | consumed samples: 296960 | consumed tokens: 608174080 | elapsed time per iteration (s): 0.10 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 6.318555E+00 | grad norm: 0.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2627.249 | TFLOPs: 9.77 | 7: iteration 1170/ 173500 | consumed samples: 299520 | consumed tokens: 613416960 | elapsed time per iteration (s): 0.09 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 6.313369E+00 | grad norm: 0.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2849.884 | TFLOPs: 10.60 | 7: iteration 1180/ 173500 | consumed samples: 302080 | consumed tokens: 618659840 | elapsed time per iteration (s): 0.10 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 6.307758E+00 | grad norm: 0.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2505.212 | TFLOPs: 9.32 | 7: iteration 1190/ 173500 | consumed samples: 304640 | consumed tokens: 623902720 | elapsed time per iteration (s): 0.08 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 6.288097E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.612 | TFLOPs: 11.22 | 7: iteration 1200/ 173500 | consumed samples: 307200 | consumed tokens: 629145600 | elapsed time per iteration (s): 0.09 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 6.285722E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.758 | TFLOPs: 10.23 | 7: iteration 1210/ 173500 | consumed samples: 309760 | consumed tokens: 634388480 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 6.281465E+00 | grad norm: 0.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.489 | TFLOPs: 11.56 | 7: iteration 1220/ 173500 | consumed samples: 312320 | consumed tokens: 639631360 | elapsed time per iteration (s): 0.11 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 6.271126E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.725 | TFLOPs: 8.95 | 7: iteration 1230/ 173500 | consumed samples: 314880 | consumed tokens: 644874240 | elapsed time per iteration (s): 0.09 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 6.249391E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.363 | TFLOPs: 10.45 | 7: iteration 1240/ 173500 | consumed samples: 317440 | consumed tokens: 650117120 | elapsed time per iteration (s): 0.10 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 6.247643E+00 | grad norm: 1.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2629.076 | TFLOPs: 9.78 | 7: iteration 1250/ 173500 | consumed samples: 320000 | consumed tokens: 655360000 | elapsed time per iteration (s): 0.09 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 6.239790E+00 | grad norm: 0.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.993 | TFLOPs: 10.88 | 7: iteration 1260/ 173500 | consumed samples: 322560 | consumed tokens: 660602880 | elapsed time per iteration (s): 0.08 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 6.228725E+00 | grad norm: 0.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.170 | TFLOPs: 11.48 | 7: iteration 1270/ 173500 
| consumed samples: 325120 | consumed tokens: 665845760 | elapsed time per iteration (s): 0.09 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 6.223351E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.781 | TFLOPs: 10.97 | 7: iteration 1280/ 173500 | consumed samples: 327680 | consumed tokens: 671088640 | elapsed time per iteration (s): 0.09 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 6.206104E+00 | grad norm: 0.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2964.060 | TFLOPs: 11.03 | 7: iteration 1290/ 173500 | consumed samples: 330240 | consumed tokens: 676331520 | elapsed time per iteration (s): 0.10 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 6.201073E+00 | grad norm: 0.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2642.566 | TFLOPs: 9.83 | 7: iteration 1300/ 173500 | consumed samples: 332800 | consumed tokens: 681574400 | elapsed time per iteration (s): 0.13 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 6.192402E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.589 | TFLOPs: 7.58 | 7: iteration 1310/ 173500 | consumed samples: 335360 | consumed tokens: 686817280 | elapsed time per iteration (s): 0.08 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 6.193263E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.977 | TFLOPs: 11.84 | 7: iteration 1320/ 173500 | consumed samples: 337920 | consumed tokens: 692060160 | elapsed time per iteration (s): 0.08 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 6.171162E+00 | grad norm: 1.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3183.190 | TFLOPs: 11.84 | 7: iteration 1330/ 173500 | consumed samples: 340480 | consumed tokens: 697303040 | elapsed time per iteration (s): 0.08 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 6.179985E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.149 | TFLOPs: 11.39 | 7: iteration 1340/ 173500 | consumed samples: 343040 | consumed tokens: 702545920 | elapsed time per iteration (s): 0.09 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 6.149605E+00 | grad norm: 0.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3005.008 | TFLOPs: 11.18 | 7: iteration 1350/ 173500 | consumed samples: 345600 | consumed tokens: 707788800 | elapsed time per iteration (s): 0.11 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 6.152230E+00 | grad norm: 0.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2422.026 | TFLOPs: 9.01 | 7: iteration 1360/ 173500 | consumed samples: 348160 | consumed tokens: 713031680 | elapsed time per iteration (s): 0.10 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 6.135158E+00 | grad norm: 1.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2599.763 | TFLOPs: 9.67 | 7: iteration 1370/ 173500 | consumed samples: 350720 | consumed tokens: 718274560 | elapsed time per iteration (s): 0.09 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 6.124847E+00 | grad norm: 0.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2950.543 | TFLOPs: 10.97 | 7: iteration 1380/ 173500 | consumed samples: 353280 | consumed tokens: 723517440 | elapsed time per iteration (s): 0.11 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 6.126857E+00 | grad norm: 0.659 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2286.141 | TFLOPs: 8.50 | 7: iteration 1390/ 173500 | consumed samples: 355840 | consumed tokens: 728760320 | elapsed time per iteration (s): 0.10 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 6.110952E+00 | grad norm: 0.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2463.404 | TFLOPs: 9.16 | 7: iteration 1400/ 173500 | consumed samples: 358400 | consumed tokens: 734003200 | elapsed time per iteration (s): 0.10 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 6.108604E+00 | grad norm: 0.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2459.780 | TFLOPs: 9.15 | 7: iteration 1410/ 173500 | consumed samples: 360960 | consumed tokens: 739246080 | elapsed time per iteration (s): 0.09 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 6.093046E+00 | grad norm: 0.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.502 | TFLOPs: 10.92 | 7: iteration 1420/ 173500 | consumed samples: 363520 | consumed tokens: 744488960 | elapsed time per iteration (s): 0.11 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 6.091393E+00 | grad norm: 0.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2301.587 | TFLOPs: 8.56 | 7: iteration 1430/ 173500 | consumed samples: 366080 | consumed tokens: 749731840 | elapsed time per iteration (s): 0.12 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 6.066045E+00 | grad norm: 0.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2063.757 | TFLOPs: 7.68 | 7: iteration 1440/ 173500 | consumed samples: 368640 | consumed tokens: 754974720 | elapsed time per iteration (s): 0.11 | learning rate: 1.660E-04 | 
global batch size: 256 | lm loss: 6.071393E+00 | grad norm: 0.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.306 | TFLOPs: 8.85 | 7: iteration 1450/ 173500 | consumed samples: 371200 | consumed tokens: 760217600 | elapsed time per iteration (s): 0.10 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 6.062561E+00 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2552.490 | TFLOPs: 9.49 | 7: iteration 1460/ 173500 | consumed samples: 373760 | consumed tokens: 765460480 | elapsed time per iteration (s): 0.13 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 6.055293E+00 | grad norm: 1.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.568 | TFLOPs: 7.39 | 7: iteration 1470/ 173500 | consumed samples: 376320 | consumed tokens: 770703360 | elapsed time per iteration (s): 0.09 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 6.062198E+00 | grad norm: 1.000 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.547 | TFLOPs: 10.09 | 7: iteration 1480/ 173500 | consumed samples: 378880 | consumed tokens: 775946240 | elapsed time per iteration (s): 0.11 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 6.045110E+00 | grad norm: 0.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2369.187 | TFLOPs: 8.81 | 7: iteration 1490/ 173500 | consumed samples: 381440 | consumed tokens: 781189120 | elapsed time per iteration (s): 0.12 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 6.045717E+00 | grad norm: 0.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2129.978 | TFLOPs: 7.92 | 7: iteration 1500/ 173500 | consumed samples: 384000 | consumed tokens: 
786432000 | elapsed time per iteration (s): 0.13 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 6.042284E+00 | grad norm: 1.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1904.503 | TFLOPs: 7.08 | 7: iteration 1510/ 173500 | consumed samples: 386560 | consumed tokens: 791674880 | elapsed time per iteration (s): 0.09 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 6.026753E+00 | grad norm: 0.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.852 | TFLOPs: 10.13 | 7: iteration 1520/ 173500 | consumed samples: 389120 | consumed tokens: 796917760 | elapsed time per iteration (s): 0.10 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 6.004533E+00 | grad norm: 1.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2448.952 | TFLOPs: 9.11 | 7: iteration 1530/ 173500 | consumed samples: 391680 | consumed tokens: 802160640 | elapsed time per iteration (s): 0.10 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 6.006969E+00 | grad norm: 1.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2540.921 | TFLOPs: 9.45 | 7: iteration 1540/ 173500 | consumed samples: 394240 | consumed tokens: 807403520 | elapsed time per iteration (s): 0.09 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 6.010149E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2903.485 | TFLOPs: 10.80 | 7: iteration 1550/ 173500 | consumed samples: 396800 | consumed tokens: 812646400 | elapsed time per iteration (s): 0.10 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 5.990466E+00 | grad norm: 1.096 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2576.876 | TFLOPs: 9.58 | 
7: iteration 1560/ 173500 | consumed samples: 399360 | consumed tokens: 817889280 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 5.986457E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.800 | TFLOPs: 11.37 | 7: iteration 1570/ 173500 | consumed samples: 401920 | consumed tokens: 823132160 | elapsed time per iteration (s): 0.09 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 5.979378E+00 | grad norm: 1.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2970.860 | TFLOPs: 11.05 | 7: iteration 1580/ 173500 | consumed samples: 404480 | consumed tokens: 828375040 | elapsed time per iteration (s): 0.11 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 5.977639E+00 | grad norm: 1.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2281.581 | TFLOPs: 8.49 | 7: iteration 1590/ 173500 | consumed samples: 407040 | consumed tokens: 833617920 | elapsed time per iteration (s): 0.09 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 5.967768E+00 | grad norm: 1.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.337 | TFLOPs: 10.84 | 7: iteration 1600/ 173500 | consumed samples: 409600 | consumed tokens: 838860800 | elapsed time per iteration (s): 0.09 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 5.967826E+00 | grad norm: 1.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2902.977 | TFLOPs: 10.80 | 7: iteration 1610/ 173500 | consumed samples: 412160 | consumed tokens: 844103680 | elapsed time per iteration (s): 0.09 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 5.957711E+00 | grad norm: 1.630 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2911.152 | TFLOPs: 10.83 | 7: iteration 1620/ 173500 | consumed samples: 414720 | consumed tokens: 849346560 | elapsed time per iteration (s): 0.09 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 5.944297E+00 | grad norm: 0.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.470 | TFLOPs: 10.61 | 7: iteration 1630/ 173500 | consumed samples: 417280 | consumed tokens: 854589440 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 5.951990E+00 | grad norm: 1.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3069.523 | TFLOPs: 11.42 | 7: iteration 1640/ 173500 | consumed samples: 419840 | consumed tokens: 859832320 | elapsed time per iteration (s): 0.13 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 5.933984E+00 | grad norm: 1.075 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1928.795 | TFLOPs: 7.17 | 7: iteration 1650/ 173500 | consumed samples: 422400 | consumed tokens: 865075200 | elapsed time per iteration (s): 0.13 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 5.936223E+00 | grad norm: 1.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.337 | TFLOPs: 7.24 | 7: iteration 1660/ 173500 | consumed samples: 424960 | consumed tokens: 870318080 | elapsed time per iteration (s): 0.12 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 5.922206E+00 | grad norm: 1.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2140.932 | TFLOPs: 7.96 | 7: iteration 1670/ 173500 | consumed samples: 427520 | consumed tokens: 875560960 | elapsed time per iteration (s): 0.12 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 
5.910814E+00 | grad norm: 1.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2188.331 | TFLOPs: 8.14 | 7: iteration 1680/ 173500 | consumed samples: 430080 | consumed tokens: 880803840 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 5.909684E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.268 | TFLOPs: 11.23 | 7: iteration 1690/ 173500 | consumed samples: 432640 | consumed tokens: 886046720 | elapsed time per iteration (s): 0.10 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 5.898523E+00 | grad norm: 1.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2684.849 | TFLOPs: 9.99 | 7: iteration 1700/ 173500 | consumed samples: 435200 | consumed tokens: 891289600 | elapsed time per iteration (s): 0.09 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 5.891981E+00 | grad norm: 0.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2742.915 | TFLOPs: 10.20 | 7: iteration 1710/ 173500 | consumed samples: 437760 | consumed tokens: 896532480 | elapsed time per iteration (s): 0.10 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 5.892373E+00 | grad norm: 0.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2579.811 | TFLOPs: 9.60 | 7: iteration 1720/ 173500 | consumed samples: 440320 | consumed tokens: 901775360 | elapsed time per iteration (s): 0.12 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 5.896083E+00 | grad norm: 1.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2083.000 | TFLOPs: 7.75 | 7: iteration 1730/ 173500 | consumed samples: 442880 | consumed tokens: 907018240 | elapsed time per iteration 
(s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 5.893148E+00 | grad norm: 1.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2945.588 | TFLOPs: 10.96 | 7: iteration 1740/ 173500 | consumed samples: 445440 | consumed tokens: 912261120 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.877443E+00 | grad norm: 0.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2466.339 | TFLOPs: 9.17 | 7: iteration 1750/ 173500 | consumed samples: 448000 | consumed tokens: 917504000 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.887426E+00 | grad norm: 1.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.828 | TFLOPs: 11.71 | 7: iteration 1760/ 173500 | consumed samples: 450560 | consumed tokens: 922746880 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.874185E+00 | grad norm: 1.076 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2482.243 | TFLOPs: 9.23 | 7: iteration 1770/ 173500 | consumed samples: 453120 | consumed tokens: 927989760 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.859644E+00 | grad norm: 1.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1988.006 | TFLOPs: 7.39 | 7: iteration 1780/ 173500 | consumed samples: 455680 | consumed tokens: 933232640 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.851468E+00 | grad norm: 1.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2035.913 | TFLOPs: 7.57 | 7: iteration 1790/ 173500 | consumed 
samples: 458240 | consumed tokens: 938475520 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.859966E+00 | grad norm: 0.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2062.230 | TFLOPs: 7.67 | 7: iteration 1800/ 173500 | consumed samples: 460800 | consumed tokens: 943718400 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.842934E+00 | grad norm: 0.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1921.796 | TFLOPs: 7.15 | 7: iteration 1810/ 173500 | consumed samples: 463360 | consumed tokens: 948961280 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.832273E+00 | grad norm: 0.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2264.563 | TFLOPs: 8.42 | 7: iteration 1820/ 173500 | consumed samples: 465920 | consumed tokens: 954204160 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.836972E+00 | grad norm: 0.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2220.692 | TFLOPs: 8.26 | 7: iteration 1830/ 173500 | consumed samples: 468480 | consumed tokens: 959447040 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.817697E+00 | grad norm: 1.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1969.244 | TFLOPs: 7.32 | 7: iteration 1840/ 173500 | consumed samples: 471040 | consumed tokens: 964689920 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.818319E+00 | grad norm: 1.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2420.063 | TFLOPs: 9.00 | 7: iteration 1850/ 173500 | consumed samples: 473600 | consumed tokens: 969932800 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.824194E+00 | grad norm: 1.142 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.451 | TFLOPs: 11.35 | 7: iteration 1860/ 173500 | consumed samples: 476160 | consumed tokens: 975175680 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.806750E+00 | grad norm: 1.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2563.240 | TFLOPs: 9.53 | 7: iteration 1870/ 173500 | consumed samples: 478720 | consumed tokens: 980418560 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.811187E+00 | grad norm: 0.872 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2391.480 | TFLOPs: 8.90 | 7: iteration 1880/ 173500 | consumed samples: 481280 | consumed tokens: 985661440 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.792597E+00 | grad norm: 1.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1923.255 | TFLOPs: 7.15 | 7: iteration 1890/ 173500 | consumed samples: 483840 | consumed tokens: 990904320 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.796085E+00 | grad norm: 0.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2116.201 | TFLOPs: 7.87 | 7: iteration 1900/ 173500 | consumed samples: 486400 | consumed tokens: 996147200 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.780457E+00 | grad norm: 1.256 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2511.848 | TFLOPs: 9.34 | 7: iteration 1910/ 173500 | consumed samples: 488960 | consumed tokens: 1001390080 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.770327E+00 | grad norm: 0.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.650 | TFLOPs: 11.60 | 7: iteration 1920/ 173500 | consumed samples: 491520 | consumed tokens: 1006632960 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.776474E+00 | grad norm: 1.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2536.354 | TFLOPs: 9.43 | 7: iteration 1930/ 173500 | consumed samples: 494080 | consumed tokens: 1011875840 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.783167E+00 | grad norm: 1.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.140 | TFLOPs: 10.69 | 7: iteration 1940/ 173500 | consumed samples: 496640 | consumed tokens: 1017118720 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.760155E+00 | grad norm: 1.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.736 | TFLOPs: 11.99 | 7: iteration 1950/ 173500 | consumed samples: 499200 | consumed tokens: 1022361600 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.752507E+00 | grad norm: 0.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.882 | TFLOPs: 7.71 | 7: iteration 1960/ 173500 | consumed samples: 501760 | consumed tokens: 1027604480 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | 
global batch size: 256 | lm loss: 5.774426E+00 | grad norm: 0.996 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.484 | TFLOPs: 11.28 | 7: iteration 1970/ 173500 | consumed samples: 504320 | consumed tokens: 1032847360 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.750920E+00 | grad norm: 0.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.942 | TFLOPs: 10.16 | 7: iteration 1980/ 173500 | consumed samples: 506880 | consumed tokens: 1038090240 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.754458E+00 | grad norm: 1.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2249.900 | TFLOPs: 8.37 | 7: iteration 1990/ 173500 | consumed samples: 509440 | consumed tokens: 1043333120 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.750975E+00 | grad norm: 1.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.187 | TFLOPs: 11.71 | 0: [2023-03-17 00:18:50,645] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[0.00019999894289482022, 0.00019999894289482022, 0.00019999894289482022], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 2000/ 173500 | consumed samples: 512000 | consumed tokens: 1048576000 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.742757E+00 | grad norm: 0.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.028 | TFLOPs: 11.17 | 0: steps: 2000 loss: 5.7370 iter time (s): 0.100 samples/sec: 2551.110 7: ----------------------------------------------------------------------------------------------- 7: validation loss at 
iteration 2000 | lm loss value: 5.684403E+00 | lm loss PPL: 2.942423E+02 | 7: ----------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 2000 to checkpoints_14m91b100m 0: [2023-03-17 00:18:50,738] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step2000 is begin to save! 0: [2023-03-17 00:18:50,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:18:50,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:18:50,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:18:50,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:18:50,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:18:50,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:18:50,773] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:18:50,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:18:50,776] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:18:50,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 00:18:50,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:18:50,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:18:50,780] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step2000/mp_rank_00_model_states.pt 0: [2023-03-17 00:18:50,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:18:50,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:18:50,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:18:50,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:18:50,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:18:50,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2023-03-17 00:18:50,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
0: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:18:50,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:18:50,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:18:50,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:18:50,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:18:50,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2023-03-17 00:18:50,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 4: [2023-03-17 00:18:50,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 7: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:18:50,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:18:50,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 6: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:18:50,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:18:50,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:18:50,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 00:18:50,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 0: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:18:50,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:18:50,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2023-03-17 00:18:50,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:18:50,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
2: [2023-03-17 00:18:50,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:18:50,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:18:50,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:18:50,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:18:50,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 00:18:50,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:18:50,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:18:50,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:18:50,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:18:50,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:18:50,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:18:50,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
5: [2023-03-17 00:18:50,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt
6: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
5: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
7: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:18:50,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt
7: [2023-03-17 00:18:50,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
2: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt.
2: [2023-03-17 00:18:50,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
2: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
4: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
4: [2023-03-17 00:18:50,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
4: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
0: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
1: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt.
0: [2023-03-17 00:18:50,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
0: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
1: [2023-03-17 00:18:50,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
5: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
5: [2023-03-17 00:18:50,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
5: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
7: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:18:50,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt
4: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt.
0: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:18:50,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt
7: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
2: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt.
4: [2023-03-17 00:18:50,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt
0: [2023-03-17 00:18:50,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
6: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
4: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
1: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt.
0: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
2: [2023-03-17 00:18:50,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt
1: [2023-03-17 00:18:50,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
2: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
1: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
5: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt.
5: [2023-03-17 00:18:50,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt
5: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
0: successfully saved checkpoint at iteration 2000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 76.49
3: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:18:50,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt
3: [2023-03-17 00:18:50,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
3: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:18:50,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt
3: [2023-03-17 00:18:50,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
3: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:18:50,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt
3: [2023-03-17 00:18:50,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
3: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:18:50,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt
3: [2023-03-17 00:18:50,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
3: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:18:50,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
3: [2023-03-17 00:18:50,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
3: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:18:50,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt
3: [2023-03-17 00:18:50,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
3: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:18:50,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt
3: [2023-03-17 00:18:50,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
3: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:18:50,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
3: [2023-03-17 00:18:50,811] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now!
7: iteration 2010/ 173500 | consumed samples: 514560 | consumed tokens: 1053818880 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.736506E+00 | grad norm: 1.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.769 | TFLOPs: 9.70 |
7: iteration 2020/ 173500 | consumed samples: 517120 | consumed tokens: 1059061760 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.725270E+00 | grad norm: 0.835 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2093.099 | TFLOPs: 7.79 |
7: iteration 2030/ 173500 | consumed samples: 519680 | consumed tokens: 1064304640 | elapsed time per iteration (s): 0.14 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.728940E+00 | grad norm: 0.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1871.800 | TFLOPs: 6.96 |
7: iteration 2040/ 173500 | consumed samples: 522240 | consumed tokens: 1069547520 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.717435E+00 | grad norm: 0.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2250.203 | TFLOPs: 8.37 |
7: iteration 2050/ 173500 | consumed samples: 524800 | consumed tokens: 1074790400 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.722640E+00 | grad norm: 1.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2588.764 | TFLOPs: 9.63 |
7: iteration 2060/ 173500 | consumed samples: 527360 | consumed tokens: 1080033280 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.714585E+00 | grad norm: 1.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2465.614 | TFLOPs: 9.17 |
7: iteration 2070/ 173500 | consumed samples: 529920 | consumed tokens: 1085276160 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.729026E+00 | grad norm: 1.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2109.511 | TFLOPs: 7.85 |
7: iteration 2080/ 173500 | consumed samples: 532480 | consumed tokens: 1090519040 | elapsed time per iteration (s): 0.16 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.716468E+00 | grad norm: 1.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1561.723 | TFLOPs: 5.81 |
7: iteration 2090/ 173500 | consumed samples: 535040 | consumed tokens: 1095761920 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.704327E+00 | grad norm: 1.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1896.428 | TFLOPs: 7.05 |
7: iteration 2100/ 173500 | consumed samples: 537600 | consumed tokens: 1101004800 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.702942E+00 | grad norm: 1.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2584.358 | TFLOPs: 9.61 |
7: iteration 2110/ 173500 | consumed samples: 540160 | consumed tokens: 1106247680 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.680855E+00 | grad norm: 0.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2805.798 | TFLOPs: 10.44 |
7: iteration 2120/ 173500 | consumed samples: 542720 | consumed tokens: 1111490560 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.675366E+00 | grad norm: 1.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.702 | TFLOPs: 11.73 |
7: iteration 2130/ 173500 | consumed samples: 545280 | consumed tokens: 1116733440 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.692298E+00 | grad norm: 0.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.664 | TFLOPs: 11.67 |
7: iteration 2140/ 173500 | consumed samples: 547840 | consumed tokens: 1121976320 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.674540E+00 | grad norm: 1.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.920 | TFLOPs: 11.60 |
7: iteration 2150/ 173500 | consumed samples: 550400 | consumed tokens: 1127219200 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.674192E+00 | grad norm: 1.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2595.803 | TFLOPs: 9.66 |
7: iteration 2160/ 173500 | consumed samples: 552960 | consumed tokens: 1132462080 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.671308E+00 | grad norm: 1.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2737.113 | TFLOPs: 10.18 |
7: iteration 2170/ 173500 | consumed samples: 555520 | consumed tokens: 1137704960 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.665478E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2806.192 | TFLOPs: 10.44 |
7: iteration 2180/ 173500 | consumed samples: 558080 | consumed tokens: 1142947840 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.669361E+00 | grad norm: 1.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.045 | TFLOPs: 11.63 |
7: iteration 2190/ 173500 | consumed samples: 560640 | consumed tokens: 1148190720 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.659435E+00 | grad norm: 0.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.871 | TFLOPs: 10.42 |
7: iteration 2200/ 173500 | consumed samples: 563200 | consumed tokens: 1153433600 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.655305E+00 | grad norm: 1.061 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.537 | TFLOPs: 10.40 |
7: iteration 2210/ 173500 | consumed samples: 565760 | consumed tokens: 1158676480 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.654978E+00 | grad norm: 1.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2045.039 | TFLOPs: 7.61 |
7: iteration 2220/ 173500 | consumed samples: 568320 | consumed tokens: 1163919360 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.645641E+00 | grad norm: 1.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2070.692 | TFLOPs: 7.70 |
7: iteration 2230/ 173500 | consumed samples: 570880 | consumed tokens: 1169162240 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.644172E+00 | grad norm: 1.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2032.148 | TFLOPs: 7.56 |
7: iteration 2240/ 173500 | consumed samples: 573440 | consumed tokens: 1174405120 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.628735E+00 | grad norm: 1.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2249.334 | TFLOPs: 8.37 |
7: iteration 2250/ 173500 | consumed samples: 576000 | consumed tokens: 1179648000 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.632872E+00 | grad norm: 1.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2177.305 | TFLOPs: 8.10 |
7: iteration 2260/ 173500 | consumed samples: 578560 | consumed tokens: 1184890880 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.622396E+00 | grad norm: 0.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.493 | TFLOPs: 8.72 |
7: iteration 2270/ 173500 | consumed samples: 581120 | consumed tokens: 1190133760 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.610934E+00 | grad norm: 1.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3046.833 | TFLOPs: 11.33 |
7: iteration 2280/ 173500 | consumed samples: 583680 | consumed tokens: 1195376640 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.617131E+00 | grad norm: 1.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2651.523 | TFLOPs: 9.86 |
7: iteration 2290/ 173500 | consumed samples: 586240 | consumed tokens: 1200619520 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.622631E+00 | grad norm: 1.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.602 | TFLOPs: 11.97 |
7: iteration 2300/ 173500 | consumed samples: 588800 | consumed tokens: 1205862400 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.620346E+00 | grad norm: 1.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2989.798 | TFLOPs: 11.12 |
7: iteration 2310/ 173500 | consumed samples: 591360 | consumed tokens: 1211105280 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.607856E+00 | grad norm: 0.977 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2460.019 | TFLOPs: 9.15 |
7: iteration 2320/ 173500 | consumed samples: 593920 | consumed tokens: 1216348160 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.605803E+00 | grad norm: 1.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2892.761 | TFLOPs: 10.76 |
7: iteration 2330/ 173500 | consumed samples: 596480 | consumed tokens: 1221591040 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.610761E+00 | grad norm: 1.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2603.727 | TFLOPs: 9.68 |
7: iteration 2340/ 173500 | consumed samples: 599040 | consumed tokens: 1226833920 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.587607E+00 | grad norm: 1.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2540.765 | TFLOPs: 9.45 |
7: iteration 2350/ 173500 | consumed samples: 601600 | consumed tokens: 1232076800 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.585433E+00 | grad norm: 1.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2767.449 | TFLOPs: 10.29 |
7: iteration 2360/ 173500 | consumed samples: 604160 | consumed tokens: 1237319680 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.595092E+00 | grad norm: 1.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.588 | TFLOPs: 8.62 |
7: iteration 2370/ 173500 | consumed samples: 606720 | consumed tokens: 1242562560 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.587822E+00 | grad norm: 1.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.579 | TFLOPs: 11.45 |
7: iteration 2380/ 173500 | consumed samples: 609280 | consumed tokens: 1247805440 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.596957E+00 | grad norm: 1.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1935.988 | TFLOPs: 7.20 |
7: iteration 2390/ 173500 | consumed samples: 611840 | consumed tokens: 1253048320 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.588128E+00 | grad norm: 0.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2306.356 | TFLOPs: 8.58 |
7: iteration 2400/ 173500 | consumed samples: 614400 | consumed tokens: 1258291200 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.569899E+00 | grad norm: 1.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.034 | TFLOPs: 9.56 |
7: iteration 2410/ 173500 | consumed samples: 616960 | consumed tokens: 1263534080 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.580296E+00 | grad norm: 1.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2712.290 | TFLOPs: 10.09 |
7: iteration 2420/ 173500 | consumed samples: 619520 | consumed tokens: 1268776960 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.576709E+00 | grad norm: 1.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2540.624 | TFLOPs: 9.45 |
7: iteration 2430/ 173500 | consumed samples: 622080 | consumed tokens: 1274019840 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.564801E+00 | grad norm: 0.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2765.254 | TFLOPs: 10.29 |
7: iteration 2440/ 173500 | consumed samples: 624640 | consumed tokens: 1279262720 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.573973E+00 | grad norm: 1.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2803.522 | TFLOPs: 10.43 |
7: iteration 2450/ 173500 | consumed samples: 627200 | consumed tokens: 1284505600 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.556945E+00 | grad norm: 1.139 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2891.227 | TFLOPs: 10.75 |
7: iteration 2460/ 173500 | consumed samples: 629760 | consumed tokens: 1289748480 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.566093E+00 | grad norm: 1.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.871 | TFLOPs: 9.37 |
7: iteration 2470/ 173500 | consumed samples: 632320 | consumed tokens: 1294991360 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.556795E+00 | grad norm: 1.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2890.137 | TFLOPs: 10.75 |
7: iteration 2480/ 173500 | consumed samples: 634880 | consumed tokens: 1300234240 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.559122E+00 | grad norm: 1.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.679 | TFLOPs: 10.03 |
7: iteration 2490/ 173500 | consumed samples: 637440 | consumed tokens: 1305477120 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.539532E+00 | grad norm: 1.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2103.759 | TFLOPs: 7.83 |
7: iteration 2500/ 173500 | consumed samples: 640000 | consumed tokens: 1310720000 | elapsed time per iteration (s): 0.15 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.544690E+00 | grad norm: 1.053 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1667.090 | TFLOPs: 6.20 |
7: iteration 2510/ 173500 | consumed samples: 642560 | consumed tokens: 1315962880 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.546487E+00 | grad norm: 0.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2592.888 | TFLOPs: 9.64 |
7: iteration 2520/ 173500 | consumed samples: 645120 | consumed tokens: 1321205760 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.539809E+00 | grad norm: 1.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2595.378 | TFLOPs: 9.65 |
7: iteration 2530/ 173500 | consumed samples: 647680 | consumed tokens: 1326448640 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.536098E+00 | grad norm: 1.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2411.683 | TFLOPs: 8.97 |
7: iteration 2540/ 173500 | consumed samples: 650240 | consumed tokens: 1331691520 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.527022E+00 | grad norm: 1.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2601.456 | TFLOPs: 9.68 |
7: iteration 2550/ 173500 | consumed samples: 652800 | consumed tokens: 1336934400 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.521288E+00 | grad norm: 1.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.613 | TFLOPs: 8.86 |
7: iteration 2560/ 173500 | consumed samples: 655360 | consumed tokens: 1342177280 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.532978E+00 | grad norm: 1.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2311.147 | TFLOPs: 8.60 |
7: iteration 2570/ 173500 | consumed samples: 657920 | consumed tokens: 1347420160 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.525785E+00 | grad norm: 1.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2373.813 | TFLOPs: 8.83 |
7: iteration 2580/ 173500 | consumed samples: 660480 | consumed tokens: 1352663040 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.512500E+00 | grad norm: 1.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2560.672 | TFLOPs: 9.52 |
7: iteration 2590/ 173500 | consumed samples: 663040 | consumed tokens: 1357905920 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.524684E+00 | grad norm: 0.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2662.480 | TFLOPs: 9.90 |
7: iteration 2600/ 173500 | consumed samples: 665600 | consumed tokens: 1363148800 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.518932E+00 | grad norm: 1.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2764.033 | TFLOPs: 10.28 |
7: iteration 2610/ 173500 | consumed samples: 668160 | consumed tokens: 1368391680 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.501476E+00 | grad norm: 1.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2165.735 | TFLOPs: 8.06 |
7: iteration 2620/ 173500 | consumed samples: 670720 | consumed tokens: 1373634560 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.513074E+00 | grad norm: 1.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.117 | TFLOPs: 12.01 |
7: iteration 2630/ 173500 | consumed samples: 673280 | consumed tokens: 1378877440 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.500846E+00 | grad norm: 0.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.503 | TFLOPs: 12.04 |
7: iteration 2640/ 173500 | consumed samples: 675840 | consumed tokens: 1384120320 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.489899E+00 | grad norm: 1.088 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2917.405 | TFLOPs: 10.85 |
7: iteration 2650/ 173500 | consumed samples: 678400 | consumed tokens: 1389363200 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.501966E+00 | grad norm: 1.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.974 | TFLOPs: 9.41 |
7: iteration 2660/ 173500 | consumed samples: 680960 | consumed tokens: 1394606080 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.487429E+00 | grad norm: 1.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.177 | TFLOPs: 11.65 |
7: iteration 2670/ 173500 | consumed samples: 683520 | consumed tokens: 1399848960 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.506659E+00 | grad norm: 0.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2264.371 | TFLOPs: 8.42 |
7: iteration 2680/ 173500 | consumed samples: 686080 | consumed tokens: 1405091840 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.492982E+00 | grad norm: 1.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2767.860 | TFLOPs: 10.30 |
7: iteration 2690/ 173500 | consumed samples: 688640 | consumed tokens: 1410334720 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.488573E+00 | grad norm: 1.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.748 | TFLOPs: 11.45 |
7: iteration 2700/ 173500 | consumed samples: 691200 | consumed tokens: 1415577600 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.474371E+00 | grad norm: 1.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.245 | TFLOPs: 11.74 |
7: iteration 2710/ 173500 | consumed samples: 693760 | consumed tokens: 1420820480 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.475779E+00 | grad norm: 1.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.380 | TFLOPs: 10.07 |
7: iteration 2720/ 173500 | consumed samples: 696320 | consumed tokens: 1426063360 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.479619E+00 | grad norm: 0.879 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2476.128 | TFLOPs: 9.21 |
7: iteration 2730/ 173500 | consumed samples: 698880 | consumed tokens: 1431306240 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.466128E+00 | grad norm: 1.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.607 | TFLOPs: 11.54 |
7: iteration 2740/ 173500 | consumed samples: 701440 | consumed tokens: 1436549120 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.464851E+00 | grad norm: 1.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.054 | TFLOPs: 12.07 |
7: iteration 2750/ 173500 | consumed samples: 704000 | consumed tokens: 1441792000 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.464420E+00 | grad norm: 1.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2342.699 | TFLOPs: 8.71 |
7: iteration 2760/ 173500 | consumed samples: 706560 | consumed tokens: 1447034880 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.470800E+00 | grad norm: 1.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2966.478 | TFLOPs: 11.03 |
7: iteration 2770/ 173500 | consumed samples: 709120 | consumed tokens: 1452277760 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.465155E+00 | grad norm: 1.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2255.017 | TFLOPs: 8.39 |
7: iteration 2780/ 173500 | consumed samples: 711680 | consumed tokens: 1457520640 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.460456E+00 | grad norm: 0.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.907 | TFLOPs: 11.92 |
7: iteration 2790/ 173500 | consumed samples: 714240 | consumed tokens: 1462763520 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.451414E+00 | grad norm: 0.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2675.609 | TFLOPs: 9.95 |
7: iteration 2800/ 173500 | consumed samples: 716800 | consumed tokens: 1468006400 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.457073E+00 | grad norm: 1.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2137.850 | TFLOPs: 7.95 |
7: iteration 2810/ 173500 | consumed samples: 719360 | consumed tokens: 1473249280 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.456646E+00 | grad norm: 0.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.058 | TFLOPs: 8.56 |
7: iteration 2820/ 173500 | consumed samples: 721920 | consumed tokens: 1478492160 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.456850E+00 | grad norm: 1.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2488.955 | TFLOPs: 9.26 |
7: iteration 2830/ 173500 | consumed samples: 724480 | consumed tokens: 1483735040 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.429688E+00 | grad norm: 1.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2704.335 | TFLOPs: 10.06 |
7: iteration 2840/ 173500 | consumed samples: 727040 | consumed tokens: 1488977920 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.438720E+00 | grad norm: 0.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.883 | TFLOPs: 11.55 |
7: iteration 2850/ 173500 | consumed samples: 729600 | consumed tokens: 1494220800 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.436654E+00 | grad norm: 1.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2239.370 | TFLOPs: 8.33 |
7: iteration 2860/ 173500 | consumed samples: 732160 | consumed tokens: 1499463680 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.427943E+00 | grad norm: 1.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.714 | TFLOPs: 11.25 |
7: iteration 2870/ 173500 | consumed samples: 734720 | consumed tokens: 1504706560 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | 
global batch size: 256 | lm loss: 5.423022E+00 | grad norm: 1.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2945.906 | TFLOPs: 10.96 | 7: iteration 2880/ 173500 | consumed samples: 737280 | consumed tokens: 1509949440 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.427699E+00 | grad norm: 1.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2115.875 | TFLOPs: 7.87 | 7: iteration 2890/ 173500 | consumed samples: 739840 | consumed tokens: 1515192320 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.429893E+00 | grad norm: 0.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1997.571 | TFLOPs: 7.43 | 7: iteration 2900/ 173500 | consumed samples: 742400 | consumed tokens: 1520435200 | elapsed time per iteration (s): 0.15 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.417950E+00 | grad norm: 1.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1663.361 | TFLOPs: 6.19 | 7: iteration 2910/ 173500 | consumed samples: 744960 | consumed tokens: 1525678080 | elapsed time per iteration (s): 0.14 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.423746E+00 | grad norm: 1.010 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1878.746 | TFLOPs: 6.99 | 7: iteration 2920/ 173500 | consumed samples: 747520 | consumed tokens: 1530920960 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.427974E+00 | grad norm: 1.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2839.178 | TFLOPs: 10.56 | 7: iteration 2930/ 173500 | consumed samples: 750080 | consumed tokens: 
1536163840 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.404134E+00 | grad norm: 1.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2401.944 | TFLOPs: 8.93 | 7: iteration 2940/ 173500 | consumed samples: 752640 | consumed tokens: 1541406720 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.414856E+00 | grad norm: 1.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.304 | TFLOPs: 12.00 | 7: iteration 2950/ 173500 | consumed samples: 755200 | consumed tokens: 1546649600 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.407920E+00 | grad norm: 0.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.856 | TFLOPs: 10.92 | 7: iteration 2960/ 173500 | consumed samples: 757760 | consumed tokens: 1551892480 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.410825E+00 | grad norm: 1.146 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.957 | TFLOPs: 10.39 | 7: iteration 2970/ 173500 | consumed samples: 760320 | consumed tokens: 1557135360 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.393639E+00 | grad norm: 0.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2675.553 | TFLOPs: 9.95 | 7: iteration 2980/ 173500 | consumed samples: 762880 | consumed tokens: 1562378240 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.397597E+00 | grad norm: 1.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2640.694 | TFLOPs: 
9.82 |
7: iteration 2990/ 173500 | consumed samples: 765440 | consumed tokens: 1567621120 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.393298E+00 | grad norm: 1.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2668.266 | TFLOPs: 9.92 |
7: iteration 3000/ 173500 | consumed samples: 768000 | consumed tokens: 1572864000 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.395044E+00 | grad norm: 1.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2681.170 | TFLOPs: 9.97 |
7: -----------------------------------------------------------------------------------------------
7: validation loss at iteration 3000 | lm loss value: 5.293815E+00 | lm loss PPL: 1.991016E+02 |
7: -----------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 3000 to checkpoints_14m91b100m
0: [2023-03-17 00:20:32,121] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step3000 is begin to save!
0: [2023-03-17 00:20:32,124] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:20:32,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:20:32,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/layer_03-model_00-model_states.pt...
0: [2023-03-17 00:20:32,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/layer_03-model_00-model_states.pt.
0: [2023-03-17 00:20:32,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/layer_04-model_00-model_states.pt...
0: [2023-03-17 00:20:32,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:20:32,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:20:32,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:20:32,158] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:20:32,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:20:32,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:20:32,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:20:32,162] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step3000/mp_rank_00_model_states.pt 0: [2023-03-17 00:20:32,162] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:20:32,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:20:32,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:20:32,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:20:32,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:20:32,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:20:32,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:20:32,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:20:32,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:20:32,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:20:32,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:20:32,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 00:20:32,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 4: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:20:32,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:20:32,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:20:32,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:20:32,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:20:32,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:20:32,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:20:32,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:20:32,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2023-03-17 00:20:32,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:20:32,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 6: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2023-03-17 00:20:32,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:20:32,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:20:32,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:20:32,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:20:32,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 4: [2023-03-17 00:20:32,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 7: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2023-03-17 00:20:32,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:20:32,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2023-03-17 00:20:32,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 00:20:32,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
6: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:20:32,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:20:32,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:20:32,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:20:32,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2023-03-17 00:20:32,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:20:32,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:20:32,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:20:32,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:20:32,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:20:32,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:20:32,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:20:32,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:20:32,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:20:32,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:20:32,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 00:20:32,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2023-03-17 00:20:32,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
6: [2023-03-17 00:20:32,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2023-03-17 00:20:32,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:20:32,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:20:32,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:20:32,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:20:32,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2023-03-17 00:20:32,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:20:32,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:20:32,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:20:32,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:20:32,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:20:32,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:20:32,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:20:32,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:20:32,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 00:20:32,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:20:32,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2023-03-17 00:20:32,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:20:32,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2023-03-17 00:20:32,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
5: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
1: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2023-03-17 00:20:32,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:20:32,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: successfully saved checkpoint at iteration 3000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 75.58 7: iteration 3010/ 173500 | consumed samples: 770560 | consumed tokens: 1578106880 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.379874E+00 | grad norm: 1.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.570 | TFLOPs: 8.95 | 7: iteration 3020/ 173500 | consumed samples: 773120 | consumed tokens: 1583349760 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.391389E+00 | grad norm: 1.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2907.428 | TFLOPs: 10.81 | 7: iteration 3030/ 173500 | consumed samples: 775680 | consumed tokens: 1588592640 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.380604E+00 | grad norm: 0.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.925 | TFLOPs: 11.52 | 7: iteration 3040/ 173500 | consumed samples: 778240 | consumed tokens: 1593835520 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.386418E+00 | grad norm: 0.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2563.764 | TFLOPs: 9.54 | 7: iteration 3050/ 173500 | consumed samples: 780800 | 
consumed tokens: 1599078400 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.381525E+00 | grad norm: 0.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.009 | TFLOPs: 7.75 | 7: iteration 3060/ 173500 | consumed samples: 783360 | consumed tokens: 1604321280 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.370405E+00 | grad norm: 1.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.018 | TFLOPs: 10.84 | 7: iteration 3070/ 173500 | consumed samples: 785920 | consumed tokens: 1609564160 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.370542E+00 | grad norm: 1.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2698.810 | TFLOPs: 10.04 | 7: iteration 3080/ 173500 | consumed samples: 788480 | consumed tokens: 1614807040 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.368753E+00 | grad norm: 1.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2764.443 | TFLOPs: 10.28 | 7: iteration 3090/ 173500 | consumed samples: 791040 | consumed tokens: 1620049920 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.370535E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2926.210 | TFLOPs: 10.88 | 7: iteration 3100/ 173500 | consumed samples: 793600 | consumed tokens: 1625292800 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.359549E+00 | grad norm: 0.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
2938.931 | TFLOPs: 10.93 | 7: iteration 3110/ 173500 | consumed samples: 796160 | consumed tokens: 1630535680 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.357880E+00 | grad norm: 0.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.843 | TFLOPs: 11.06 | 7: iteration 3120/ 173500 | consumed samples: 798720 | consumed tokens: 1635778560 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.358184E+00 | grad norm: 0.906 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2907.043 | TFLOPs: 10.81 | 7: iteration 3130/ 173500 | consumed samples: 801280 | consumed tokens: 1641021440 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.364368E+00 | grad norm: 1.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.890 | TFLOPs: 11.60 | 7: iteration 3140/ 173500 | consumed samples: 803840 | consumed tokens: 1646264320 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.355029E+00 | grad norm: 1.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2367.400 | TFLOPs: 8.81 | 7: iteration 3150/ 173500 | consumed samples: 806400 | consumed tokens: 1651507200 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.353516E+00 | grad norm: 1.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.332 | TFLOPs: 11.71 | 7: iteration 3160/ 173500 | consumed samples: 808960 | consumed tokens: 1656750080 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.358483E+00 | grad norm: 0.857 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.236 | TFLOPs: 11.20 | 7: iteration 3170/ 173500 | consumed samples: 811520 | consumed tokens: 1661992960 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.339879E+00 | grad norm: 1.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.237 | TFLOPs: 10.03 | 7: iteration 3180/ 173500 | consumed samples: 814080 | consumed tokens: 1667235840 | elapsed time per iteration (s): 0.14 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.338830E+00 | grad norm: 0.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1864.831 | TFLOPs: 6.94 | 7: iteration 3190/ 173500 | consumed samples: 816640 | consumed tokens: 1672478720 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.353595E+00 | grad norm: 0.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2239.963 | TFLOPs: 8.33 | 7: iteration 3200/ 173500 | consumed samples: 819200 | consumed tokens: 1677721600 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.329336E+00 | grad norm: 0.992 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2655.411 | TFLOPs: 9.88 | 7: iteration 3210/ 173500 | consumed samples: 821760 | consumed tokens: 1682964480 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.337885E+00 | grad norm: 0.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.648 | TFLOPs: 11.13 | 7: iteration 3220/ 173500 | consumed samples: 824320 | consumed tokens: 1688207360 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | 
global batch size: 256 | lm loss: 5.332890E+00 | grad norm: 1.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.874 | TFLOPs: 10.45 | 7: iteration 3230/ 173500 | consumed samples: 826880 | consumed tokens: 1693450240 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.336471E+00 | grad norm: 0.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.049 | TFLOPs: 11.64 | 7: iteration 3240/ 173500 | consumed samples: 829440 | consumed tokens: 1698693120 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.329379E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.823 | TFLOPs: 11.98 | 7: iteration 3250/ 173500 | consumed samples: 832000 | consumed tokens: 1703936000 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.322070E+00 | grad norm: 1.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2810.002 | TFLOPs: 10.45 | 7: iteration 3260/ 173500 | consumed samples: 834560 | consumed tokens: 1709178880 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.332390E+00 | grad norm: 0.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2807.118 | TFLOPs: 10.44 | 7: iteration 3270/ 173500 | consumed samples: 837120 | consumed tokens: 1714421760 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.313200E+00 | grad norm: 1.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.803 | TFLOPs: 11.99 | 7: iteration 3280/ 173500 | consumed samples: 839680 | consumed 
tokens: 1719664640 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.315846E+00 | grad norm: 0.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.560 | TFLOPs: 12.00 | 7: iteration 3290/ 173500 | consumed samples: 842240 | consumed tokens: 1724907520 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.322649E+00 | grad norm: 1.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2646.882 | TFLOPs: 9.85 | 7: iteration 3300/ 173500 | consumed samples: 844800 | consumed tokens: 1730150400 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.321069E+00 | grad norm: 0.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2343.753 | TFLOPs: 8.72 | 7: iteration 3310/ 173500 | consumed samples: 847360 | consumed tokens: 1735393280 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.302071E+00 | grad norm: 1.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2468.053 | TFLOPs: 9.18 | 7: iteration 3320/ 173500 | consumed samples: 849920 | consumed tokens: 1740636160 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.315424E+00 | grad norm: 0.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.654 | TFLOPs: 10.45 | 7: iteration 3330/ 173500 | consumed samples: 852480 | consumed tokens: 1745879040 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.314186E+00 | grad norm: 1.117 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2398.245 | 
TFLOPs: 8.92 | 7: iteration 3340/ 173500 | consumed samples: 855040 | consumed tokens: 1751121920 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.316370E+00 | grad norm: 1.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2638.090 | TFLOPs: 9.81 | 7: iteration 3350/ 173500 | consumed samples: 857600 | consumed tokens: 1756364800 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.291227E+00 | grad norm: 0.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.787 | TFLOPs: 11.48 | 7: iteration 3360/ 173500 | consumed samples: 860160 | consumed tokens: 1761607680 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.296652E+00 | grad norm: 0.837 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2159.265 | TFLOPs: 8.03 | 7: iteration 3370/ 173500 | consumed samples: 862720 | consumed tokens: 1766850560 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.291756E+00 | grad norm: 1.014 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2342.566 | TFLOPs: 8.71 | 7: iteration 3380/ 173500 | consumed samples: 865280 | consumed tokens: 1772093440 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.301154E+00 | grad norm: 0.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1948.078 | TFLOPs: 7.25 | 7: iteration 3390/ 173500 | consumed samples: 867840 | consumed tokens: 1777336320 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.290117E+00 | grad norm: 1.056 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2035.127 | TFLOPs: 7.57 | 7: iteration 3400/ 173500 | consumed samples: 870400 | consumed tokens: 1782579200 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.291004E+00 | grad norm: 1.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2111.160 | TFLOPs: 7.85 | 7: iteration 3410/ 173500 | consumed samples: 872960 | consumed tokens: 1787822080 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.303646E+00 | grad norm: 1.040 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1964.078 | TFLOPs: 7.31 | 7: iteration 3420/ 173500 | consumed samples: 875520 | consumed tokens: 1793064960 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.286946E+00 | grad norm: 0.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2660.946 | TFLOPs: 9.90 | 7: iteration 3430/ 173500 | consumed samples: 878080 | consumed tokens: 1798307840 | elapsed time per iteration (s): 0.14 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.272422E+00 | grad norm: 0.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1867.799 | TFLOPs: 6.95 | 7: iteration 3440/ 173500 | consumed samples: 880640 | consumed tokens: 1803550720 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.281960E+00 | grad norm: 0.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.820 | TFLOPs: 7.42 | 7: iteration 3450/ 173500 | consumed samples: 883200 | consumed tokens: 1808793600 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-04 | global batch size: 256 | 
lm loss: 5.279820E+00 | grad norm: 1.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2224.570 | TFLOPs: 8.27 | 7: iteration 3460/ 173500 | consumed samples: 885760 | consumed tokens: 1814036480 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.276995E+00 | grad norm: 0.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2421.358 | TFLOPs: 9.01 | 7: iteration 3470/ 173500 | consumed samples: 888320 | consumed tokens: 1819279360 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.270946E+00 | grad norm: 0.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.881 | TFLOPs: 11.80 | 7: iteration 3480/ 173500 | consumed samples: 890880 | consumed tokens: 1824522240 | elapsed time per iteration (s): 0.10 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.273580E+00 | grad norm: 0.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2645.887 | TFLOPs: 9.84 | 7: iteration 3490/ 173500 | consumed samples: 893440 | consumed tokens: 1829765120 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.270631E+00 | grad norm: 0.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.537 | TFLOPs: 9.05 | 7: iteration 3500/ 173500 | consumed samples: 896000 | consumed tokens: 1835008000 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.266702E+00 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.368 | TFLOPs: 11.14 | 7: iteration 3510/ 173500 | consumed samples: 898560 | consumed tokens: 1840250880 | elapsed time 
per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.259370E+00 | grad norm: 1.004 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1983.828 | TFLOPs: 7.38 | 7: iteration 3520/ 173500 | consumed samples: 901120 | consumed tokens: 1845493760 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.260731E+00 | grad norm: 1.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2003.625 | TFLOPs: 7.45 | 7: iteration 3530/ 173500 | consumed samples: 903680 | consumed tokens: 1850736640 | elapsed time per iteration (s): 0.13 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.270601E+00 | grad norm: 0.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.518 | TFLOPs: 7.46 | 7: iteration 3540/ 173500 | consumed samples: 906240 | consumed tokens: 1855979520 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.267429E+00 | grad norm: 1.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2367.305 | TFLOPs: 8.81 | 7: iteration 3550/ 173500 | consumed samples: 908800 | consumed tokens: 1861222400 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 5.254902E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.666 | TFLOPs: 11.56 | 7: iteration 3560/ 173500 | consumed samples: 911360 | consumed tokens: 1866465280 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.249912E+00 | grad norm: 0.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.415 | TFLOPs: 11.22 | 7: iteration 3570/ 
173500 | consumed samples: 913920 | consumed tokens: 1871708160 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.256470E+00 | grad norm: 0.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2890.183 | TFLOPs: 10.75 | 7: iteration 3580/ 173500 | consumed samples: 916480 | consumed tokens: 1876951040 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.254085E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.790 | TFLOPs: 10.73 | 7: iteration 3590/ 173500 | consumed samples: 919040 | consumed tokens: 1882193920 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.252571E+00 | grad norm: 1.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2864.479 | TFLOPs: 10.65 | 7: iteration 3600/ 173500 | consumed samples: 921600 | consumed tokens: 1887436800 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.252706E+00 | grad norm: 0.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.844 | TFLOPs: 11.54 | 7: iteration 3610/ 173500 | consumed samples: 924160 | consumed tokens: 1892679680 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.252422E+00 | grad norm: 0.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2719.406 | TFLOPs: 10.11 | 7: iteration 3620/ 173500 | consumed samples: 926720 | consumed tokens: 1897922560 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.221528E+00 | grad norm: 0.975 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3010.498 | TFLOPs: 11.20 | 7: iteration 3630/ 173500 | consumed samples: 929280 | consumed tokens: 1903165440 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.257931E+00 | grad norm: 1.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.844 | TFLOPs: 11.89 | 7: iteration 3640/ 173500 | consumed samples: 931840 | consumed tokens: 1908408320 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.233849E+00 | grad norm: 0.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.728 | TFLOPs: 11.88 | 7: iteration 3650/ 173500 | consumed samples: 934400 | consumed tokens: 1913651200 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.250182E+00 | grad norm: 1.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.799 | TFLOPs: 11.68 | 7: iteration 3660/ 173500 | consumed samples: 936960 | consumed tokens: 1918894080 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.239356E+00 | grad norm: 0.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2842.176 | TFLOPs: 10.57 | 7: iteration 3670/ 173500 | consumed samples: 939520 | consumed tokens: 1924136960 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.226785E+00 | grad norm: 0.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.545 | TFLOPs: 11.54 | 7: iteration 3680/ 173500 | consumed samples: 942080 | consumed tokens: 1929379840 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.234424E+00 | 
grad norm: 0.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.606 | TFLOPs: 11.89 | 7: iteration 3690/ 173500 | consumed samples: 944640 | consumed tokens: 1934622720 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.228185E+00 | grad norm: 1.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2909.787 | TFLOPs: 10.82 | 7: iteration 3700/ 173500 | consumed samples: 947200 | consumed tokens: 1939865600 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.225233E+00 | grad norm: 1.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2957.235 | TFLOPs: 11.00 | 7: iteration 3710/ 173500 | consumed samples: 949760 | consumed tokens: 1945108480 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.232073E+00 | grad norm: 1.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.114 | TFLOPs: 8.53 | 7: iteration 3720/ 173500 | consumed samples: 952320 | consumed tokens: 1950351360 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.225412E+00 | grad norm: 1.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.364 | TFLOPs: 11.36 | 7: iteration 3730/ 173500 | consumed samples: 954880 | consumed tokens: 1955594240 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.227750E+00 | grad norm: 0.870 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2536.914 | TFLOPs: 9.44 | 7: iteration 3740/ 173500 | consumed samples: 957440 | consumed tokens: 1960837120 | elapsed time per iteration (s): 
0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.222841E+00 | grad norm: 0.961 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.035 | TFLOPs: 11.86 | 7: iteration 3750/ 173500 | consumed samples: 960000 | consumed tokens: 1966080000 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.221529E+00 | grad norm: 0.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2487.423 | TFLOPs: 9.25 | 7: iteration 3760/ 173500 | consumed samples: 962560 | consumed tokens: 1971322880 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.222913E+00 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2376.197 | TFLOPs: 8.84 | 7: iteration 3770/ 173500 | consumed samples: 965120 | consumed tokens: 1976565760 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.211776E+00 | grad norm: 0.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2146.896 | TFLOPs: 7.99 | 7: iteration 3780/ 173500 | consumed samples: 967680 | consumed tokens: 1981808640 | elapsed time per iteration (s): 0.13 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.198684E+00 | grad norm: 0.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1919.512 | TFLOPs: 7.14 | 7: iteration 3790/ 173500 | consumed samples: 970240 | consumed tokens: 1987051520 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.212679E+00 | grad norm: 1.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.060 | TFLOPs: 8.95 | 7: iteration 3800/ 173500 | consumed 
samples: 972800 | consumed tokens: 1992294400 | elapsed time per iteration (s): 0.13 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.211039E+00 | grad norm: 0.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2039.968 | TFLOPs: 7.59 | 7: iteration 3810/ 173500 | consumed samples: 975360 | consumed tokens: 1997537280 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.221980E+00 | grad norm: 1.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2108.058 | TFLOPs: 7.84 | 7: iteration 3820/ 173500 | consumed samples: 977920 | consumed tokens: 2002780160 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.214560E+00 | grad norm: 1.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2561.408 | TFLOPs: 9.53 | 7: iteration 3830/ 173500 | consumed samples: 980480 | consumed tokens: 2008023040 | elapsed time per iteration (s): 0.13 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.211282E+00 | grad norm: 0.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1955.878 | TFLOPs: 7.28 | 7: iteration 3840/ 173500 | consumed samples: 983040 | consumed tokens: 2013265920 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.209523E+00 | grad norm: 0.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2304.164 | TFLOPs: 8.57 | 7: iteration 3850/ 173500 | consumed samples: 985600 | consumed tokens: 2018508800 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.194580E+00 | grad norm: 0.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3197.167 | TFLOPs: 11.89 | 7: iteration 3860/ 173500 | consumed samples: 988160 | consumed tokens: 2023751680 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.217135E+00 | grad norm: 0.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.858 | TFLOPs: 11.56 | 7: iteration 3870/ 173500 | consumed samples: 990720 | consumed tokens: 2028994560 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.187276E+00 | grad norm: 1.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2829.358 | TFLOPs: 10.52 | 7: iteration 3880/ 173500 | consumed samples: 993280 | consumed tokens: 2034237440 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.188750E+00 | grad norm: 0.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.298 | TFLOPs: 11.59 | 7: iteration 3890/ 173500 | consumed samples: 995840 | consumed tokens: 2039480320 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.194904E+00 | grad norm: 0.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2722.607 | TFLOPs: 10.13 | 7: iteration 3900/ 173500 | consumed samples: 998400 | consumed tokens: 2044723200 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.198642E+00 | grad norm: 1.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2234.098 | TFLOPs: 8.31 | 7: iteration 3910/ 173500 | consumed samples: 1000960 | consumed tokens: 2049966080 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.195936E+00 | grad norm: 0.912 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2771.510 | TFLOPs: 10.31 | 7: iteration 3920/ 173500 | consumed samples: 1003520 | consumed tokens: 2055208960 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.191744E+00 | grad norm: 1.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.948 | TFLOPs: 11.87 | 7: iteration 3930/ 173500 | consumed samples: 1006080 | consumed tokens: 2060451840 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.189654E+00 | grad norm: 1.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.227 | TFLOPs: 11.86 | 7: iteration 3940/ 173500 | consumed samples: 1008640 | consumed tokens: 2065694720 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.185691E+00 | grad norm: 1.038 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2482.547 | TFLOPs: 9.23 | 7: iteration 3950/ 173500 | consumed samples: 1011200 | consumed tokens: 2070937600 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.175656E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.975 | TFLOPs: 11.96 | 7: iteration 3960/ 173500 | consumed samples: 1013760 | consumed tokens: 2076180480 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.191761E+00 | grad norm: 1.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.301 | TFLOPs: 11.49 | 7: iteration 3970/ 173500 | consumed samples: 1016320 | consumed tokens: 2081423360 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.183661E+00 | grad norm: 0.774 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2857.122 | TFLOPs: 10.63 | 7: iteration 3980/ 173500 | consumed samples: 1018880 | consumed tokens: 2086666240 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.179926E+00 | grad norm: 0.907 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2345.068 | TFLOPs: 8.72 | 7: iteration 3990/ 173500 | consumed samples: 1021440 | consumed tokens: 2091909120 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.180236E+00 | grad norm: 0.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2904.805 | TFLOPs: 10.80 | 0: [2023-03-17 00:22:10,287] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=0, lr=[0.00019992278300259638, 0.00019992278300259638, 0.00019992278300259638], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 4000/ 173500 | consumed samples: 1024000 | consumed tokens: 2097152000 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.170660E+00 | grad norm: 0.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2429.059 | TFLOPs: 9.04 | 0: steps: 4000 loss: 5.1589 iter time (s): 0.099 samples/sec: 2589.458 7: ----------------------------------------------------------------------------------------------- 7: validation loss at iteration 4000 | lm loss value: 5.081276E+00 | lm loss PPL: 1.609793E+02 | 7: ----------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 4000 to checkpoints_14m91b100m 0: [2023-03-17 00:22:10,369] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] 
Checkpoint global_step4000 is begin to save! 0: [2023-03-17 00:22:10,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:22:10,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:22:10,399] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:22:10,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:22:10,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:22:10,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:22:10,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:22:10,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:22:10,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:22:10,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:22:10,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:22:10,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 00:22:10,412] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step4000/mp_rank_00_model_states.pt 0: [2023-03-17 00:22:10,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:22:10,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:22:10,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:22:10,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:22:10,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:22:10,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2023-03-17 00:22:10,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:22:10,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:22:10,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:22:10,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:22:10,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
2: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
1: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
7: [2023-03-17 00:22:10,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:22:10,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:22:10,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:22:10,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2023-03-17 00:22:10,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
1: [2023-03-17 00:22:10,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2023-03-17 00:22:10,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:22:10,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:22:10,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2023-03-17 00:22:10,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:22:10,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2023-03-17 00:22:10,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:22:10,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:22:10,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2023-03-17 00:22:10,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:22:10,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:22:10,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
2: [2023-03-17 00:22:10,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:22:10,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:22:10,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:22:10,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2023-03-17 00:22:10,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2023-03-17 00:22:10,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:22:10,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:22:10,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:22:10,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2023-03-17 00:22:10,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:22:10,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:22:10,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:22:10,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2023-03-17 00:22:10,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:22:10,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:22:10,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 00:22:10,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2023-03-17 00:22:10,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:22:10,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:22:10,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 7: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2023-03-17 00:22:10,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:22:10,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:22:10,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:22:10,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2023-03-17 00:22:10,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
7: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
5: [2023-03-17 00:22:10,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2023-03-17 00:22:10,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2023-03-17 00:22:10,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:22:10,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:22:10,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
0: successfully saved checkpoint at iteration 4000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.30 7: iteration 4010/ 173500 | consumed samples: 1026560 | consumed tokens: 2102394880 | elapsed time per iteration (s): 0.13 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.174117E+00 | grad norm: 0.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1974.045 | TFLOPs: 7.34 | 7: iteration 4020/ 173500 | consumed samples: 1029120 | consumed tokens: 2107637760 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.172221E+00 | grad norm: 0.943 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.253 | TFLOPs: 11.14 | 7: iteration 4030/ 173500 | consumed samples: 1031680 | consumed tokens: 2112880640 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.171348E+00 | grad norm: 1.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2940.140 | TFLOPs: 10.94 | 7: iteration 4040/ 173500 | consumed samples: 1034240 | consumed tokens: 2118123520 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.165834E+00 | grad norm: 0.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.854 | TFLOPs: 10.99 | 7: iteration 4050/ 173500 | consumed samples: 1036800 | consumed tokens: 2123366400 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.167119E+00 | grad norm: 1.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2707.644 | TFLOPs: 10.07 | 7: iteration 4060/ 173500 | consumed samples: 1039360 | consumed tokens: 2128609280 | elapsed time per iteration (s): 0.10 | learning rate: 
1.999E-04 | global batch size: 256 | lm loss: 5.168695E+00 | grad norm: 1.001 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2572.099 | TFLOPs: 9.57 | 7: iteration 4070/ 173500 | consumed samples: 1041920 | consumed tokens: 2133852160 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.175438E+00 | grad norm: 1.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.809 | TFLOPs: 10.78 | 7: iteration 4080/ 173500 | consumed samples: 1044480 | consumed tokens: 2139095040 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.174831E+00 | grad norm: 1.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2538.510 | TFLOPs: 9.44 | 7: iteration 4090/ 173500 | consumed samples: 1047040 | consumed tokens: 2144337920 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.171577E+00 | grad norm: 0.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.294 | TFLOPs: 11.81 | 7: iteration 4100/ 173500 | consumed samples: 1049600 | consumed tokens: 2149580800 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.163280E+00 | grad norm: 0.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.001 | TFLOPs: 10.60 | 7: iteration 4110/ 173500 | consumed samples: 1052160 | consumed tokens: 2154823680 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.162025E+00 | grad norm: 0.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.669 | TFLOPs: 7.75 | 7: iteration 4120/ 173500 | consumed samples: 1054720 
| consumed tokens: 2160066560 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.158251E+00 | grad norm: 1.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.393 | TFLOPs: 11.94 | 7: iteration 4130/ 173500 | consumed samples: 1057280 | consumed tokens: 2165309440 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.166700E+00 | grad norm: 0.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3009.680 | TFLOPs: 11.19 | 7: iteration 4140/ 173500 | consumed samples: 1059840 | consumed tokens: 2170552320 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.151802E+00 | grad norm: 0.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.927 | TFLOPs: 11.28 | 7: iteration 4150/ 173500 | consumed samples: 1062400 | consumed tokens: 2175795200 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.154230E+00 | grad norm: 1.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3004.242 | TFLOPs: 11.17 | 7: iteration 4160/ 173500 | consumed samples: 1064960 | consumed tokens: 2181038080 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.148597E+00 | grad norm: 0.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2119.258 | TFLOPs: 7.88 | 7: iteration 4170/ 173500 | consumed samples: 1067520 | consumed tokens: 2186280960 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.146503E+00 | grad norm: 0.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2750.306 | TFLOPs: 10.23 | 7: iteration 4180/ 173500 | consumed samples: 1070080 | consumed tokens: 2191523840 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.150644E+00 | grad norm: 1.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2701.616 | TFLOPs: 10.05 | 7: iteration 4190/ 173500 | consumed samples: 1072640 | consumed tokens: 2196766720 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.146851E+00 | grad norm: 0.849 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2556.845 | TFLOPs: 9.51 | 7: iteration 4200/ 173500 | consumed samples: 1075200 | consumed tokens: 2202009600 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.152344E+00 | grad norm: 1.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2187.507 | TFLOPs: 8.14 | 7: iteration 4210/ 173500 | consumed samples: 1077760 | consumed tokens: 2207252480 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.148782E+00 | grad norm: 0.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2690.795 | TFLOPs: 10.01 | 7: iteration 4220/ 173500 | consumed samples: 1080320 | consumed tokens: 2212495360 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.150329E+00 | grad norm: 1.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.277 | TFLOPs: 11.39 | 7: iteration 4230/ 173500 | consumed samples: 1082880 | consumed tokens: 2217738240 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.145080E+00 | grad norm: 1.490 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.668 | TFLOPs: 10.70 | 7: iteration 4240/ 173500 | consumed samples: 1085440 | consumed tokens: 2222981120 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.144171E+00 | grad norm: 1.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2149.688 | TFLOPs: 8.00 | 7: iteration 4250/ 173500 | consumed samples: 1088000 | consumed tokens: 2228224000 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.149290E+00 | grad norm: 1.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2147.398 | TFLOPs: 7.99 | 7: iteration 4260/ 173500 | consumed samples: 1090560 | consumed tokens: 2233466880 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.135185E+00 | grad norm: 1.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2728.683 | TFLOPs: 10.15 | 7: iteration 4270/ 173500 | consumed samples: 1093120 | consumed tokens: 2238709760 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.134213E+00 | grad norm: 0.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.524 | TFLOPs: 11.06 | 7: iteration 4280/ 173500 | consumed samples: 1095680 | consumed tokens: 2243952640 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.143287E+00 | grad norm: 0.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.022 | TFLOPs: 11.87 | 7: iteration 4290/ 173500 | consumed samples: 1098240 | consumed tokens: 2249195520 | elapsed time per iteration (s): 0.08 | learning 
rate: 1.999E-04 | global batch size: 256 | lm loss: 5.133871E+00 | grad norm: 0.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.852 | TFLOPs: 11.59 | 7: iteration 4300/ 173500 | consumed samples: 1100800 | consumed tokens: 2254438400 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.142020E+00 | grad norm: 0.825 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.561 | TFLOPs: 11.81 | 7: iteration 4310/ 173500 | consumed samples: 1103360 | consumed tokens: 2259681280 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.133376E+00 | grad norm: 0.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.427 | TFLOPs: 11.81 | 7: iteration 4320/ 173500 | consumed samples: 1105920 | consumed tokens: 2264924160 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.128821E+00 | grad norm: 1.102 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2685.794 | TFLOPs: 9.99 | 7: iteration 4330/ 173500 | consumed samples: 1108480 | consumed tokens: 2270167040 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.119141E+00 | grad norm: 0.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2319.689 | TFLOPs: 8.63 | 7: iteration 4340/ 173500 | consumed samples: 1111040 | consumed tokens: 2275409920 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.131021E+00 | grad norm: 0.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.726 | TFLOPs: 8.75 | 7: iteration 4350/ 173500 | consumed samples: 
1113600 | consumed tokens: 2280652800 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.133553E+00 | grad norm: 1.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2945.697 | TFLOPs: 10.96 | 7: iteration 4360/ 173500 | consumed samples: 1116160 | consumed tokens: 2285895680 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.123415E+00 | grad norm: 0.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.542 | TFLOPs: 11.84 | 7: iteration 4370/ 173500 | consumed samples: 1118720 | consumed tokens: 2291138560 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.125849E+00 | grad norm: 1.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2642.646 | TFLOPs: 9.83 | 7: iteration 4380/ 173500 | consumed samples: 1121280 | consumed tokens: 2296381440 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.133850E+00 | grad norm: 0.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2550.504 | TFLOPs: 9.49 | 7: iteration 4390/ 173500 | consumed samples: 1123840 | consumed tokens: 2301624320 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.117941E+00 | grad norm: 0.856 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.626 | TFLOPs: 11.70 | 7: iteration 4400/ 173500 | consumed samples: 1126400 | consumed tokens: 2306867200 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.125182E+00 | grad norm: 1.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3090.327 | TFLOPs: 11.49 | 7: iteration 4410/ 173500 | consumed samples: 1128960 | consumed tokens: 2312110080 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.111800E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.778 | TFLOPs: 11.26 | 7: iteration 4420/ 173500 | consumed samples: 1131520 | consumed tokens: 2317352960 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.111283E+00 | grad norm: 0.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2688.846 | TFLOPs: 10.00 | 7: iteration 4430/ 173500 | consumed samples: 1134080 | consumed tokens: 2322595840 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.115601E+00 | grad norm: 1.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2769.254 | TFLOPs: 10.30 | 7: iteration 4440/ 173500 | consumed samples: 1136640 | consumed tokens: 2327838720 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.108402E+00 | grad norm: 0.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.697 | TFLOPs: 11.86 | 7: iteration 4450/ 173500 | consumed samples: 1139200 | consumed tokens: 2333081600 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.121893E+00 | grad norm: 0.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.850 | TFLOPs: 11.50 | 7: iteration 4460/ 173500 | consumed samples: 1141760 | consumed tokens: 2338324480 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.112101E+00 | grad norm: 
0.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.336 | TFLOPs: 11.89 | 7: iteration 4470/ 173500 | consumed samples: 1144320 | consumed tokens: 2343567360 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.103782E+00 | grad norm: 1.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.096 | TFLOPs: 8.95 | 7: iteration 4480/ 173500 | consumed samples: 1146880 | consumed tokens: 2348810240 | elapsed time per iteration (s): 0.13 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.107396E+00 | grad norm: 1.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1954.266 | TFLOPs: 7.27 | 7: iteration 4490/ 173500 | consumed samples: 1149440 | consumed tokens: 2354053120 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.115820E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.975 | TFLOPs: 10.03 | 7: iteration 4500/ 173500 | consumed samples: 1152000 | consumed tokens: 2359296000 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.104228E+00 | grad norm: 0.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2886.699 | TFLOPs: 10.74 | 7: iteration 4510/ 173500 | consumed samples: 1154560 | consumed tokens: 2364538880 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.106251E+00 | grad norm: 0.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.907 | TFLOPs: 9.45 | 7: iteration 4520/ 173500 | consumed samples: 1157120 | consumed tokens: 2369781760 | elapsed time per iteration (s): 0.13 | 
learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.114186E+00 | grad norm: 1.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.107 | TFLOPs: 7.28 | 7: iteration 4530/ 173500 | consumed samples: 1159680 | consumed tokens: 2375024640 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.093341E+00 | grad norm: 1.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2702.170 | TFLOPs: 10.05 | 7: iteration 4540/ 173500 | consumed samples: 1162240 | consumed tokens: 2380267520 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.092617E+00 | grad norm: 1.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.094 | TFLOPs: 11.42 | 7: iteration 4550/ 173500 | consumed samples: 1164800 | consumed tokens: 2385510400 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.110507E+00 | grad norm: 1.092 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.052 | TFLOPs: 11.95 | 7: iteration 4560/ 173500 | consumed samples: 1167360 | consumed tokens: 2390753280 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.093808E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.769 | TFLOPs: 11.96 | 7: iteration 4570/ 173500 | consumed samples: 1169920 | consumed tokens: 2395996160 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.093003E+00 | grad norm: 0.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2959.381 | TFLOPs: 11.01 | 7: iteration 4580/ 173500 | consumed 
samples: 1172480 | consumed tokens: 2401239040 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.082042E+00 | grad norm: 0.996 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2682.037 | TFLOPs: 9.98 | 7: iteration 4590/ 173500 | consumed samples: 1175040 | consumed tokens: 2406481920 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.107140E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2078.832 | TFLOPs: 7.73 | 7: iteration 4600/ 173500 | consumed samples: 1177600 | consumed tokens: 2411724800 | elapsed time per iteration (s): 0.14 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.095657E+00 | grad norm: 0.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1824.528 | TFLOPs: 6.79 | 7: iteration 4610/ 173500 | consumed samples: 1180160 | consumed tokens: 2416967680 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.099820E+00 | grad norm: 0.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2330.852 | TFLOPs: 8.67 | 7: iteration 4620/ 173500 | consumed samples: 1182720 | consumed tokens: 2422210560 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.085327E+00 | grad norm: 0.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.426 | TFLOPs: 10.23 | 7: iteration 4630/ 173500 | consumed samples: 1185280 | consumed tokens: 2427453440 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.077012E+00 | grad norm: 0.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2741.725 | TFLOPs: 10.20 | 7: iteration 4640/ 173500 | consumed samples: 1187840 | consumed tokens: 2432696320 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.096685E+00 | grad norm: 0.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2503.376 | TFLOPs: 9.31 | 7: iteration 4650/ 173500 | consumed samples: 1190400 | consumed tokens: 2437939200 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.091292E+00 | grad norm: 0.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.398 | TFLOPs: 11.34 | 7: iteration 4660/ 173500 | consumed samples: 1192960 | consumed tokens: 2443182080 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.079425E+00 | grad norm: 0.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2727.238 | TFLOPs: 10.14 | 7: iteration 4670/ 173500 | consumed samples: 1195520 | consumed tokens: 2448424960 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.085104E+00 | grad norm: 1.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2962.335 | TFLOPs: 11.02 | 7: iteration 4680/ 173500 | consumed samples: 1198080 | consumed tokens: 2453667840 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.086056E+00 | grad norm: 0.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2331.859 | TFLOPs: 8.67 | 7: iteration 4690/ 173500 | consumed samples: 1200640 | consumed tokens: 2458910720 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.091460E+00 | grad norm: 
0.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2554.528 | TFLOPs: 9.50 | 7: iteration 4700/ 173500 | consumed samples: 1203200 | consumed tokens: 2464153600 | elapsed time per iteration (s): 0.12 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.084164E+00 | grad norm: 1.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2155.390 | TFLOPs: 8.02 | 7: iteration 4710/ 173500 | consumed samples: 1205760 | consumed tokens: 2469396480 | elapsed time per iteration (s): 0.13 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.090275E+00 | grad norm: 1.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.336 | TFLOPs: 7.46 | 7: iteration 4720/ 173500 | consumed samples: 1208320 | consumed tokens: 2474639360 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.078356E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.394 | TFLOPs: 11.90 | 7: iteration 4730/ 173500 | consumed samples: 1210880 | consumed tokens: 2479882240 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.077245E+00 | grad norm: 1.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2459.420 | TFLOPs: 9.15 | 7: iteration 4740/ 173500 | consumed samples: 1213440 | consumed tokens: 2485125120 | elapsed time per iteration (s): 0.13 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.076222E+00 | grad norm: 1.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.760 | TFLOPs: 7.42 | 7: iteration 4750/ 173500 | consumed samples: 1216000 | consumed tokens: 2490368000 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.077097E+00 | grad norm: 1.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.932 | TFLOPs: 11.37 | 7: iteration 4760/ 173500 | consumed samples: 1218560 | consumed tokens: 2495610880 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.073811E+00 | grad norm: 1.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2496.925 | TFLOPs: 9.29 | 7: iteration 4770/ 173500 | consumed samples: 1221120 | consumed tokens: 2500853760 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.070033E+00 | grad norm: 1.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.934 | TFLOPs: 11.88 | 7: iteration 4780/ 173500 | consumed samples: 1223680 | consumed tokens: 2506096640 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.073386E+00 | grad norm: 0.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.676 | TFLOPs: 11.66 | 7: iteration 4790/ 173500 | consumed samples: 1226240 | consumed tokens: 2511339520 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.084410E+00 | grad norm: 0.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2312.387 | TFLOPs: 8.60 | 7: iteration 4800/ 173500 | consumed samples: 1228800 | consumed tokens: 2516582400 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.066454E+00 | grad norm: 0.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2592.107 | TFLOPs: 9.64 | 7: iteration 4810/ 173500 | consumed 
samples: 1231360 | consumed tokens: 2521825280 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.073129E+00 | grad norm: 1.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.784 | TFLOPs: 11.55 | 7: iteration 4820/ 173500 | consumed samples: 1233920 | consumed tokens: 2527068160 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.074676E+00 | grad norm: 0.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2464.127 | TFLOPs: 9.17 | 7: iteration 4830/ 173500 | consumed samples: 1236480 | consumed tokens: 2532311040 | elapsed time per iteration (s): 0.13 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.072881E+00 | grad norm: 1.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2046.027 | TFLOPs: 7.61 | 7: iteration 4840/ 173500 | consumed samples: 1239040 | consumed tokens: 2537553920 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.069605E+00 | grad norm: 1.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2617.539 | TFLOPs: 9.74 | 7: iteration 4850/ 173500 | consumed samples: 1241600 | consumed tokens: 2542796800 | elapsed time per iteration (s): 0.09 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.070792E+00 | grad norm: 0.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2929.533 | TFLOPs: 10.90 | 7: iteration 4860/ 173500 | consumed samples: 1244160 | consumed tokens: 2548039680 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.069197E+00 | grad norm: 1.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3091.990 | TFLOPs: 11.50 | 7: iteration 4870/ 173500 | consumed samples: 1246720 | consumed tokens: 2553282560 | elapsed time per iteration (s): 0.08 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.067170E+00 | grad norm: 1.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.864 | TFLOPs: 11.60 | 7: iteration 4880/ 173500 | consumed samples: 1249280 | consumed tokens: 2558525440 | elapsed time per iteration (s): 0.11 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.061739E+00 | grad norm: 0.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2229.090 | TFLOPs: 8.29 | 7: iteration 4890/ 173500 | consumed samples: 1251840 | consumed tokens: 2563768320 | elapsed time per iteration (s): 0.10 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 5.061071E+00 | grad norm: 0.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2484.838 | TFLOPs: 9.24 | 7: iteration 4900/ 173500 | consumed samples: 1254400 | consumed tokens: 2569011200 | elapsed time per iteration (s): 0.13 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.055615E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.477 | TFLOPs: 7.28 | 7: iteration 4910/ 173500 | consumed samples: 1256960 | consumed tokens: 2574254080 | elapsed time per iteration (s): 0.12 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.061045E+00 | grad norm: 0.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2125.097 | TFLOPs: 7.90 | 7: iteration 4920/ 173500 | consumed samples: 1259520 | consumed tokens: 2579496960 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.059440E+00 | grad norm: 
0.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2439.479 | TFLOPs: 9.07 | 7: iteration 4930/ 173500 | consumed samples: 1262080 | consumed tokens: 2584739840 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.054872E+00 | grad norm: 1.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2642.090 | TFLOPs: 9.83 | 7: iteration 4940/ 173500 | consumed samples: 1264640 | consumed tokens: 2589982720 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.049322E+00 | grad norm: 0.957 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2631.253 | TFLOPs: 9.79 | 7: iteration 4950/ 173500 | consumed samples: 1267200 | consumed tokens: 2595225600 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.044777E+00 | grad norm: 0.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2435.749 | TFLOPs: 9.06 | 7: iteration 4960/ 173500 | consumed samples: 1269760 | consumed tokens: 2600468480 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.039424E+00 | grad norm: 1.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2655.724 | TFLOPs: 9.88 | 7: iteration 4970/ 173500 | consumed samples: 1272320 | consumed tokens: 2605711360 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.058424E+00 | grad norm: 0.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2682.311 | TFLOPs: 9.98 | 7: iteration 4980/ 173500 | consumed samples: 1274880 | consumed tokens: 2610954240 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.050321E+00 | grad norm: 0.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2895.016 | TFLOPs: 10.77 | 7: iteration 4990/ 173500 | consumed samples: 1277440 | consumed tokens: 2616197120 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.042021E+00 | grad norm: 0.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2333.011 | TFLOPs: 8.68 | 7: iteration 5000/ 173500 | consumed samples: 1280000 | consumed tokens: 2621440000 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.056871E+00 | grad norm: 0.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.187 | TFLOPs: 11.51 | 7: ----------------------------------------------------------------------------------------------- 7: validation loss at iteration 5000 | lm loss value: 4.922135E+00 | lm loss PPL: 1.372954E+02 | 7: ----------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 5000 to checkpoints_14m91b100m 0: [2023-03-17 00:23:47,611] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step5000 is begin to save! 0: [2023-03-17 00:23:47,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:23:47,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:23:47,637] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/layer_03-model_00-model_states.pt... 
0: [2023-03-17 00:23:47,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:23:47,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:23:47,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:23:47,646] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:23:47,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:23:47,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:23:47,652] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:23:47,652] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:23:47,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:23:47,653] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step5000/mp_rank_00_model_states.pt 0: [2023-03-17 00:23:47,653] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:23:47,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/mp_rank_00_model_states.pt. 
0: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:23:47,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:23:47,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:23:47,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:23:47,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2023-03-17 00:23:47,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:23:47,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:23:47,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:23:47,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
3: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:23:47,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:23:47,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:23:47,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:23:47,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
0: [2023-03-17 00:23:47,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:23:47,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:23:47,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:23:47,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 00:23:47,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:23:47,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:23:47,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2023-03-17 00:23:47,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:23:47,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:23:47,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:23:47,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:23:47,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:23:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:23:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2023-03-17 00:23:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:23:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:23:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:23:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:23:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2023-03-17 00:23:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:23:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:23:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:23:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:23:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:23:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 4: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:23:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:23:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:23:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:23:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2023-03-17 00:23:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:23:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:23:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
4: [2023-03-17 00:23:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:23:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:23:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
1: [2023-03-17 00:23:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:23:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2023-03-17 00:23:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:23:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:23:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2023-03-17 00:23:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:23:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
5: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2023-03-17 00:23:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:23:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
1: [2023-03-17 00:23:47,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:23:47,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:23:47,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: successfully saved checkpoint at iteration 5000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 129.59 7: iteration 5010/ 173500 | consumed samples: 1282560 | consumed tokens: 2626682880 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.049949E+00 | grad norm: 0.894 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2354.882 | TFLOPs: 8.76 | 7: iteration 5020/ 173500 | consumed samples: 1285120 | consumed tokens: 2631925760 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.039352E+00 | grad norm: 0.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.526 | TFLOPs: 11.28 | 7: iteration 5030/ 173500 | consumed samples: 1287680 | consumed tokens: 2637168640 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.040896E+00 | grad norm: 0.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2251.705 | TFLOPs: 8.38 | 7: iteration 5040/ 173500 | consumed samples: 1290240 | consumed tokens: 2642411520 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.033702E+00 | grad norm: 0.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2568.425 | TFLOPs: 9.55 | 7: 
iteration 5050/ 173500 | consumed samples: 1292800 | consumed tokens: 2647654400 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.043560E+00 | grad norm: 0.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.164 | TFLOPs: 12.03 | 7: iteration 5060/ 173500 | consumed samples: 1295360 | consumed tokens: 2652897280 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.039254E+00 | grad norm: 0.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.045 | TFLOPs: 11.36 | 7: iteration 5070/ 173500 | consumed samples: 1297920 | consumed tokens: 2658140160 | elapsed time per iteration (s): 0.12 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.036843E+00 | grad norm: 0.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2096.298 | TFLOPs: 7.80 | 7: iteration 5080/ 173500 | consumed samples: 1300480 | consumed tokens: 2663383040 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.039005E+00 | grad norm: 1.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2373.302 | TFLOPs: 8.83 | 7: iteration 5090/ 173500 | consumed samples: 1303040 | consumed tokens: 2668625920 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.043630E+00 | grad norm: 0.815 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2412.197 | TFLOPs: 8.97 | 7: iteration 5100/ 173500 | consumed samples: 1305600 | consumed tokens: 2673868800 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.034118E+00 | grad norm: 1.077 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2604.480 | TFLOPs: 9.69 | 7: iteration 5110/ 173500 | consumed samples: 1308160 | consumed tokens: 2679111680 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.043525E+00 | grad norm: 0.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.503 | TFLOPs: 10.68 | 7: iteration 5120/ 173500 | consumed samples: 1310720 | consumed tokens: 2684354560 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.030004E+00 | grad norm: 0.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2374.673 | TFLOPs: 8.83 | 7: iteration 5130/ 173500 | consumed samples: 1313280 | consumed tokens: 2689597440 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.033495E+00 | grad norm: 1.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.908 | TFLOPs: 9.84 | 7: iteration 5140/ 173500 | consumed samples: 1315840 | consumed tokens: 2694840320 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.036653E+00 | grad norm: 1.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.091 | TFLOPs: 12.00 | 7: iteration 5150/ 173500 | consumed samples: 1318400 | consumed tokens: 2700083200 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.032353E+00 | grad norm: 1.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2920.695 | TFLOPs: 10.86 | 7: iteration 5160/ 173500 | consumed samples: 1320960 | consumed tokens: 2705326080 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch 
size: 256 | lm loss: 5.032847E+00 | grad norm: 1.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2879.033 | TFLOPs: 10.71 | 7: iteration 5170/ 173500 | consumed samples: 1323520 | consumed tokens: 2710568960 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.021780E+00 | grad norm: 1.011 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.600 | TFLOPs: 11.99 | 7: iteration 5180/ 173500 | consumed samples: 1326080 | consumed tokens: 2715811840 | elapsed time per iteration (s): 0.14 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.033153E+00 | grad norm: 0.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1886.220 | TFLOPs: 7.02 | 7: iteration 5190/ 173500 | consumed samples: 1328640 | consumed tokens: 2721054720 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.030709E+00 | grad norm: 0.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.534 | TFLOPs: 11.66 | 7: iteration 5200/ 173500 | consumed samples: 1331200 | consumed tokens: 2726297600 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.032687E+00 | grad norm: 1.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.750 | TFLOPs: 11.86 | 7: iteration 5210/ 173500 | consumed samples: 1333760 | consumed tokens: 2731540480 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.026959E+00 | grad norm: 0.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2713.675 | TFLOPs: 10.09 | 7: iteration 5220/ 173500 | consumed samples: 1336320 | consumed tokens: 
2736783360 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.015731E+00 | grad norm: 0.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2823.601 | TFLOPs: 10.50 | 7: iteration 5230/ 173500 | consumed samples: 1338880 | consumed tokens: 2742026240 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.012065E+00 | grad norm: 1.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.485 | TFLOPs: 11.88 | 7: iteration 5240/ 173500 | consumed samples: 1341440 | consumed tokens: 2747269120 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.034103E+00 | grad norm: 1.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.485 | TFLOPs: 11.86 | 7: iteration 5250/ 173500 | consumed samples: 1344000 | consumed tokens: 2752512000 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.016584E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.024 | TFLOPs: 11.61 | 7: iteration 5260/ 173500 | consumed samples: 1346560 | consumed tokens: 2757754880 | elapsed time per iteration (s): 0.12 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.019198E+00 | grad norm: 0.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2158.423 | TFLOPs: 8.03 | 7: iteration 5270/ 173500 | consumed samples: 1349120 | consumed tokens: 2762997760 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.026112E+00 | grad norm: 1.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.220 | 
TFLOPs: 10.92 | 7: iteration 5280/ 173500 | consumed samples: 1351680 | consumed tokens: 2768240640 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.011570E+00 | grad norm: 0.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.060 | TFLOPs: 10.60 | 7: iteration 5290/ 173500 | consumed samples: 1354240 | consumed tokens: 2773483520 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.011213E+00 | grad norm: 1.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.689 | TFLOPs: 11.76 | 7: iteration 5300/ 173500 | consumed samples: 1356800 | consumed tokens: 2778726400 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.019749E+00 | grad norm: 0.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2724.409 | TFLOPs: 10.13 | 7: iteration 5310/ 173500 | consumed samples: 1359360 | consumed tokens: 2783969280 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.014592E+00 | grad norm: 1.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2819.166 | TFLOPs: 10.49 | 7: iteration 5320/ 173500 | consumed samples: 1361920 | consumed tokens: 2789212160 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.012493E+00 | grad norm: 0.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.661 | TFLOPs: 11.75 | 7: iteration 5330/ 173500 | consumed samples: 1364480 | consumed tokens: 2794455040 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.013298E+00 | grad norm: 0.731 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2526.579 | TFLOPs: 9.40 | 7: iteration 5340/ 173500 | consumed samples: 1367040 | consumed tokens: 2799697920 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.007483E+00 | grad norm: 0.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2437.300 | TFLOPs: 9.07 | 7: iteration 5350/ 173500 | consumed samples: 1369600 | consumed tokens: 2804940800 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.014027E+00 | grad norm: 0.835 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.269 | TFLOPs: 11.96 | 7: iteration 5360/ 173500 | consumed samples: 1372160 | consumed tokens: 2810183680 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.008076E+00 | grad norm: 0.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.102 | TFLOPs: 11.61 | 7: iteration 5370/ 173500 | consumed samples: 1374720 | consumed tokens: 2815426560 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.002035E+00 | grad norm: 0.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.661 | TFLOPs: 11.54 | 7: iteration 5380/ 173500 | consumed samples: 1377280 | consumed tokens: 2820669440 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.011276E+00 | grad norm: 0.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2629.619 | TFLOPs: 9.78 | 7: iteration 5390/ 173500 | consumed samples: 1379840 | consumed tokens: 2825912320 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global 
batch size: 256 | lm loss: 5.005520E+00 | grad norm: 0.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2728.710 | TFLOPs: 10.15 | 7: iteration 5400/ 173500 | consumed samples: 1382400 | consumed tokens: 2831155200 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.008574E+00 | grad norm: 1.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.427 | TFLOPs: 9.64 | 7: iteration 5410/ 173500 | consumed samples: 1384960 | consumed tokens: 2836398080 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.011551E+00 | grad norm: 1.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2971.708 | TFLOPs: 11.05 | 7: iteration 5420/ 173500 | consumed samples: 1387520 | consumed tokens: 2841640960 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.016182E+00 | grad norm: 1.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2546.222 | TFLOPs: 9.47 | 7: iteration 5430/ 173500 | consumed samples: 1390080 | consumed tokens: 2846883840 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.997132E+00 | grad norm: 1.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.927 | TFLOPs: 8.91 | 7: iteration 5440/ 173500 | consumed samples: 1392640 | consumed tokens: 2852126720 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.009742E+00 | grad norm: 1.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2611.989 | TFLOPs: 9.72 | 7: iteration 5450/ 173500 | consumed samples: 1395200 | consumed tokens: 
2857369600 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.010324E+00 | grad norm: 1.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2822.414 | TFLOPs: 10.50 | 7: iteration 5460/ 173500 | consumed samples: 1397760 | consumed tokens: 2862612480 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.002116E+00 | grad norm: 1.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.386 | TFLOPs: 11.49 | 7: iteration 5470/ 173500 | consumed samples: 1400320 | consumed tokens: 2867855360 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.010895E+00 | grad norm: 1.103 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2721.903 | TFLOPs: 10.12 | 7: iteration 5480/ 173500 | consumed samples: 1402880 | consumed tokens: 2873098240 | elapsed time per iteration (s): 0.13 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.013667E+00 | grad norm: 0.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1988.439 | TFLOPs: 7.40 | 7: iteration 5490/ 173500 | consumed samples: 1405440 | consumed tokens: 2878341120 | elapsed time per iteration (s): 0.12 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.989478E+00 | grad norm: 0.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2195.952 | TFLOPs: 8.17 | 7: iteration 5500/ 173500 | consumed samples: 1408000 | consumed tokens: 2883584000 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.990915E+00 | grad norm: 1.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2377.063 | 
TFLOPs: 8.84 | 7: iteration 5510/ 173500 | consumed samples: 1410560 | consumed tokens: 2888826880 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.002219E+00 | grad norm: 1.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.925 | TFLOPs: 8.63 | 7: iteration 5520/ 173500 | consumed samples: 1413120 | consumed tokens: 2894069760 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.002155E+00 | grad norm: 0.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2270.380 | TFLOPs: 8.44 | 7: iteration 5530/ 173500 | consumed samples: 1415680 | consumed tokens: 2899312640 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.998429E+00 | grad norm: 0.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2696.900 | TFLOPs: 10.03 | 7: iteration 5540/ 173500 | consumed samples: 1418240 | consumed tokens: 2904555520 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.991441E+00 | grad norm: 1.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2865.642 | TFLOPs: 10.66 | 7: iteration 5550/ 173500 | consumed samples: 1420800 | consumed tokens: 2909798400 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.990793E+00 | grad norm: 0.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.174 | TFLOPs: 11.54 | 7: iteration 5560/ 173500 | consumed samples: 1423360 | consumed tokens: 2915041280 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.992656E+00 | grad norm: 0.699 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2550.078 | TFLOPs: 9.49 | 7: iteration 5570/ 173500 | consumed samples: 1425920 | consumed tokens: 2920284160 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.991364E+00 | grad norm: 0.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.325 | TFLOPs: 10.00 | 7: iteration 5580/ 173500 | consumed samples: 1428480 | consumed tokens: 2925527040 | elapsed time per iteration (s): 0.13 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.991357E+00 | grad norm: 0.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2039.377 | TFLOPs: 7.59 | 7: iteration 5590/ 173500 | consumed samples: 1431040 | consumed tokens: 2930769920 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.979905E+00 | grad norm: 0.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2562.169 | TFLOPs: 9.53 | 7: iteration 5600/ 173500 | consumed samples: 1433600 | consumed tokens: 2936012800 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.992957E+00 | grad norm: 0.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.548 | TFLOPs: 11.97 | 7: iteration 5610/ 173500 | consumed samples: 1436160 | consumed tokens: 2941255680 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.981514E+00 | grad norm: 1.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.555 | TFLOPs: 10.51 | 7: iteration 5620/ 173500 | consumed samples: 1438720 | consumed tokens: 2946498560 | elapsed time per iteration (s): 0.12 | learning rate: 1.998E-04 | global 
batch size: 256 | lm loss: 4.986171E+00 | grad norm: 1.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2111.591 | TFLOPs: 7.85 | 7: iteration 5630/ 173500 | consumed samples: 1441280 | consumed tokens: 2951741440 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.996039E+00 | grad norm: 0.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2895.403 | TFLOPs: 10.77 | 7: iteration 5640/ 173500 | consumed samples: 1443840 | consumed tokens: 2956984320 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.981607E+00 | grad norm: 1.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.006 | TFLOPs: 11.50 | 7: iteration 5650/ 173500 | consumed samples: 1446400 | consumed tokens: 2962227200 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.985693E+00 | grad norm: 1.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2371.819 | TFLOPs: 8.82 | 7: iteration 5660/ 173500 | consumed samples: 1448960 | consumed tokens: 2967470080 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.987996E+00 | grad norm: 0.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2562.828 | TFLOPs: 9.53 | 7: iteration 5670/ 173500 | consumed samples: 1451520 | consumed tokens: 2972712960 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.984443E+00 | grad norm: 0.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.963 | TFLOPs: 12.03 | 7: iteration 5680/ 173500 | consumed samples: 1454080 | consumed tokens: 
2977955840 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.972397E+00 | grad norm: 0.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2769.109 | TFLOPs: 10.30 | 7: iteration 5690/ 173500 | consumed samples: 1456640 | consumed tokens: 2983198720 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 5.001849E+00 | grad norm: 0.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.702 | TFLOPs: 10.00 | 7: iteration 5700/ 173500 | consumed samples: 1459200 | consumed tokens: 2988441600 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.994470E+00 | grad norm: 1.027 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.316 | TFLOPs: 11.35 | 7: iteration 5710/ 173500 | consumed samples: 1461760 | consumed tokens: 2993684480 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.978726E+00 | grad norm: 1.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2333.930 | TFLOPs: 8.68 | 7: iteration 5720/ 173500 | consumed samples: 1464320 | consumed tokens: 2998927360 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.965010E+00 | grad norm: 0.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2601.544 | TFLOPs: 9.68 | 7: iteration 5730/ 173500 | consumed samples: 1466880 | consumed tokens: 3004170240 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.982586E+00 | grad norm: 1.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.718 | 
TFLOPs: 10.40 | 7: iteration 5740/ 173500 | consumed samples: 1469440 | consumed tokens: 3009413120 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.980608E+00 | grad norm: 1.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.745 | TFLOPs: 10.41 | 7: iteration 5750/ 173500 | consumed samples: 1472000 | consumed tokens: 3014656000 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.972678E+00 | grad norm: 0.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2901.175 | TFLOPs: 10.79 | 7: iteration 5760/ 173500 | consumed samples: 1474560 | consumed tokens: 3019898880 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.981126E+00 | grad norm: 1.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.623 | TFLOPs: 11.17 | 7: iteration 5770/ 173500 | consumed samples: 1477120 | consumed tokens: 3025141760 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.980681E+00 | grad norm: 0.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2512.758 | TFLOPs: 9.35 | 7: iteration 5780/ 173500 | consumed samples: 1479680 | consumed tokens: 3030384640 | elapsed time per iteration (s): 0.08 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.991671E+00 | grad norm: 0.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.447 | TFLOPs: 11.82 | 7: iteration 5790/ 173500 | consumed samples: 1482240 | consumed tokens: 3035627520 | elapsed time per iteration (s): 0.09 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.973397E+00 | grad norm: 1.804 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2763.426 | TFLOPs: 10.28 | 7: iteration 5800/ 173500 | consumed samples: 1484800 | consumed tokens: 3040870400 | elapsed time per iteration (s): 0.11 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.971079E+00 | grad norm: 1.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2262.424 | TFLOPs: 8.42 | 7: iteration 5810/ 173500 | consumed samples: 1487360 | consumed tokens: 3046113280 | elapsed time per iteration (s): 0.10 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 4.970362E+00 | grad norm: 0.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2479.493 | TFLOPs: 9.22 | 7: iteration 5820/ 173500 | consumed samples: 1489920 | consumed tokens: 3051356160 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.959541E+00 | grad norm: 1.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2508.314 | TFLOPs: 9.33 | 7: iteration 5830/ 173500 | consumed samples: 1492480 | consumed tokens: 3056599040 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.971918E+00 | grad norm: 0.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.155 | TFLOPs: 11.85 | 7: iteration 5840/ 173500 | consumed samples: 1495040 | consumed tokens: 3061841920 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.978125E+00 | grad norm: 0.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.384 | TFLOPs: 9.60 | 7: iteration 5850/ 173500 | consumed samples: 1497600 | consumed tokens: 3067084800 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global 
batch size: 256 | lm loss: 4.967382E+00 | grad norm: 1.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.365 | TFLOPs: 9.74 | 7: iteration 5860/ 173500 | consumed samples: 1500160 | consumed tokens: 3072327680 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.960458E+00 | grad norm: 1.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.213 | TFLOPs: 9.12 | 7: iteration 5870/ 173500 | consumed samples: 1502720 | consumed tokens: 3077570560 | elapsed time per iteration (s): 0.13 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.973841E+00 | grad norm: 1.136 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1959.905 | TFLOPs: 7.29 | 7: iteration 5880/ 173500 | consumed samples: 1505280 | consumed tokens: 3082813440 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.964212E+00 | grad norm: 1.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.966 | TFLOPs: 11.87 | 7: iteration 5890/ 173500 | consumed samples: 1507840 | consumed tokens: 3088056320 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.974686E+00 | grad norm: 1.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.354 | TFLOPs: 11.20 | 7: iteration 5900/ 173500 | consumed samples: 1510400 | consumed tokens: 3093299200 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.965859E+00 | grad norm: 0.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.588 | TFLOPs: 11.54 | 7: iteration 5910/ 173500 | consumed samples: 1512960 | consumed tokens: 
3098542080 | elapsed time per iteration (s): 0.13 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.967380E+00 | grad norm: 0.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1934.736 | TFLOPs: 7.20 | 7: iteration 5920/ 173500 | consumed samples: 1515520 | consumed tokens: 3103784960 | elapsed time per iteration (s): 0.14 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.961033E+00 | grad norm: 1.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1798.510 | TFLOPs: 6.69 | 7: iteration 5930/ 173500 | consumed samples: 1518080 | consumed tokens: 3109027840 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.969408E+00 | grad norm: 0.827 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.548 | TFLOPs: 9.80 | 7: iteration 5940/ 173500 | consumed samples: 1520640 | consumed tokens: 3114270720 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.966069E+00 | grad norm: 0.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.271 | TFLOPs: 8.91 | 7: iteration 5950/ 173500 | consumed samples: 1523200 | consumed tokens: 3119513600 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.961601E+00 | grad norm: 1.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2858.741 | TFLOPs: 10.63 | 7: iteration 5960/ 173500 | consumed samples: 1525760 | consumed tokens: 3124756480 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.968021E+00 | grad norm: 0.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2450.936 | 
TFLOPs: 9.12 | 7: iteration 5970/ 173500 | consumed samples: 1528320 | consumed tokens: 3129999360 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.960500E+00 | grad norm: 0.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.221 | TFLOPs: 11.93 | 7: iteration 5980/ 173500 | consumed samples: 1530880 | consumed tokens: 3135242240 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.953007E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2595.949 | TFLOPs: 9.66 | 7: iteration 5990/ 173500 | consumed samples: 1533440 | consumed tokens: 3140485120 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.953905E+00 | grad norm: 0.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2703.066 | TFLOPs: 10.05 | 0: [2023-03-17 00:25:24,319] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=0, lr=[0.0001997263111243839, 0.0001997263111243839, 0.0001997263111243839], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 6000/ 173500 | consumed samples: 1536000 | consumed tokens: 3145728000 | elapsed time per iteration (s): 0.13 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.960215E+00 | grad norm: 0.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.426 | TFLOPs: 7.46 | 0: steps: 6000 loss: 4.9420 iter time (s): 0.096 samples/sec: 2675.616 7: ----------------------------------------------------------------------------------------------- 7: validation loss at iteration 6000 | lm loss value: 4.822213E+00 | lm loss PPL: 1.242397E+02 | 7: ----------------------------------------------------------------------------------------------- 0: saving 
checkpoint at iteration 6000 to checkpoints_14m91b100m 0: [2023-03-17 00:25:24,401] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step6000 is begin to save! 0: [2023-03-17 00:25:24,404] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:25:24,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:25:24,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:25:24,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:25:24,432] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:25:24,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:25:24,435] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:25:24,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:25:24,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:25:24,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:25:24,441] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 00:25:24,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:25:24,443] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step6000/mp_rank_00_model_states.pt 0: [2023-03-17 00:25:24,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:25:24,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:25:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:25:24,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:25:24,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:25:24,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
5: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:25:24,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:25:24,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2023-03-17 00:25:24,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
3: [2023-03-17 00:25:24,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:25:24,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2023-03-17 00:25:24,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:25:24,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:25:24,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:25:24,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 00:25:24,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:25:24,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:25:24,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:25:24,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:25:24,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
6: [2023-03-17 00:25:24,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:25:24,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 00:25:24,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2023-03-17 00:25:24,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:25:24,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:25:24,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:25:24,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2023-03-17 00:25:24,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:25:24,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:25:24,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:25:24,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2023-03-17 00:25:24,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:25:24,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:25:24,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:25:24,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2023-03-17 00:25:24,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:25:24,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:25:24,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2023-03-17 00:25:24,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:25:24,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2023-03-17 00:25:24,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:25:24,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:25:24,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:25:24,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:25:24,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:25:24,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:25:24,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 00:25:24,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:25:24,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:25:24,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2023-03-17 00:25:24,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 3: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
1: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
6: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt
6: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:25:24,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now!
6: [2023-03-17 00:25:24,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
6: [2023-03-17 00:25:24,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now!
0: successfully saved checkpoint at iteration 6000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 77.73
7: iteration 6010/ 173500 | consumed samples: 1538560 | consumed tokens: 3150970880 | elapsed time per iteration (s): 0.12 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.948482E+00 | grad norm: 0.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2157.507 | TFLOPs: 8.02 |
7: iteration 6020/ 173500 | consumed samples: 1541120 | consumed tokens: 3156213760 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.958880E+00 | grad norm: 1.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.463 | TFLOPs: 11.98 |
7: iteration 6030/ 173500 | consumed samples: 1543680 | consumed tokens: 3161456640 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.967610E+00 | grad norm: 1.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.839 | TFLOPs: 10.00 |
7: iteration 6040/ 173500 | consumed samples: 1546240 | consumed tokens: 3166699520 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.950665E+00 | grad norm: 1.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3079.432 | TFLOPs: 11.45 |
7: iteration 6050/ 173500 | consumed samples: 1548800 | consumed tokens: 3171942400 | elapsed time per iteration (s): 0.13 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.958168E+00 | grad norm: 0.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1982.852 | TFLOPs: 7.38 |
7: iteration 6060/ 173500 | consumed samples: 1551360 | consumed tokens: 3177185280 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.963745E+00 | grad norm: 0.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2772.253 | TFLOPs: 10.31 |
7: iteration 6070/ 173500 | consumed samples: 1553920 | consumed tokens: 3182428160 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.950570E+00 | grad norm: 0.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.248 | TFLOPs: 11.79 |
7: iteration 6080/ 173500 | consumed samples: 1556480 | consumed tokens: 3187671040 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.942399E+00 | grad norm: 0.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2443.865 | TFLOPs: 9.09 |
7: iteration 6090/ 173500 | consumed samples: 1559040 | consumed tokens: 3192913920 | elapsed time per iteration (s): 0.13 | 
learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.944909E+00 | grad norm: 0.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1911.958 | TFLOPs: 7.11 | 7: iteration 6100/ 173500 | consumed samples: 1561600 | consumed tokens: 3198156800 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.961758E+00 | grad norm: 0.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2394.074 | TFLOPs: 8.90 | 7: iteration 6110/ 173500 | consumed samples: 1564160 | consumed tokens: 3203399680 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.951700E+00 | grad norm: 0.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2767.579 | TFLOPs: 10.29 | 7: iteration 6120/ 173500 | consumed samples: 1566720 | consumed tokens: 3208642560 | elapsed time per iteration (s): 0.12 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.943851E+00 | grad norm: 1.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2113.797 | TFLOPs: 7.86 | 7: iteration 6130/ 173500 | consumed samples: 1569280 | consumed tokens: 3213885440 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.946590E+00 | grad norm: 0.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2871.870 | TFLOPs: 10.68 | 7: iteration 6140/ 173500 | consumed samples: 1571840 | consumed tokens: 3219128320 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.949218E+00 | grad norm: 1.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.175 | TFLOPs: 11.19 | 7: iteration 6150/ 173500 | consumed 
samples: 1574400 | consumed tokens: 3224371200 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.950570E+00 | grad norm: 1.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.294 | TFLOPs: 10.56 | 7: iteration 6160/ 173500 | consumed samples: 1576960 | consumed tokens: 3229614080 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.934516E+00 | grad norm: 0.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.477 | TFLOPs: 11.76 | 7: iteration 6170/ 173500 | consumed samples: 1579520 | consumed tokens: 3234856960 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.941483E+00 | grad norm: 1.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.752 | TFLOPs: 11.71 | 7: iteration 6180/ 173500 | consumed samples: 1582080 | consumed tokens: 3240099840 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.941212E+00 | grad norm: 0.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2822.630 | TFLOPs: 10.50 | 7: iteration 6190/ 173500 | consumed samples: 1584640 | consumed tokens: 3245342720 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.959045E+00 | grad norm: 1.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.609 | TFLOPs: 11.77 | 7: iteration 6200/ 173500 | consumed samples: 1587200 | consumed tokens: 3250585600 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.956219E+00 | grad norm: 0.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 2606.831 | TFLOPs: 9.70 | 7: iteration 6210/ 173500 | consumed samples: 1589760 | consumed tokens: 3255828480 | elapsed time per iteration (s): 0.12 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.947797E+00 | grad norm: 1.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.981 | TFLOPs: 8.05 | 7: iteration 6220/ 173500 | consumed samples: 1592320 | consumed tokens: 3261071360 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.944131E+00 | grad norm: 1.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2822.202 | TFLOPs: 10.50 | 7: iteration 6230/ 173500 | consumed samples: 1594880 | consumed tokens: 3266314240 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.942589E+00 | grad norm: 1.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.188 | TFLOPs: 11.71 | 7: iteration 6240/ 173500 | consumed samples: 1597440 | consumed tokens: 3271557120 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.950986E+00 | grad norm: 1.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.644 | TFLOPs: 9.41 | 7: iteration 6250/ 173500 | consumed samples: 1600000 | consumed tokens: 3276800000 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.941444E+00 | grad norm: 1.156 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.065 | TFLOPs: 8.44 | 7: iteration 6260/ 173500 | consumed samples: 1602560 | consumed tokens: 3282042880 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.929394E+00 | grad norm: 
0.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.059 | TFLOPs: 11.92 | 7: iteration 6270/ 173500 | consumed samples: 1605120 | consumed tokens: 3287285760 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.935849E+00 | grad norm: 0.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2945.643 | TFLOPs: 10.96 | 7: iteration 6280/ 173500 | consumed samples: 1607680 | consumed tokens: 3292528640 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.931413E+00 | grad norm: 0.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2321.741 | TFLOPs: 8.64 | 7: iteration 6290/ 173500 | consumed samples: 1610240 | consumed tokens: 3297771520 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.935912E+00 | grad norm: 0.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2418.420 | TFLOPs: 9.00 | 7: iteration 6300/ 173500 | consumed samples: 1612800 | consumed tokens: 3303014400 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.938462E+00 | grad norm: 0.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.630 | TFLOPs: 11.90 | 7: iteration 6310/ 173500 | consumed samples: 1615360 | consumed tokens: 3308257280 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.939458E+00 | grad norm: 0.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.280 | TFLOPs: 11.86 | 7: iteration 6320/ 173500 | consumed samples: 1617920 | consumed tokens: 3313500160 | elapsed time per iteration (s): 0.11 | 
learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.926608E+00 | grad norm: 1.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2270.791 | TFLOPs: 8.45 | 7: iteration 6330/ 173500 | consumed samples: 1620480 | consumed tokens: 3318743040 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.944097E+00 | grad norm: 0.882 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2284.166 | TFLOPs: 8.50 | 7: iteration 6340/ 173500 | consumed samples: 1623040 | consumed tokens: 3323985920 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.934100E+00 | grad norm: 1.049 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2584.476 | TFLOPs: 9.61 | 7: iteration 6350/ 173500 | consumed samples: 1625600 | consumed tokens: 3329228800 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.924756E+00 | grad norm: 1.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.990 | TFLOPs: 11.78 | 7: iteration 6360/ 173500 | consumed samples: 1628160 | consumed tokens: 3334471680 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.925904E+00 | grad norm: 1.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.098 | TFLOPs: 11.51 | 7: iteration 6370/ 173500 | consumed samples: 1630720 | consumed tokens: 3339714560 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.935477E+00 | grad norm: 1.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.416 | TFLOPs: 11.81 | 7: iteration 6380/ 173500 | consumed 
samples: 1633280 | consumed tokens: 3344957440 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.929432E+00 | grad norm: 0.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.597 | TFLOPs: 11.94 | 7: iteration 6390/ 173500 | consumed samples: 1635840 | consumed tokens: 3350200320 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.939991E+00 | grad norm: 1.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.794 | TFLOPs: 9.26 | 7: iteration 6400/ 173500 | consumed samples: 1638400 | consumed tokens: 3355443200 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.926790E+00 | grad norm: 1.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2282.946 | TFLOPs: 8.49 | 7: iteration 6410/ 173500 | consumed samples: 1640960 | consumed tokens: 3360686080 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.929419E+00 | grad norm: 1.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.741 | TFLOPs: 11.37 | 7: iteration 6420/ 173500 | consumed samples: 1643520 | consumed tokens: 3365928960 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.931281E+00 | grad norm: 1.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2237.597 | TFLOPs: 8.32 | 7: iteration 6430/ 173500 | consumed samples: 1646080 | consumed tokens: 3371171840 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.931193E+00 | grad norm: 0.994 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3114.440 | TFLOPs: 11.58 | 7: iteration 6440/ 173500 | consumed samples: 1648640 | consumed tokens: 3376414720 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.921699E+00 | grad norm: 1.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2761.290 | TFLOPs: 10.27 | 7: iteration 6450/ 173500 | consumed samples: 1651200 | consumed tokens: 3381657600 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.923390E+00 | grad norm: 1.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.716 | TFLOPs: 11.83 | 7: iteration 6460/ 173500 | consumed samples: 1653760 | consumed tokens: 3386900480 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.924302E+00 | grad norm: 1.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2758.120 | TFLOPs: 10.26 | 7: iteration 6470/ 173500 | consumed samples: 1656320 | consumed tokens: 3392143360 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.923521E+00 | grad norm: 1.037 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2846.536 | TFLOPs: 10.59 | 7: iteration 6480/ 173500 | consumed samples: 1658880 | consumed tokens: 3397386240 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.927472E+00 | grad norm: 0.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.014 | TFLOPs: 11.20 | 7: iteration 6490/ 173500 | consumed samples: 1661440 | consumed tokens: 3402629120 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.928296E+00 | grad 
norm: 0.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.857 | TFLOPs: 11.87 | 7: iteration 6500/ 173500 | consumed samples: 1664000 | consumed tokens: 3407872000 | elapsed time per iteration (s): 0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.935406E+00 | grad norm: 0.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2291.133 | TFLOPs: 8.52 | 7: iteration 6510/ 173500 | consumed samples: 1666560 | consumed tokens: 3413114880 | elapsed time per iteration (s): 0.12 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.924000E+00 | grad norm: 0.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2074.935 | TFLOPs: 7.72 | 7: iteration 6520/ 173500 | consumed samples: 1669120 | consumed tokens: 3418357760 | elapsed time per iteration (s): 0.09 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.923166E+00 | grad norm: 0.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.738 | TFLOPs: 10.10 | 7: iteration 6530/ 173500 | consumed samples: 1671680 | consumed tokens: 3423600640 | elapsed time per iteration (s): 0.08 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.921707E+00 | grad norm: 0.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.964 | TFLOPs: 11.90 | 7: iteration 6540/ 173500 | consumed samples: 1674240 | consumed tokens: 3428843520 | elapsed time per iteration (s): 0.10 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.913216E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2593.284 | TFLOPs: 9.65 | 7: iteration 6550/ 173500 | consumed samples: 1676800 | consumed tokens: 3434086400 | elapsed time per iteration (s): 
0.11 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 4.919132E+00 | grad norm: 0.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2416.921 | TFLOPs: 8.99 | 7: iteration 6560/ 173500 | consumed samples: 1679360 | consumed tokens: 3439329280 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.928553E+00 | grad norm: 0.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2300.376 | TFLOPs: 8.56 | 7: iteration 6570/ 173500 | consumed samples: 1681920 | consumed tokens: 3444572160 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.925134E+00 | grad norm: 0.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2514.536 | TFLOPs: 9.35 | 7: iteration 6580/ 173500 | consumed samples: 1684480 | consumed tokens: 3449815040 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.918505E+00 | grad norm: 1.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.059 | TFLOPs: 9.64 | 7: iteration 6590/ 173500 | consumed samples: 1687040 | consumed tokens: 3455057920 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.917447E+00 | grad norm: 1.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2787.355 | TFLOPs: 10.37 | 7: iteration 6600/ 173500 | consumed samples: 1689600 | consumed tokens: 3460300800 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.927468E+00 | grad norm: 1.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.035 | TFLOPs: 10.53 | 7: iteration 6610/ 173500 | 
consumed samples: 1692160 | consumed tokens: 3465543680 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.913537E+00 | grad norm: 0.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2853.813 | TFLOPs: 10.61 | 7: iteration 6620/ 173500 | consumed samples: 1694720 | consumed tokens: 3470786560 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.909282E+00 | grad norm: 0.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.402 | TFLOPs: 11.19 | 7: iteration 6630/ 173500 | consumed samples: 1697280 | consumed tokens: 3476029440 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.913469E+00 | grad norm: 1.118 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.061 | TFLOPs: 11.81 | 7: iteration 6640/ 173500 | consumed samples: 1699840 | consumed tokens: 3481272320 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.914674E+00 | grad norm: 1.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2243.411 | TFLOPs: 8.34 | 7: iteration 6650/ 173500 | consumed samples: 1702400 | consumed tokens: 3486515200 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.912389E+00 | grad norm: 0.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2312.323 | TFLOPs: 8.60 | 7: iteration 6660/ 173500 | consumed samples: 1704960 | consumed tokens: 3491758080 | elapsed time per iteration (s): 0.12 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.908294E+00 | grad norm: 0.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2176.431 | TFLOPs: 8.10 | 7: iteration 6670/ 173500 | consumed samples: 1707520 | consumed tokens: 3497000960 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.908778E+00 | grad norm: 1.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2299.333 | TFLOPs: 8.55 | 7: iteration 6680/ 173500 | consumed samples: 1710080 | consumed tokens: 3502243840 | elapsed time per iteration (s): 0.13 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.919132E+00 | grad norm: 0.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1914.292 | TFLOPs: 7.12 | 7: iteration 6690/ 173500 | consumed samples: 1712640 | consumed tokens: 3507486720 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.904805E+00 | grad norm: 0.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2778.281 | TFLOPs: 10.33 | 7: iteration 6700/ 173500 | consumed samples: 1715200 | consumed tokens: 3512729600 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.920173E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.856 | TFLOPs: 11.92 | 7: iteration 6710/ 173500 | consumed samples: 1717760 | consumed tokens: 3517972480 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.911152E+00 | grad norm: 0.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2451.499 | TFLOPs: 9.12 | 7: iteration 6720/ 173500 | consumed samples: 1720320 | consumed tokens: 3523215360 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.920662E+00 | 
grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.732 | TFLOPs: 10.70 | 7: iteration 6730/ 173500 | consumed samples: 1722880 | consumed tokens: 3528458240 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.907329E+00 | grad norm: 1.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.611 | TFLOPs: 11.79 | 7: iteration 6740/ 173500 | consumed samples: 1725440 | consumed tokens: 3533701120 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.912841E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.888 | TFLOPs: 11.79 | 7: iteration 6750/ 173500 | consumed samples: 1728000 | consumed tokens: 3538944000 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.913148E+00 | grad norm: 1.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2308.782 | TFLOPs: 8.59 | 7: iteration 6760/ 173500 | consumed samples: 1730560 | consumed tokens: 3544186880 | elapsed time per iteration (s): 0.13 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.914475E+00 | grad norm: 0.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1969.192 | TFLOPs: 7.32 | 7: iteration 6770/ 173500 | consumed samples: 1733120 | consumed tokens: 3549429760 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.900068E+00 | grad norm: 0.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.421 | TFLOPs: 9.60 | 7: iteration 6780/ 173500 | consumed samples: 1735680 | consumed tokens: 3554672640 | elapsed time per iteration 
(s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.905515E+00 | grad norm: 1.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.285 | TFLOPs: 8.66 | 7: iteration 6790/ 173500 | consumed samples: 1738240 | consumed tokens: 3559915520 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.903272E+00 | grad norm: 1.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.141 | TFLOPs: 11.78 | 7: iteration 6800/ 173500 | consumed samples: 1740800 | consumed tokens: 3565158400 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.892682E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.701 | TFLOPs: 8.53 | 7: iteration 6810/ 173500 | consumed samples: 1743360 | consumed tokens: 3570401280 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.900095E+00 | grad norm: 0.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.438 | TFLOPs: 9.41 | 7: iteration 6820/ 173500 | consumed samples: 1745920 | consumed tokens: 3575644160 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.905238E+00 | grad norm: 0.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.902 | TFLOPs: 11.81 | 7: iteration 6830/ 173500 | consumed samples: 1748480 | consumed tokens: 3580887040 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.892657E+00 | grad norm: 0.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.980 | TFLOPs: 11.84 | 7: iteration 6840/ 173500 | 
consumed samples: 1751040 | consumed tokens: 3586129920 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.908725E+00 | grad norm: 0.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.672 | TFLOPs: 11.40 | 7: iteration 6850/ 173500 | consumed samples: 1753600 | consumed tokens: 3591372800 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.902993E+00 | grad norm: 0.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.479 | TFLOPs: 10.56 | 7: iteration 6860/ 173500 | consumed samples: 1756160 | consumed tokens: 3596615680 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.907571E+00 | grad norm: 0.915 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2493.108 | TFLOPs: 9.27 | 7: iteration 6870/ 173500 | consumed samples: 1758720 | consumed tokens: 3601858560 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.896181E+00 | grad norm: 0.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2465.618 | TFLOPs: 9.17 | 7: iteration 6880/ 173500 | consumed samples: 1761280 | consumed tokens: 3607101440 | elapsed time per iteration (s): 0.12 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.898523E+00 | grad norm: 0.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2087.284 | TFLOPs: 7.76 | 7: iteration 6890/ 173500 | consumed samples: 1763840 | consumed tokens: 3612344320 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.899997E+00 | grad norm: 1.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2238.134 | TFLOPs: 8.32 |
7: iteration 6900/ 173500 | consumed samples: 1766400 | consumed tokens: 3617587200 | elapsed time per iteration (s): 0.13 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.902570E+00 | grad norm: 0.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1956.788 | TFLOPs: 7.28 |
7: iteration 6910/ 173500 | consumed samples: 1768960 | consumed tokens: 3622830080 | elapsed time per iteration (s): 0.12 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.901720E+00 | grad norm: 0.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2208.589 | TFLOPs: 8.21 |
7: iteration 6920/ 173500 | consumed samples: 1771520 | consumed tokens: 3628072960 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.896486E+00 | grad norm: 1.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2609.956 | TFLOPs: 9.71 |
7: iteration 6930/ 173500 | consumed samples: 1774080 | consumed tokens: 3633315840 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.905936E+00 | grad norm: 1.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.715 | TFLOPs: 11.20 |
7: iteration 6940/ 173500 | consumed samples: 1776640 | consumed tokens: 3638558720 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.891615E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.585 | TFLOPs: 11.83 |
7: iteration 6950/ 173500 | consumed samples: 1779200 | consumed tokens: 3643801600 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.906522E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.401 | TFLOPs: 11.80 |
7: iteration 6960/ 173500 | consumed samples: 1781760 | consumed tokens: 3649044480 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.895071E+00 | grad norm: 0.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.835 | TFLOPs: 11.91 |
7: iteration 6970/ 173500 | consumed samples: 1784320 | consumed tokens: 3654287360 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.893720E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2558.235 | TFLOPs: 9.52 |
7: iteration 6980/ 173500 | consumed samples: 1786880 | consumed tokens: 3659530240 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.893689E+00 | grad norm: 0.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2351.886 | TFLOPs: 8.75 |
7: iteration 6990/ 173500 | consumed samples: 1789440 | consumed tokens: 3664773120 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.891596E+00 | grad norm: 0.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3051.221 | TFLOPs: 11.35 |
7: iteration 7000/ 173500 | consumed samples: 1792000 | consumed tokens: 3670016000 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.892033E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2747.285 | TFLOPs: 10.22 |
7: -----------------------------------------------------------------------------------------------
7: validation loss at iteration 7000 | lm loss value: 4.770740E+00 | lm loss PPL: 1.180065E+02 |
7: -----------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 7000 to checkpoints_14m91b100m
0: [2023-03-17 00:27:01,415] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step7000 is begin to save!
0: [2023-03-17 00:27:01,418] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:27:01,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:27:01,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/layer_03-model_00-model_states.pt...
0: [2023-03-17 00:27:01,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/layer_03-model_00-model_states.pt.
0: [2023-03-17 00:27:01,447] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/layer_04-model_00-model_states.pt...
0: [2023-03-17 00:27:01,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/layer_04-model_00-model_states.pt.
0: [2023-03-17 00:27:01,450] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/layer_05-model_00-model_states.pt...
0: [2023-03-17 00:27:01,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/layer_05-model_00-model_states.pt.
0: [2023-03-17 00:27:01,453] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/layer_06-model_00-model_states.pt...
0: [2023-03-17 00:27:01,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/layer_06-model_00-model_states.pt.
0: [2023-03-17 00:27:01,456] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:27:01,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:27:01,457] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step7000/mp_rank_00_model_states.pt 0: [2023-03-17 00:27:01,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:27:01,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:27:01,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:27:01,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:27:01,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2023-03-17 00:27:01,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:27:01,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2023-03-17 00:27:01,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:27:01,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:27:01,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
0: [2023-03-17 00:27:01,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:27:01,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2023-03-17 00:27:01,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 7: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
4: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 3: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:27:01,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:27:01,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 7: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2023-03-17 00:27:01,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:27:01,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
0: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:27:01,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2023-03-17 00:27:01,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:27:01,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:27:01,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
3: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:27:01,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:27:01,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 00:27:01,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2023-03-17 00:27:01,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:27:01,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
1: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:27:01,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:27:01,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:27:01,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2023-03-17 00:27:01,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:27:01,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:27:01,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:27:01,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:27:01,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
6: [2023-03-17 00:27:01,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:27:01,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2023-03-17 00:27:01,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:27:01,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:27:01,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2023-03-17 00:27:01,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:27:01,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:27:01,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
3: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:27:01,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:27:01,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:27:01,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
1: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:27:01,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:27:01,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:27:01,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
3: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:27:01,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2023-03-17 00:27:01,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:27:01,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:27:01,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:27:01,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
1: [2023-03-17 00:27:01,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
1: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 0: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
5: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 2: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2023-03-17 00:27:01,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:27:01,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
0: successfully saved checkpoint at iteration 7000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 77.65 7: iteration 7010/ 173500 | consumed samples: 1794560 | consumed tokens: 3675258880 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.896130E+00 | grad norm: 1.017 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2747.870 | TFLOPs: 10.22 | 7: iteration 7020/ 173500 | consumed samples: 1797120 | consumed tokens: 3680501760 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.894729E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.828 | TFLOPs: 10.54 | 7: iteration 7030/ 173500 | consumed samples: 1799680 | consumed tokens: 3685744640 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.890722E+00 | grad norm: 0.997 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.043 | TFLOPs: 11.10 | 7: iteration 7040/ 173500 | consumed samples: 1802240 | consumed tokens: 3690987520 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.895831E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2362.563 | TFLOPs: 8.79 | 7: iteration 7050/ 173500 | consumed samples: 1804800 | consumed tokens: 3696230400 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.897252E+00 | grad norm: 0.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2363.908 | TFLOPs: 8.79 | 7: iteration 7060/ 173500 | consumed samples: 1807360 | consumed tokens: 3701473280 | elapsed time per iteration (s): 0.12 | learning rate: 
1.996E-04 | global batch size: 256 | lm loss: 4.895292E+00 | grad norm: 1.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2049.722 | TFLOPs: 7.62 | 7: iteration 7070/ 173500 | consumed samples: 1809920 | consumed tokens: 3706716160 | elapsed time per iteration (s): 0.13 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.898433E+00 | grad norm: 0.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2034.435 | TFLOPs: 7.57 | 7: iteration 7080/ 173500 | consumed samples: 1812480 | consumed tokens: 3711959040 | elapsed time per iteration (s): 0.10 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.901662E+00 | grad norm: 0.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2614.911 | TFLOPs: 9.73 | 7: iteration 7090/ 173500 | consumed samples: 1815040 | consumed tokens: 3717201920 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.888704E+00 | grad norm: 1.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.692 | TFLOPs: 11.92 | 7: iteration 7100/ 173500 | consumed samples: 1817600 | consumed tokens: 3722444800 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.893568E+00 | grad norm: 1.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.535 | TFLOPs: 11.32 | 7: iteration 7110/ 173500 | consumed samples: 1820160 | consumed tokens: 3727687680 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.886829E+00 | grad norm: 0.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2307.811 | TFLOPs: 8.58 | 7: iteration 7120/ 173500 | consumed samples: 1822720 | 
consumed tokens: 3732930560 | elapsed time per iteration (s): 0.13 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.891795E+00 | grad norm: 0.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.540 | TFLOPs: 7.21 | 7: iteration 7130/ 173500 | consumed samples: 1825280 | consumed tokens: 3738173440 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.892714E+00 | grad norm: 0.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2230.050 | TFLOPs: 8.29 | 7: iteration 7140/ 173500 | consumed samples: 1827840 | consumed tokens: 3743416320 | elapsed time per iteration (s): 0.12 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.901736E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2070.931 | TFLOPs: 7.70 | 7: iteration 7150/ 173500 | consumed samples: 1830400 | consumed tokens: 3748659200 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.890836E+00 | grad norm: 0.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.509 | TFLOPs: 11.15 | 7: iteration 7160/ 173500 | consumed samples: 1832960 | consumed tokens: 3753902080 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.880840E+00 | grad norm: 1.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2889.304 | TFLOPs: 10.75 | 7: iteration 7170/ 173500 | consumed samples: 1835520 | consumed tokens: 3759144960 | elapsed time per iteration (s): 0.11 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.890802E+00 | grad norm: 1.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2262.790 | TFLOPs: 8.42 | 7: iteration 7180/ 173500 | consumed samples: 1838080 | consumed tokens: 3764387840 | elapsed time per iteration (s): 0.13 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.885947E+00 | grad norm: 1.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2017.322 | TFLOPs: 7.50 | 7: iteration 7190/ 173500 | consumed samples: 1840640 | consumed tokens: 3769630720 | elapsed time per iteration (s): 0.09 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.876641E+00 | grad norm: 1.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.876 | TFLOPs: 10.21 | 7: iteration 7200/ 173500 | consumed samples: 1843200 | consumed tokens: 3774873600 | elapsed time per iteration (s): 0.08 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 4.882569E+00 | grad norm: 1.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.948 | TFLOPs: 11.81 | 7: iteration 7210/ 173500 | consumed samples: 1845760 | consumed tokens: 3780116480 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.883557E+00 | grad norm: 1.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2831.439 | TFLOPs: 10.53 | 7: iteration 7220/ 173500 | consumed samples: 1848320 | consumed tokens: 3785359360 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.884615E+00 | grad norm: 1.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.805 | TFLOPs: 11.86 | 7: iteration 7230/ 173500 | consumed samples: 1850880 | consumed tokens: 3790602240 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.897670E+00 | grad norm: 0.937 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.273 | TFLOPs: 11.54 | 7: iteration 7240/ 173500 | consumed samples: 1853440 | consumed tokens: 3795845120 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.896348E+00 | grad norm: 0.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.085 | TFLOPs: 11.82 | 7: iteration 7250/ 173500 | consumed samples: 1856000 | consumed tokens: 3801088000 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.886302E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2508.957 | TFLOPs: 9.33 | 7: iteration 7260/ 173500 | consumed samples: 1858560 | consumed tokens: 3806330880 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.895997E+00 | grad norm: 0.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2920.130 | TFLOPs: 10.86 | 7: iteration 7270/ 173500 | consumed samples: 1861120 | consumed tokens: 3811573760 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.878019E+00 | grad norm: 0.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.178 | TFLOPs: 12.00 | 7: iteration 7280/ 173500 | consumed samples: 1863680 | consumed tokens: 3816816640 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.885080E+00 | grad norm: 0.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.579 | TFLOPs: 11.35 | 7: iteration 7290/ 173500 | consumed samples: 1866240 | consumed tokens: 3822059520 | elapsed time per iteration (s): 0.09 | learning 
rate: 1.995E-04 | global batch size: 256 | lm loss: 4.884468E+00 | grad norm: 0.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2900.958 | TFLOPs: 10.79 | 7: iteration 7300/ 173500 | consumed samples: 1868800 | consumed tokens: 3827302400 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.878188E+00 | grad norm: 0.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2542.718 | TFLOPs: 9.46 | 7: iteration 7310/ 173500 | consumed samples: 1871360 | consumed tokens: 3832545280 | elapsed time per iteration (s): 0.12 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.870491E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2049.293 | TFLOPs: 7.62 | 7: iteration 7320/ 173500 | consumed samples: 1873920 | consumed tokens: 3837788160 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.875697E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2260.186 | TFLOPs: 8.41 | 7: iteration 7330/ 173500 | consumed samples: 1876480 | consumed tokens: 3843031040 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.885191E+00 | grad norm: 0.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2472.106 | TFLOPs: 9.20 | 7: iteration 7340/ 173500 | consumed samples: 1879040 | consumed tokens: 3848273920 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.885778E+00 | grad norm: 0.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2316.419 | TFLOPs: 8.62 | 7: iteration 7350/ 173500 | consumed samples: 
1881600 | consumed tokens: 3853516800 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.873475E+00 | grad norm: 1.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.605 | TFLOPs: 8.85 | 7: iteration 7360/ 173500 | consumed samples: 1884160 | consumed tokens: 3858759680 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.871677E+00 | grad norm: 0.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2331.214 | TFLOPs: 8.67 | 7: iteration 7370/ 173500 | consumed samples: 1886720 | consumed tokens: 3864002560 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.878648E+00 | grad norm: 1.120 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2237.440 | TFLOPs: 8.32 | 7: iteration 7380/ 173500 | consumed samples: 1889280 | consumed tokens: 3869245440 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.881371E+00 | grad norm: 1.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.629 | TFLOPs: 11.99 | 7: iteration 7390/ 173500 | consumed samples: 1891840 | consumed tokens: 3874488320 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.874738E+00 | grad norm: 0.897 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.908 | TFLOPs: 11.96 | 7: iteration 7400/ 173500 | consumed samples: 1894400 | consumed tokens: 3879731200 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.872831E+00 | grad norm: 0.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3228.045 | TFLOPs: 12.01 | 7: iteration 7410/ 173500 | consumed samples: 1896960 | consumed tokens: 3884974080 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.874643E+00 | grad norm: 1.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.431 | TFLOPs: 11.62 | 7: iteration 7420/ 173500 | consumed samples: 1899520 | consumed tokens: 3890216960 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.887073E+00 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.941 | TFLOPs: 11.85 | 7: iteration 7430/ 173500 | consumed samples: 1902080 | consumed tokens: 3895459840 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.877165E+00 | grad norm: 0.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2383.296 | TFLOPs: 8.86 | 7: iteration 7440/ 173500 | consumed samples: 1904640 | consumed tokens: 3900702720 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.876535E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2384.652 | TFLOPs: 8.87 | 7: iteration 7450/ 173500 | consumed samples: 1907200 | consumed tokens: 3905945600 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.881144E+00 | grad norm: 1.001 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.350 | TFLOPs: 11.06 | 7: iteration 7460/ 173500 | consumed samples: 1909760 | consumed tokens: 3911188480 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.879556E+00 | grad norm: 0.627 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.530 | TFLOPs: 11.95 | 7: iteration 7470/ 173500 | consumed samples: 1912320 | consumed tokens: 3916431360 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.871767E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.209 | TFLOPs: 12.00 | 7: iteration 7480/ 173500 | consumed samples: 1914880 | consumed tokens: 3921674240 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.875687E+00 | grad norm: 0.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2962.095 | TFLOPs: 11.02 | 7: iteration 7490/ 173500 | consumed samples: 1917440 | consumed tokens: 3926917120 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.872138E+00 | grad norm: 0.936 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.592 | TFLOPs: 8.50 | 7: iteration 7500/ 173500 | consumed samples: 1920000 | consumed tokens: 3932160000 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.868665E+00 | grad norm: 1.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.058 | TFLOPs: 11.91 | 7: iteration 7510/ 173500 | consumed samples: 1922560 | consumed tokens: 3937402880 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.871743E+00 | grad norm: 1.096 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2698.398 | TFLOPs: 10.04 | 7: iteration 7520/ 173500 | consumed samples: 1925120 | consumed tokens: 3942645760 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.878215E+00 | grad norm: 0.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.094 | TFLOPs: 10.41 | 7: iteration 7530/ 173500 | consumed samples: 1927680 | consumed tokens: 3947888640 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.871242E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.313 | TFLOPs: 12.03 | 7: iteration 7540/ 173500 | consumed samples: 1930240 | consumed tokens: 3953131520 | elapsed time per iteration (s): 0.12 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.867474E+00 | grad norm: 0.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2171.474 | TFLOPs: 8.08 | 7: iteration 7550/ 173500 | consumed samples: 1932800 | consumed tokens: 3958374400 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.870996E+00 | grad norm: 0.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2312.958 | TFLOPs: 8.60 | 7: iteration 7560/ 173500 | consumed samples: 1935360 | consumed tokens: 3963617280 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.868453E+00 | grad norm: 0.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.380 | TFLOPs: 11.58 | 7: iteration 7570/ 173500 | consumed samples: 1937920 | consumed tokens: 3968860160 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.884784E+00 | grad norm: 0.964 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.581 | TFLOPs: 11.97 | 7: iteration 7580/ 173500 | consumed 
samples: 1940480 | consumed tokens: 3974103040 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.877092E+00 | grad norm: 1.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2662.076 | TFLOPs: 9.90 | 7: iteration 7590/ 173500 | consumed samples: 1943040 | consumed tokens: 3979345920 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.857557E+00 | grad norm: 0.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.388 | TFLOPs: 11.99 | 7: iteration 7600/ 173500 | consumed samples: 1945600 | consumed tokens: 3984588800 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.872028E+00 | grad norm: 1.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.755 | TFLOPs: 11.81 | 7: iteration 7610/ 173500 | consumed samples: 1948160 | consumed tokens: 3989831680 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.870183E+00 | grad norm: 0.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.726 | TFLOPs: 12.02 | 7: iteration 7620/ 173500 | consumed samples: 1950720 | consumed tokens: 3995074560 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.861658E+00 | grad norm: 0.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.002 | TFLOPs: 12.06 | 7: iteration 7630/ 173500 | consumed samples: 1953280 | consumed tokens: 4000317440 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.866359E+00 | grad norm: 0.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 3143.783 | TFLOPs: 11.69 | 7: iteration 7640/ 173500 | consumed samples: 1955840 | consumed tokens: 4005560320 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.872742E+00 | grad norm: 0.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.758 | TFLOPs: 11.93 | 7: iteration 7650/ 173500 | consumed samples: 1958400 | consumed tokens: 4010803200 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.867102E+00 | grad norm: 0.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2976.307 | TFLOPs: 11.07 | 7: iteration 7660/ 173500 | consumed samples: 1960960 | consumed tokens: 4016046080 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.862740E+00 | grad norm: 0.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.711 | TFLOPs: 11.06 | 7: iteration 7670/ 173500 | consumed samples: 1963520 | consumed tokens: 4021288960 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.870454E+00 | grad norm: 0.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2678.097 | TFLOPs: 9.96 | 7: iteration 7680/ 173500 | consumed samples: 1966080 | consumed tokens: 4026531840 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.867696E+00 | grad norm: 1.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2761.892 | TFLOPs: 10.27 | 7: iteration 7690/ 173500 | consumed samples: 1968640 | consumed tokens: 4031774720 | elapsed time per iteration (s): 0.08 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.865725E+00 | grad 
norm: 0.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.293 | TFLOPs: 11.65 | 7: iteration 7700/ 173500 | consumed samples: 1971200 | consumed tokens: 4037017600 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.863811E+00 | grad norm: 1.005 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.076 | TFLOPs: 9.41 | 7: iteration 7710/ 173500 | consumed samples: 1973760 | consumed tokens: 4042260480 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.869112E+00 | grad norm: 0.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.639 | TFLOPs: 9.92 | 7: iteration 7720/ 173500 | consumed samples: 1976320 | consumed tokens: 4047503360 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.865466E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2602.234 | TFLOPs: 9.68 | 7: iteration 7730/ 173500 | consumed samples: 1978880 | consumed tokens: 4052746240 | elapsed time per iteration (s): 0.11 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.856466E+00 | grad norm: 1.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2290.887 | TFLOPs: 8.52 | 7: iteration 7740/ 173500 | consumed samples: 1981440 | consumed tokens: 4057989120 | elapsed time per iteration (s): 0.12 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.872525E+00 | grad norm: 0.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.614 | TFLOPs: 7.78 | 7: iteration 7750/ 173500 | consumed samples: 1984000 | consumed tokens: 4063232000 | elapsed time per iteration (s): 0.10 
| learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.863029E+00 | grad norm: 0.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2505.799 | TFLOPs: 9.32 | 7: iteration 7760/ 173500 | consumed samples: 1986560 | consumed tokens: 4068474880 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.858551E+00 | grad norm: 1.025 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.086 | TFLOPs: 10.42 | 7: iteration 7770/ 173500 | consumed samples: 1989120 | consumed tokens: 4073717760 | elapsed time per iteration (s): 0.09 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.866740E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2901.225 | TFLOPs: 10.79 | 7: iteration 7780/ 173500 | consumed samples: 1991680 | consumed tokens: 4078960640 | elapsed time per iteration (s): 0.10 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 4.861697E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2497.323 | TFLOPs: 9.29 | 7: iteration 7790/ 173500 | consumed samples: 1994240 | consumed tokens: 4084203520 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.871511E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2940.694 | TFLOPs: 10.94 | 7: iteration 7800/ 173500 | consumed samples: 1996800 | consumed tokens: 4089446400 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.846786E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.269 | TFLOPs: 11.99 | 7: iteration 7810/ 173500 | consumed 
samples: 1999360 | consumed tokens: 4094689280 | elapsed time per iteration (s): 0.10 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.853751E+00 | grad norm: 1.018 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2547.873 | TFLOPs: 9.48 | 7: iteration 7820/ 173500 | consumed samples: 2001920 | consumed tokens: 4099932160 | elapsed time per iteration (s): 0.10 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.870036E+00 | grad norm: 0.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.888 | TFLOPs: 9.52 | 7: iteration 7830/ 173500 | consumed samples: 2004480 | consumed tokens: 4105175040 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.867380E+00 | grad norm: 0.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.045 | TFLOPs: 12.04 | 7: iteration 7840/ 173500 | consumed samples: 2007040 | consumed tokens: 4110417920 | elapsed time per iteration (s): 0.10 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.856728E+00 | grad norm: 0.881 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2665.448 | TFLOPs: 9.91 | 7: iteration 7850/ 173500 | consumed samples: 2009600 | consumed tokens: 4115660800 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.853683E+00 | grad norm: 0.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.239 | TFLOPs: 11.97 | 7: iteration 7860/ 173500 | consumed samples: 2012160 | consumed tokens: 4120903680 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.855846E+00 | grad norm: 0.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 2807.994 | TFLOPs: 10.44 | 7: iteration 7870/ 173500 | consumed samples: 2014720 | consumed tokens: 4126146560 | elapsed time per iteration (s): 0.10 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.860382E+00 | grad norm: 0.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.617 | TFLOPs: 9.41 | 7: iteration 7880/ 173500 | consumed samples: 2017280 | consumed tokens: 4131389440 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.860115E+00 | grad norm: 0.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.333 | TFLOPs: 10.33 | 7: iteration 7890/ 173500 | consumed samples: 2019840 | consumed tokens: 4136632320 | elapsed time per iteration (s): 0.10 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.860681E+00 | grad norm: 0.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2606.206 | TFLOPs: 9.69 | 7: iteration 7900/ 173500 | consumed samples: 2022400 | consumed tokens: 4141875200 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.858121E+00 | grad norm: 1.030 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.315 | TFLOPs: 10.48 | 7: iteration 7910/ 173500 | consumed samples: 2024960 | consumed tokens: 4147118080 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.854642E+00 | grad norm: 1.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2809.028 | TFLOPs: 10.45 | 7: iteration 7920/ 173500 | consumed samples: 2027520 | consumed tokens: 4152360960 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.866895E+00 | grad norm: 
0.887 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2857.332 | TFLOPs: 10.63 | 7: iteration 7930/ 173500 | consumed samples: 2030080 | consumed tokens: 4157603840 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.850512E+00 | grad norm: 1.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.551 | TFLOPs: 12.06 | 7: iteration 7940/ 173500 | consumed samples: 2032640 | consumed tokens: 4162846720 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.864687E+00 | grad norm: 0.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.519 | TFLOPs: 12.03 | 7: iteration 7950/ 173500 | consumed samples: 2035200 | consumed tokens: 4168089600 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.862542E+00 | grad norm: 0.751 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2785.568 | TFLOPs: 10.36 | 7: iteration 7960/ 173500 | consumed samples: 2037760 | consumed tokens: 4173332480 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.850297E+00 | grad norm: 1.178 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.344 | TFLOPs: 10.45 | 7: iteration 7970/ 173500 | consumed samples: 2040320 | consumed tokens: 4178575360 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.848681E+00 | grad norm: 0.952 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.107 | TFLOPs: 12.03 | 7: iteration 7980/ 173500 | consumed samples: 2042880 | consumed tokens: 4183818240 | elapsed time per iteration (s): 0.08 
| learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.854601E+00 | grad norm: 0.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.152 | TFLOPs: 12.00 | 7: iteration 7990/ 173500 | consumed samples: 2045440 | consumed tokens: 4189061120 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.848451E+00 | grad norm: 1.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.835 | TFLOPs: 12.05 | 0: [2023-03-17 00:28:35,260] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=0, lr=[0.00019940979012929202, 0.00019940979012929202, 0.00019940979012929202], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 8000/ 173500 | consumed samples: 2048000 | consumed tokens: 4194304000 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.852095E+00 | grad norm: 0.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.614 | TFLOPs: 12.02 | 0: steps: 8000 loss: 4.8738 iter time (s): 0.094 samples/sec: 2718.465 7: ----------------------------------------------------------------------------------------------- 7: validation loss at iteration 8000 | lm loss value: 4.712807E+00 | lm loss PPL: 1.113643E+02 | 7: ----------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 8000 to checkpoints_14m91b100m 0: [2023-03-17 00:28:35,316] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step8000 is begin to save! 0: [2023-03-17 00:28:35,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 00:28:35,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:28:35,343] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:28:35,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:28:35,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:28:35,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:28:35,350] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:28:35,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:28:35,354] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:28:35,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:28:35,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:28:35,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 00:28:35,358] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step8000/mp_rank_00_model_states.pt 0: [2023-03-17 00:28:35,358] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:28:35,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:28:35,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:28:35,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:28:35,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2023-03-17 00:28:35,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:28:35,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:28:35,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 00:28:35,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2023-03-17 00:28:35,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
3: [2023-03-17 00:28:35,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:28:35,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:28:35,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2023-03-17 00:28:35,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:28:35,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2023-03-17 00:28:35,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2023-03-17 00:28:35,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2023-03-17 00:28:35,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:28:35,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:28:35,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:28:35,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2023-03-17 00:28:35,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:28:35,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:28:35,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:28:35,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:28:35,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:28:35,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
1: [2023-03-17 00:28:35,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:28:35,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 3: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2023-03-17 00:28:35,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:28:35,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:28:35,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2023-03-17 00:28:35,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:28:35,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 00:28:35,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:28:35,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:28:35,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:28:35,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:28:35,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2023-03-17 00:28:35,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:28:35,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2023-03-17 00:28:35,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:28:35,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2023-03-17 00:28:35,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:28:35,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2023-03-17 00:28:35,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:28:35,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:28:35,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:28:35,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:28:35,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2023-03-17 00:28:35,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:28:35,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2023-03-17 00:28:35,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:28:35,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 00:28:35,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:28:35,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2023-03-17 00:28:35,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 00:28:35,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
7: [2023-03-17 00:28:35,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 00:28:35,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:28:35,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:28:35,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
0: successfully saved checkpoint at iteration 8000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 78.66
7: iteration 8010/ 173500 | consumed samples: 2050560 | consumed tokens: 4199546880 | elapsed time per iteration (s): 0.13 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.859370E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2037.074 | TFLOPs: 7.58 |
7: iteration 8020/ 173500 | consumed samples: 2053120 | consumed tokens: 4204789760 | elapsed time per iteration (s): 0.11 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.856180E+00 | grad norm: 0.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2380.748 | TFLOPs: 8.86 |
7: iteration 8030/ 173500 | consumed samples: 2055680 | consumed tokens: 4210032640 | elapsed time per iteration (s): 0.11 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.857755E+00 | grad norm: 0.777 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2413.456 | TFLOPs: 8.98 |
7: iteration 8040/ 173500 | consumed samples: 2058240 | consumed tokens: 4215275520 | elapsed time per iteration (s): 0.10 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.848462E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2538.908 | TFLOPs: 9.44 |
7: iteration 8050/ 173500 | consumed samples: 2060800 | consumed tokens: 4220518400 | elapsed time per iteration (s): 0.11 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.851788E+00 | grad norm: 0.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2389.098 | TFLOPs: 8.89 |
7: iteration 8060/ 173500 | consumed samples: 2063360 | consumed tokens: 4225761280 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.841988E+00 | grad norm: 0.916 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.306 | TFLOPs: 11.99 |
7: iteration 8070/ 173500 | consumed samples: 2065920 | consumed tokens: 4231004160 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.848308E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.272 | TFLOPs: 10.70 |
7: iteration 8080/ 173500 | consumed samples: 2068480 | consumed tokens: 4236247040 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.845364E+00 | grad norm: 0.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.058 | TFLOPs: 12.04 |
7: iteration 8090/ 173500 | consumed samples: 2071040 | consumed tokens: 4241489920 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.843012E+00 | grad norm: 0.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.020 | TFLOPs: 11.99 |
7: iteration 8100/ 173500 | consumed samples: 2073600 | consumed tokens: 4246732800 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.859823E+00 | grad norm: 0.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2859.909 | TFLOPs: 10.64 |
7: iteration 8110/ 173500 | consumed samples: 2076160 | consumed tokens: 4251975680 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.856933E+00 | grad norm: 0.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.823 | TFLOPs: 11.76 |
7: iteration 8120/ 173500 | consumed samples: 2078720 | consumed tokens: 4257218560 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.853639E+00 | grad norm: 0.816 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2824.336 | TFLOPs: 10.51 |
7: iteration 8130/ 173500 | consumed samples: 2081280 | consumed tokens: 4262461440 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.840770E+00 | grad norm: 0.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.880 | TFLOPs: 11.91 |
7: iteration 8140/ 173500 | consumed samples: 2083840 | consumed tokens: 4267704320 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.853442E+00 | grad norm: 0.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2845.328 | TFLOPs: 10.58 |
7: iteration 8150/ 173500 | consumed samples: 2086400 | consumed tokens: 4272947200 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.850038E+00 | grad norm: 0.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.045 | TFLOPs: 11.83 |
7: iteration 8160/ 173500 | consumed samples: 2088960 | consumed tokens: 4278190080 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.848626E+00 | grad norm: 0.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.574 | TFLOPs: 11.28 |
7: iteration 8170/ 173500 | consumed samples: 2091520 | consumed tokens: 4283432960 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.845296E+00 | grad norm: 0.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2839.489 | TFLOPs: 10.56 |
7: iteration 8180/ 173500 | consumed samples: 2094080 | consumed tokens: 4288675840 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.842074E+00 | grad norm: 0.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2751.792 | TFLOPs: 10.24 |
7: iteration 8190/ 173500 | consumed samples: 2096640 | consumed tokens: 4293918720 | elapsed time per iteration (s): 0.10 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.837199E+00 | grad norm: 0.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.142 | TFLOPs: 10.00 |
7: iteration 8200/ 173500 | consumed samples: 2099200 | consumed tokens: 4299161600 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.840312E+00 | grad norm: 0.767 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2793.653 | TFLOPs: 10.39 |
7: iteration 8210/ 173500 | consumed samples: 2101760 | consumed tokens: 4304404480 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.841638E+00 | grad norm: 0.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.259 | TFLOPs: 11.74 |
7: iteration 8220/ 173500 | consumed samples: 2104320 | consumed tokens: 4309647360 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.852959E+00 | grad norm: 0.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.051 | TFLOPs: 11.77 |
7: iteration 8230/ 173500 | consumed samples: 2106880 | consumed tokens: 4314890240 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.837987E+00 | grad norm: 0.957 | num
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.038 | TFLOPs: 11.93 | 7: iteration 8240/ 173500 | consumed samples: 2109440 | consumed tokens: 4320133120 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.843920E+00 | grad norm: 0.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.509 | TFLOPs: 10.92 | 7: iteration 8250/ 173500 | consumed samples: 2112000 | consumed tokens: 4325376000 | elapsed time per iteration (s): 0.11 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.852699E+00 | grad norm: 0.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2354.194 | TFLOPs: 8.76 | 7: iteration 8260/ 173500 | consumed samples: 2114560 | consumed tokens: 4330618880 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.850753E+00 | grad norm: 0.891 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.795 | TFLOPs: 12.03 | 7: iteration 8270/ 173500 | consumed samples: 2117120 | consumed tokens: 4335861760 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.847028E+00 | grad norm: 0.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.548 | TFLOPs: 12.06 | 7: iteration 8280/ 173500 | consumed samples: 2119680 | consumed tokens: 4341104640 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.838929E+00 | grad norm: 0.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.726 | TFLOPs: 11.22 | 7: iteration 8290/ 173500 | consumed samples: 2122240 | consumed tokens: 4346347520 | elapsed time per iteration (s): 0.08 | learning 
rate: 1.994E-04 | global batch size: 256 | lm loss: 4.845923E+00 | grad norm: 0.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.844 | TFLOPs: 11.56 | 7: iteration 8300/ 173500 | consumed samples: 2124800 | consumed tokens: 4351590400 | elapsed time per iteration (s): 0.09 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.848536E+00 | grad norm: 0.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.510 | TFLOPs: 10.82 | 7: iteration 8310/ 173500 | consumed samples: 2127360 | consumed tokens: 4356833280 | elapsed time per iteration (s): 0.08 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 4.840200E+00 | grad norm: 0.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.110 | TFLOPs: 12.06 | 7: iteration 8320/ 173500 | consumed samples: 2129920 | consumed tokens: 4362076160 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.856314E+00 | grad norm: 1.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.395 | TFLOPs: 12.02 | 7: iteration 8330/ 173500 | consumed samples: 2132480 | consumed tokens: 4367319040 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.841681E+00 | grad norm: 0.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.394 | TFLOPs: 10.07 | 7: iteration 8340/ 173500 | consumed samples: 2135040 | consumed tokens: 4372561920 | elapsed time per iteration (s): 0.11 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.848889E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2429.756 | TFLOPs: 9.04 | 7: iteration 8350/ 173500 | consumed samples: 
2137600 | consumed tokens: 4377804800 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.834604E+00 | grad norm: 0.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.522 | TFLOPs: 11.69 | 7: iteration 8360/ 173500 | consumed samples: 2140160 | consumed tokens: 4383047680 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.843198E+00 | grad norm: 0.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2679.777 | TFLOPs: 9.97 | 7: iteration 8370/ 173500 | consumed samples: 2142720 | consumed tokens: 4388290560 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.846074E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2606.197 | TFLOPs: 9.69 | 7: iteration 8380/ 173500 | consumed samples: 2145280 | consumed tokens: 4393533440 | elapsed time per iteration (s): 0.12 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.851187E+00 | grad norm: 0.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2129.218 | TFLOPs: 7.92 | 7: iteration 8390/ 173500 | consumed samples: 2147840 | consumed tokens: 4398776320 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.830872E+00 | grad norm: 0.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.095 | TFLOPs: 11.88 | 7: iteration 8400/ 173500 | consumed samples: 2150400 | consumed tokens: 4404019200 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.833482E+00 | grad norm: 1.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 2591.085 | TFLOPs: 9.64 | 7: iteration 8410/ 173500 | consumed samples: 2152960 | consumed tokens: 4409262080 | elapsed time per iteration (s): 0.11 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.842281E+00 | grad norm: 0.996 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2305.090 | TFLOPs: 8.57 | 7: iteration 8420/ 173500 | consumed samples: 2155520 | consumed tokens: 4414504960 | elapsed time per iteration (s): 0.12 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.841630E+00 | grad norm: 0.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2189.099 | TFLOPs: 8.14 | 7: iteration 8430/ 173500 | consumed samples: 2158080 | consumed tokens: 4419747840 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.834288E+00 | grad norm: 0.902 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.768 | TFLOPs: 11.67 | 7: iteration 8440/ 173500 | consumed samples: 2160640 | consumed tokens: 4424990720 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.846615E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.767 | TFLOPs: 11.94 | 7: iteration 8450/ 173500 | consumed samples: 2163200 | consumed tokens: 4430233600 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.830494E+00 | grad norm: 0.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.846 | TFLOPs: 12.04 | 7: iteration 8460/ 173500 | consumed samples: 2165760 | consumed tokens: 4435476480 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.845485E+00 | grad norm: 0.759 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.654 | TFLOPs: 11.73 | 7: iteration 8470/ 173500 | consumed samples: 2168320 | consumed tokens: 4440719360 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.817495E+00 | grad norm: 0.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.991 | TFLOPs: 12.03 | 7: iteration 8480/ 173500 | consumed samples: 2170880 | consumed tokens: 4445962240 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.840033E+00 | grad norm: 0.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.088 | TFLOPs: 12.03 | 7: iteration 8490/ 173500 | consumed samples: 2173440 | consumed tokens: 4451205120 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.831723E+00 | grad norm: 0.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2798.725 | TFLOPs: 10.41 | 7: iteration 8500/ 173500 | consumed samples: 2176000 | consumed tokens: 4456448000 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.835628E+00 | grad norm: 0.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.483 | TFLOPs: 11.49 | 7: iteration 8510/ 173500 | consumed samples: 2178560 | consumed tokens: 4461690880 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.839722E+00 | grad norm: 0.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2564.707 | TFLOPs: 9.54 | 7: iteration 8520/ 173500 | consumed samples: 2181120 | consumed tokens: 4466933760 | elapsed time per iteration (s): 0.08 | learning 
rate: 1.993E-04 | global batch size: 256 | lm loss: 4.839592E+00 | grad norm: 0.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.462 | TFLOPs: 11.88 | 7: iteration 8530/ 173500 | consumed samples: 2183680 | consumed tokens: 4472176640 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.843307E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2939.631 | TFLOPs: 10.93 | 7: iteration 8540/ 173500 | consumed samples: 2186240 | consumed tokens: 4477419520 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.848682E+00 | grad norm: 0.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2779.538 | TFLOPs: 10.34 | 7: iteration 8550/ 173500 | consumed samples: 2188800 | consumed tokens: 4482662400 | elapsed time per iteration (s): 0.12 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.847996E+00 | grad norm: 0.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2105.739 | TFLOPs: 7.83 | 7: iteration 8560/ 173500 | consumed samples: 2191360 | consumed tokens: 4487905280 | elapsed time per iteration (s): 0.12 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.835463E+00 | grad norm: 0.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2154.860 | TFLOPs: 8.02 | 7: iteration 8570/ 173500 | consumed samples: 2193920 | consumed tokens: 4493148160 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.835854E+00 | grad norm: 0.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2553.332 | TFLOPs: 9.50 | 7: iteration 8580/ 173500 | consumed samples: 
2196480 | consumed tokens: 4498391040 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.839602E+00 | grad norm: 0.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2827.217 | TFLOPs: 10.52 | 7: iteration 8590/ 173500 | consumed samples: 2199040 | consumed tokens: 4503633920 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.839703E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2733.615 | TFLOPs: 10.17 | 7: iteration 8600/ 173500 | consumed samples: 2201600 | consumed tokens: 4508876800 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.834465E+00 | grad norm: 1.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.479 | TFLOPs: 11.95 | 7: iteration 8610/ 173500 | consumed samples: 2204160 | consumed tokens: 4514119680 | elapsed time per iteration (s): 0.11 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.838707E+00 | grad norm: 0.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2346.781 | TFLOPs: 8.73 | 7: iteration 8620/ 173500 | consumed samples: 2206720 | consumed tokens: 4519362560 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.839977E+00 | grad norm: 0.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2583.683 | TFLOPs: 9.61 | 7: iteration 8630/ 173500 | consumed samples: 2209280 | consumed tokens: 4524605440 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.835798E+00 | grad norm: 0.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3256.382 | TFLOPs: 12.11 | 7: iteration 8640/ 173500 | consumed samples: 2211840 | consumed tokens: 4529848320 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.831425E+00 | grad norm: 0.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2605.824 | TFLOPs: 9.69 | 7: iteration 8650/ 173500 | consumed samples: 2214400 | consumed tokens: 4535091200 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.845897E+00 | grad norm: 0.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2647.086 | TFLOPs: 9.85 | 7: iteration 8660/ 173500 | consumed samples: 2216960 | consumed tokens: 4540334080 | elapsed time per iteration (s): 0.22 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.825733E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1185.743 | TFLOPs: 4.41 | 7: iteration 8670/ 173500 | consumed samples: 2219520 | consumed tokens: 4545576960 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.827823E+00 | grad norm: 0.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.793 | TFLOPs: 11.65 | 7: iteration 8680/ 173500 | consumed samples: 2222080 | consumed tokens: 4550819840 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.837087E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.153 | TFLOPs: 11.89 | 7: iteration 8690/ 173500 | consumed samples: 2224640 | consumed tokens: 4556062720 | elapsed time per iteration (s): 0.10 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.834608E+00 | grad norm: 
0.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.493 | TFLOPs: 9.56 | 7: iteration 8700/ 173500 | consumed samples: 2227200 | consumed tokens: 4561305600 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.832776E+00 | grad norm: 0.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.318 | TFLOPs: 10.96 | 7: iteration 8710/ 173500 | consumed samples: 2229760 | consumed tokens: 4566548480 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.832265E+00 | grad norm: 0.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2969.550 | TFLOPs: 11.05 | 7: iteration 8720/ 173500 | consumed samples: 2232320 | consumed tokens: 4571791360 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.829783E+00 | grad norm: 0.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.306 | TFLOPs: 11.06 | 7: iteration 8730/ 173500 | consumed samples: 2234880 | consumed tokens: 4577034240 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.833792E+00 | grad norm: 0.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.926 | TFLOPs: 10.33 | 7: iteration 8740/ 173500 | consumed samples: 2237440 | consumed tokens: 4582277120 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.822214E+00 | grad norm: 1.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.951 | TFLOPs: 11.15 | 7: iteration 8750/ 173500 | consumed samples: 2240000 | consumed tokens: 4587520000 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.830121E+00 | grad norm: 0.741 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2868.620 | TFLOPs: 10.67 | 7: iteration 8760/ 173500 | consumed samples: 2242560 | consumed tokens: 4592762880 | elapsed time per iteration (s): 0.11 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.839093E+00 | grad norm: 0.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2426.037 | TFLOPs: 9.02 | 7: iteration 8770/ 173500 | consumed samples: 2245120 | consumed tokens: 4598005760 | elapsed time per iteration (s): 0.09 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.840607E+00 | grad norm: 1.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2725.352 | TFLOPs: 10.14 | 7: iteration 8780/ 173500 | consumed samples: 2247680 | consumed tokens: 4603248640 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.827655E+00 | grad norm: 0.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.693 | TFLOPs: 11.90 | 7: iteration 8790/ 173500 | consumed samples: 2250240 | consumed tokens: 4608491520 | elapsed time per iteration (s): 0.08 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 4.831442E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.539 | TFLOPs: 11.78 | 7: iteration 8800/ 173500 | consumed samples: 2252800 | consumed tokens: 4613734400 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.836612E+00 | grad norm: 0.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3040.772 | TFLOPs: 11.31 | 7: iteration 8810/ 173500 | consumed 
samples: 2255360 | consumed tokens: 4618977280 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.822051E+00 | grad norm: 0.791 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.014 | TFLOPs: 11.97 | 7: iteration 8820/ 173500 | consumed samples: 2257920 | consumed tokens: 4624220160 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.834583E+00 | grad norm: 0.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2828.264 | TFLOPs: 10.52 | 7: iteration 8830/ 173500 | consumed samples: 2260480 | consumed tokens: 4629463040 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.832032E+00 | grad norm: 0.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.468 | TFLOPs: 10.39 | 7: iteration 8840/ 173500 | consumed samples: 2263040 | consumed tokens: 4634705920 | elapsed time per iteration (s): 0.10 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.830099E+00 | grad norm: 0.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2447.021 | TFLOPs: 9.10 | 7: iteration 8850/ 173500 | consumed samples: 2265600 | consumed tokens: 4639948800 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.821381E+00 | grad norm: 1.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.274 | TFLOPs: 10.61 | 7: iteration 8860/ 173500 | consumed samples: 2268160 | consumed tokens: 4645191680 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.824821E+00 | grad norm: 0.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 3063.027 | TFLOPs: 11.39 | 7: iteration 8870/ 173500 | consumed samples: 2270720 | consumed tokens: 4650434560 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.824714E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.475 | TFLOPs: 11.84 | 7: iteration 8880/ 173500 | consumed samples: 2273280 | consumed tokens: 4655677440 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.834263E+00 | grad norm: 0.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.819 | TFLOPs: 11.88 | 7: iteration 8890/ 173500 | consumed samples: 2275840 | consumed tokens: 4660920320 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.826831E+00 | grad norm: 0.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.134 | TFLOPs: 11.08 | 7: iteration 8900/ 173500 | consumed samples: 2278400 | consumed tokens: 4666163200 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.821232E+00 | grad norm: 0.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.357 | TFLOPs: 10.20 | 7: iteration 8910/ 173500 | consumed samples: 2280960 | consumed tokens: 4671406080 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.834261E+00 | grad norm: 0.835 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.009 | TFLOPs: 10.78 | 7: iteration 8920/ 173500 | consumed samples: 2283520 | consumed tokens: 4676648960 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.815283E+00 | grad 
norm: 0.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.529 | TFLOPs: 11.84 | 7: iteration 8930/ 173500 | consumed samples: 2286080 | consumed tokens: 4681891840 | elapsed time per iteration (s): 0.10 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.827547E+00 | grad norm: 0.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2643.641 | TFLOPs: 9.83 | 7: iteration 8940/ 173500 | consumed samples: 2288640 | consumed tokens: 4687134720 | elapsed time per iteration (s): 0.10 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.830751E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.499 | TFLOPs: 9.56 | 7: iteration 8950/ 173500 | consumed samples: 2291200 | consumed tokens: 4692377600 | elapsed time per iteration (s): 0.11 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.828958E+00 | grad norm: 0.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2282.481 | TFLOPs: 8.49 | 7: iteration 8960/ 173500 | consumed samples: 2293760 | consumed tokens: 4697620480 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.823087E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2836.570 | TFLOPs: 10.55 | 7: iteration 8970/ 173500 | consumed samples: 2296320 | consumed tokens: 4702863360 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.826497E+00 | grad norm: 0.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.127 | TFLOPs: 11.58 | 7: iteration 8980/ 173500 | consumed samples: 2298880 | consumed tokens: 4708106240 | elapsed time per iteration (s): 
0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.818760E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.404 | TFLOPs: 11.90 | 7: iteration 8990/ 173500 | consumed samples: 2301440 | consumed tokens: 4713349120 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.831791E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.072 | TFLOPs: 11.72 | 7: iteration 9000/ 173500 | consumed samples: 2304000 | consumed tokens: 4718592000 | elapsed time per iteration (s): 0.10 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.835569E+00 | grad norm: 0.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.348 | TFLOPs: 9.74 | 7: ----------------------------------------------------------------------------------------------- 7: validation loss at iteration 9000 | lm loss value: 4.648119E+00 | lm loss PPL: 1.043885E+02 | 7: ----------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 9000 to checkpoints_14m91b100m 0: [2023-03-17 00:30:07,175] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step9000 is begin to save! 0: [2023-03-17 00:30:07,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:30:07,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:30:07,204] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/layer_03-model_00-model_states.pt... 
0: [2023-03-17 00:30:07,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:30:07,207] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:30:07,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:30:07,210] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:30:07,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:30:07,213] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:30:07,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:30:07,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:30:07,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:30:07,217] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step9000/mp_rank_00_model_states.pt 0: [2023-03-17 00:30:07,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:30:07,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/mp_rank_00_model_states.pt. 
0: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:30:07,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:30:07,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:30:07,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:30:07,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2023-03-17 00:30:07,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:30:07,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:30:07,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:30:07,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 6: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
3: [2023-03-17 00:30:07,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2023-03-17 00:30:07,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:30:07,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:30:07,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:30:07,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2023-03-17 00:30:07,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:30:07,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:30:07,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:30:07,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
4: [2023-03-17 00:30:07,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:30:07,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:30:07,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:30:07,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
2: [2023-03-17 00:30:07,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:30:07,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:30:07,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 6: [2023-03-17 00:30:07,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:30:07,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 00:30:07,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 00:30:07,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:30:07,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
1: [2023-03-17 00:30:07,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:30:07,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 00:30:07,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:30:07,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2023-03-17 00:30:07,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:30:07,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2023-03-17 00:30:07,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:30:07,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2023-03-17 00:30:07,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:30:07,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:30:07,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2023-03-17 00:30:07,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:30:07,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:30:07,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:30:07,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:30:07,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:30:07,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2023-03-17 00:30:07,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:30:07,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:30:07,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:30:07,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:30:07,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:30:07,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:30:07,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:30:07,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:30:07,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2023-03-17 00:30:07,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
3: [2023-03-17 00:30:07,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:30:07,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:30:07,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:30:07,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
2: [2023-03-17 00:30:07,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:30:07,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:30:07,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
3: [2023-03-17 00:30:07,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2023-03-17 00:30:07,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: successfully saved checkpoint at iteration 9000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.20 7: iteration 9010/ 173500 | consumed samples: 2306560 | consumed tokens: 4723834880 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.823827E+00 | grad norm: 0.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.353 | TFLOPs: 10.20 | 7: iteration 9020/ 173500 | consumed samples: 2309120 | consumed tokens: 4729077760 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.823995E+00 | grad norm: 0.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.664 | TFLOPs: 11.93 | 7: iteration 9030/ 173500 | consumed samples: 2311680 | consumed tokens: 4734320640 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.826849E+00 | grad norm: 0.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.952 | TFLOPs: 11.92 | 7: iteration 9040/ 173500 | consumed samples: 2314240 | consumed tokens: 4739563520 | elapsed time per iteration (s): 0.10 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.818155E+00 | grad norm: 0.626 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 2585.738 | TFLOPs: 9.62 | 7: iteration 9050/ 173500 | consumed samples: 2316800 | consumed tokens: 4744806400 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.823723E+00 | grad norm: 0.842 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.988 | TFLOPs: 10.92 | 7: iteration 9060/ 173500 | consumed samples: 2319360 | consumed tokens: 4750049280 | elapsed time per iteration (s): 0.12 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.816146E+00 | grad norm: 0.945 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2148.833 | TFLOPs: 7.99 | 7: iteration 9070/ 173500 | consumed samples: 2321920 | consumed tokens: 4755292160 | elapsed time per iteration (s): 0.12 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.827604E+00 | grad norm: 0.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2176.581 | TFLOPs: 8.10 | 7: iteration 9080/ 173500 | consumed samples: 2324480 | consumed tokens: 4760535040 | elapsed time per iteration (s): 0.12 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.816122E+00 | grad norm: 0.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2169.692 | TFLOPs: 8.07 | 7: iteration 9090/ 173500 | consumed samples: 2327040 | consumed tokens: 4765777920 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.820733E+00 | grad norm: 0.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2902.936 | TFLOPs: 10.80 | 7: iteration 9100/ 173500 | consumed samples: 2329600 | consumed tokens: 4771020800 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 
4.814936E+00 | grad norm: 0.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2724.706 | TFLOPs: 10.13 | 7: iteration 9110/ 173500 | consumed samples: 2332160 | consumed tokens: 4776263680 | elapsed time per iteration (s): 0.10 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.828503E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2581.327 | TFLOPs: 9.60 | 7: iteration 9120/ 173500 | consumed samples: 2334720 | consumed tokens: 4781506560 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.816617E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.400 | TFLOPs: 11.89 | 7: iteration 9130/ 173500 | consumed samples: 2337280 | consumed tokens: 4786749440 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.819720E+00 | grad norm: 0.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.761 | TFLOPs: 11.57 | 7: iteration 9140/ 173500 | consumed samples: 2339840 | consumed tokens: 4791992320 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.821189E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.879 | TFLOPs: 11.81 | 7: iteration 9150/ 173500 | consumed samples: 2342400 | consumed tokens: 4797235200 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.822097E+00 | grad norm: 0.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.461 | TFLOPs: 11.66 | 7: iteration 9160/ 173500 | consumed samples: 2344960 | consumed tokens: 4802478080 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.811199E+00 | grad norm: 0.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.594 | TFLOPs: 11.96 | 7: iteration 9170/ 173500 | consumed samples: 2347520 | consumed tokens: 4807720960 | elapsed time per iteration (s): 0.12 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.810673E+00 | grad norm: 0.809 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2065.441 | TFLOPs: 7.68 | 7: iteration 9180/ 173500 | consumed samples: 2350080 | consumed tokens: 4812963840 | elapsed time per iteration (s): 0.13 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.828233E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.319 | TFLOPs: 7.53 | 7: iteration 9190/ 173500 | consumed samples: 2352640 | consumed tokens: 4818206720 | elapsed time per iteration (s): 0.13 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.828540E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.431 | TFLOPs: 7.37 | 7: iteration 9200/ 173500 | consumed samples: 2355200 | consumed tokens: 4823449600 | elapsed time per iteration (s): 0.09 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.818129E+00 | grad norm: 0.680 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2941.849 | TFLOPs: 10.94 | 7: iteration 9210/ 173500 | consumed samples: 2357760 | consumed tokens: 4828692480 | elapsed time per iteration (s): 0.11 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.820906E+00 | grad norm: 0.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2260.389 | TFLOPs: 8.41 | 7: iteration 
9220/ 173500 | consumed samples: 2360320 | consumed tokens: 4833935360 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.820916E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.505 | TFLOPs: 11.69 | 7: iteration 9230/ 173500 | consumed samples: 2362880 | consumed tokens: 4839178240 | elapsed time per iteration (s): 0.08 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.816940E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.709 | TFLOPs: 11.64 | 7: iteration 9240/ 173500 | consumed samples: 2365440 | consumed tokens: 4844421120 | elapsed time per iteration (s): 0.11 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.824652E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2266.520 | TFLOPs: 8.43 | 7: iteration 9250/ 173500 | consumed samples: 2368000 | consumed tokens: 4849664000 | elapsed time per iteration (s): 0.10 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 4.817312E+00 | grad norm: 0.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2463.631 | TFLOPs: 9.16 | 7: iteration 9260/ 173500 | consumed samples: 2370560 | consumed tokens: 4854906880 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.826842E+00 | grad norm: 0.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.363 | TFLOPs: 11.98 | 7: iteration 9270/ 173500 | consumed samples: 2373120 | consumed tokens: 4860149760 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.810279E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3235.867 | TFLOPs: 12.04 | 7: iteration 9280/ 173500 | consumed samples: 2375680 | consumed tokens: 4865392640 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.821614E+00 | grad norm: 0.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3249.094 | TFLOPs: 12.09 | 7: iteration 9290/ 173500 | consumed samples: 2378240 | consumed tokens: 4870635520 | elapsed time per iteration (s): 0.11 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.817617E+00 | grad norm: 0.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2247.538 | TFLOPs: 8.36 | 7: iteration 9300/ 173500 | consumed samples: 2380800 | consumed tokens: 4875878400 | elapsed time per iteration (s): 0.13 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.816406E+00 | grad norm: 0.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2031.036 | TFLOPs: 7.55 | 7: iteration 9310/ 173500 | consumed samples: 2383360 | consumed tokens: 4881121280 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.803078E+00 | grad norm: 0.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3051.066 | TFLOPs: 11.35 | 7: iteration 9320/ 173500 | consumed samples: 2385920 | consumed tokens: 4886364160 | elapsed time per iteration (s): 0.10 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.817118E+00 | grad norm: 0.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2511.513 | TFLOPs: 9.34 | 7: iteration 9330/ 173500 | consumed samples: 2388480 | consumed tokens: 4891607040 | elapsed time per iteration (s): 0.13 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 
4.807579E+00 | grad norm: 0.894 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1979.384 | TFLOPs: 7.36 | 7: iteration 9340/ 173500 | consumed samples: 2391040 | consumed tokens: 4896849920 | elapsed time per iteration (s): 0.09 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.826083E+00 | grad norm: 0.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2877.072 | TFLOPs: 10.70 | 7: iteration 9350/ 173500 | consumed samples: 2393600 | consumed tokens: 4902092800 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.822475E+00 | grad norm: 0.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.137 | TFLOPs: 11.47 | 7: iteration 9360/ 173500 | consumed samples: 2396160 | consumed tokens: 4907335680 | elapsed time per iteration (s): 0.09 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.805952E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.835 | TFLOPs: 10.69 | 7: iteration 9370/ 173500 | consumed samples: 2398720 | consumed tokens: 4912578560 | elapsed time per iteration (s): 0.12 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.820868E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2064.352 | TFLOPs: 7.68 | 7: iteration 9380/ 173500 | consumed samples: 2401280 | consumed tokens: 4917821440 | elapsed time per iteration (s): 0.13 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.821540E+00 | grad norm: 0.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.596 | TFLOPs: 7.24 | 7: iteration 9390/ 173500 | consumed samples: 2403840 | consumed tokens: 4923064320 | elapsed time 
per iteration (s): 0.11 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.821696E+00 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2366.663 | TFLOPs: 8.80 | 7: iteration 9400/ 173500 | consumed samples: 2406400 | consumed tokens: 4928307200 | elapsed time per iteration (s): 0.09 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.816638E+00 | grad norm: 0.812 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2954.346 | TFLOPs: 10.99 | 7: iteration 9410/ 173500 | consumed samples: 2408960 | consumed tokens: 4933550080 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.810382E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.135 | TFLOPs: 12.00 | 7: iteration 9420/ 173500 | consumed samples: 2411520 | consumed tokens: 4938792960 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.797291E+00 | grad norm: 0.848 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.488 | TFLOPs: 11.92 | 7: iteration 9430/ 173500 | consumed samples: 2414080 | consumed tokens: 4944035840 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.814350E+00 | grad norm: 0.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.799 | TFLOPs: 11.97 | 7: iteration 9440/ 173500 | consumed samples: 2416640 | consumed tokens: 4949278720 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.815171E+00 | grad norm: 0.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.392 | TFLOPs: 11.80 | 7: 
iteration 9450/ 173500 | consumed samples: 2419200 | consumed tokens: 4954521600 | elapsed time per iteration (s): 0.10 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.816172E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2639.104 | TFLOPs: 9.82 | 7: iteration 9460/ 173500 | consumed samples: 2421760 | consumed tokens: 4959764480 | elapsed time per iteration (s): 0.12 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.817656E+00 | grad norm: 0.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.236 | TFLOPs: 7.70 | 7: iteration 9470/ 173500 | consumed samples: 2424320 | consumed tokens: 4965007360 | elapsed time per iteration (s): 0.09 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.805460E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.167 | TFLOPs: 10.74 | 7: iteration 9480/ 173500 | consumed samples: 2426880 | consumed tokens: 4970250240 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.816233E+00 | grad norm: 0.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.651 | TFLOPs: 11.83 | 7: iteration 9490/ 173500 | consumed samples: 2429440 | consumed tokens: 4975493120 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.812331E+00 | grad norm: 0.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.114 | TFLOPs: 11.90 | 7: iteration 9500/ 173500 | consumed samples: 2432000 | consumed tokens: 4980736000 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.807764E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3223.364 | TFLOPs: 11.99 | 7: iteration 9510/ 173500 | consumed samples: 2434560 | consumed tokens: 4985978880 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.807411E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.288 | TFLOPs: 11.86 | 7: iteration 9520/ 173500 | consumed samples: 2437120 | consumed tokens: 4991221760 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.797175E+00 | grad norm: 0.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.568 | TFLOPs: 11.83 | 7: iteration 9530/ 173500 | consumed samples: 2439680 | consumed tokens: 4996464640 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.815933E+00 | grad norm: 0.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.021 | TFLOPs: 11.88 | 7: iteration 9540/ 173500 | consumed samples: 2442240 | consumed tokens: 5001707520 | elapsed time per iteration (s): 0.10 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.816872E+00 | grad norm: 0.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2528.154 | TFLOPs: 9.40 | 7: iteration 9550/ 173500 | consumed samples: 2444800 | consumed tokens: 5006950400 | elapsed time per iteration (s): 0.09 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.808329E+00 | grad norm: 0.918 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2857.205 | TFLOPs: 10.63 | 7: iteration 9560/ 173500 | consumed samples: 2447360 | consumed tokens: 5012193280 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch 
size: 256 | lm loss: 4.812240E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.523 | TFLOPs: 11.90 | 7: iteration 9570/ 173500 | consumed samples: 2449920 | consumed tokens: 5017436160 | elapsed time per iteration (s): 0.11 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.805274E+00 | grad norm: 0.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.108 | TFLOPs: 9.05 | 7: iteration 9580/ 173500 | consumed samples: 2452480 | consumed tokens: 5022679040 | elapsed time per iteration (s): 0.10 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.812445E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2533.322 | TFLOPs: 9.42 | 7: iteration 9590/ 173500 | consumed samples: 2455040 | consumed tokens: 5027921920 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.806703E+00 | grad norm: 0.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.105 | TFLOPs: 11.48 | 7: iteration 9600/ 173500 | consumed samples: 2457600 | consumed tokens: 5033164800 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.811875E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.353 | TFLOPs: 11.96 | 7: iteration 9610/ 173500 | consumed samples: 2460160 | consumed tokens: 5038407680 | elapsed time per iteration (s): 0.12 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.826120E+00 | grad norm: 0.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2224.328 | TFLOPs: 8.27 | 7: iteration 9620/ 173500 | consumed samples: 2462720 | consumed tokens: 
5043650560 | elapsed time per iteration (s): 0.10 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.821698E+00 | grad norm: 0.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2621.434 | TFLOPs: 9.75 | 7: iteration 9630/ 173500 | consumed samples: 2465280 | consumed tokens: 5048893440 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.811127E+00 | grad norm: 0.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.603 | TFLOPs: 11.97 | 7: iteration 9640/ 173500 | consumed samples: 2467840 | consumed tokens: 5054136320 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.816907E+00 | grad norm: 0.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.000 | TFLOPs: 11.98 | 7: iteration 9650/ 173500 | consumed samples: 2470400 | consumed tokens: 5059379200 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.803046E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.592 | TFLOPs: 11.99 | 7: iteration 9660/ 173500 | consumed samples: 2472960 | consumed tokens: 5064622080 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.820045E+00 | grad norm: 1.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.703 | TFLOPs: 11.93 | 7: iteration 9670/ 173500 | consumed samples: 2475520 | consumed tokens: 5069864960 | elapsed time per iteration (s): 0.08 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.818285E+00 | grad norm: 0.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.577 | 
TFLOPs: 11.65 | 7: iteration 9680/ 173500 | consumed samples: 2478080 | consumed tokens: 5075107840 | elapsed time per iteration (s): 0.09 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 4.819886E+00 | grad norm: 0.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2932.276 | TFLOPs: 10.91 | 7: iteration 9690/ 173500 | consumed samples: 2480640 | consumed tokens: 5080350720 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.812722E+00 | grad norm: 0.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.033 | TFLOPs: 11.23 | 7: iteration 9700/ 173500 | consumed samples: 2483200 | consumed tokens: 5085593600 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.794861E+00 | grad norm: 0.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.177 | TFLOPs: 11.91 | 7: iteration 9710/ 173500 | consumed samples: 2485760 | consumed tokens: 5090836480 | elapsed time per iteration (s): 0.10 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.794671E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2668.243 | TFLOPs: 9.92 | 7: iteration 9720/ 173500 | consumed samples: 2488320 | consumed tokens: 5096079360 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.811417E+00 | grad norm: 0.903 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2867.490 | TFLOPs: 10.67 | 7: iteration 9730/ 173500 | consumed samples: 2490880 | consumed tokens: 5101322240 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.804886E+00 | grad norm: 0.844 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.840 | TFLOPs: 11.67 | 7: iteration 9740/ 173500 | consumed samples: 2493440 | consumed tokens: 5106565120 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.809328E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.040 | TFLOPs: 11.57 | 7: iteration 9750/ 173500 | consumed samples: 2496000 | consumed tokens: 5111808000 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.800451E+00 | grad norm: 0.963 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3061.130 | TFLOPs: 11.39 | 7: iteration 9760/ 173500 | consumed samples: 2498560 | consumed tokens: 5117050880 | elapsed time per iteration (s): 0.12 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.808612E+00 | grad norm: 0.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2192.543 | TFLOPs: 8.16 | 7: iteration 9770/ 173500 | consumed samples: 2501120 | consumed tokens: 5122293760 | elapsed time per iteration (s): 0.11 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.811494E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2347.091 | TFLOPs: 8.73 | 7: iteration 9780/ 173500 | consumed samples: 2503680 | consumed tokens: 5127536640 | elapsed time per iteration (s): 0.10 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.815134E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.575 | TFLOPs: 9.52 | 7: iteration 9790/ 173500 | consumed samples: 2506240 | consumed tokens: 5132779520 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global 
batch size: 256 | lm loss: 4.800539E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2756.885 | TFLOPs: 10.25 | 7: iteration 9800/ 173500 | consumed samples: 2508800 | consumed tokens: 5138022400 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.814674E+00 | grad norm: 0.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.805 | TFLOPs: 11.09 | 7: iteration 9810/ 173500 | consumed samples: 2511360 | consumed tokens: 5143265280 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.809303E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2943.205 | TFLOPs: 10.95 | 7: iteration 9820/ 173500 | consumed samples: 2513920 | consumed tokens: 5148508160 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.804617E+00 | grad norm: 0.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.347 | TFLOPs: 11.96 | 7: iteration 9830/ 173500 | consumed samples: 2516480 | consumed tokens: 5153751040 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.812181E+00 | grad norm: 0.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2964.338 | TFLOPs: 11.03 | 7: iteration 9840/ 173500 | consumed samples: 2519040 | consumed tokens: 5158993920 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.804282E+00 | grad norm: 0.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.114 | TFLOPs: 11.91 | 7: iteration 9850/ 173500 | consumed samples: 2521600 | consumed 
tokens: 5164236800 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.803061E+00 | grad norm: 0.978 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2904.797 | TFLOPs: 10.80 | 7: iteration 9860/ 173500 | consumed samples: 2524160 | consumed tokens: 5169479680 | elapsed time per iteration (s): 0.11 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.802647E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.122 | TFLOPs: 8.98 | 7: iteration 9870/ 173500 | consumed samples: 2526720 | consumed tokens: 5174722560 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.801567E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.436 | TFLOPs: 11.89 | 7: iteration 9880/ 173500 | consumed samples: 2529280 | consumed tokens: 5179965440 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.808040E+00 | grad norm: 0.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.224 | TFLOPs: 11.88 | 7: iteration 9890/ 173500 | consumed samples: 2531840 | consumed tokens: 5185208320 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.830780E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2871.189 | TFLOPs: 10.68 | 7: iteration 9900/ 173500 | consumed samples: 2534400 | consumed tokens: 5190451200 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.803484E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3090.178 | TFLOPs: 11.49 | 7: iteration 9910/ 173500 | consumed samples: 2536960 | consumed tokens: 5195694080 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.810419E+00 | grad norm: 0.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.031 | TFLOPs: 10.58 | 7: iteration 9920/ 173500 | consumed samples: 2539520 | consumed tokens: 5200936960 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.795745E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.199 | TFLOPs: 11.59 | 7: iteration 9930/ 173500 | consumed samples: 2542080 | consumed tokens: 5206179840 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.800678E+00 | grad norm: 0.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2938.001 | TFLOPs: 10.93 | 7: iteration 9940/ 173500 | consumed samples: 2544640 | consumed tokens: 5211422720 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.796617E+00 | grad norm: 0.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.412 | TFLOPs: 11.85 | 7: iteration 9950/ 173500 | consumed samples: 2547200 | consumed tokens: 5216665600 | elapsed time per iteration (s): 0.10 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.799658E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2582.229 | TFLOPs: 9.60 | 7: iteration 9960/ 173500 | consumed samples: 2549760 | consumed tokens: 5221908480 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.805500E+00 | grad norm: 1.032 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.958 | TFLOPs: 11.89 | 7: iteration 9970/ 173500 | consumed samples: 2552320 | consumed tokens: 5227151360 | elapsed time per iteration (s): 0.10 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.800388E+00 | grad norm: 0.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2553.820 | TFLOPs: 9.50 | 7: iteration 9980/ 173500 | consumed samples: 2554880 | consumed tokens: 5232394240 | elapsed time per iteration (s): 0.11 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.806347E+00 | grad norm: 0.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2253.156 | TFLOPs: 8.38 | 7: iteration 9990/ 173500 | consumed samples: 2557440 | consumed tokens: 5237637120 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.808702E+00 | grad norm: 0.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.880 | TFLOPs: 11.84 | 0: [2023-03-17 00:31:39,612] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=0, lr=[0.00019897364350587667, 0.00019897364350587667, 0.00019897364350587667], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 10000/ 173500 | consumed samples: 2560000 | consumed tokens: 5242880000 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.813185E+00 | grad norm: 0.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.854 | TFLOPs: 11.87 | 0: steps: 10000 loss: 4.8061 iter time (s): 0.091 samples/sec: 2818.612 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 10000 | lm loss value: 4.595667E+00 | lm loss PPL: 9.905417E+01 | 7: 
------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 10000 to checkpoints_14m91b100m 0: [2023-03-17 00:31:39,669] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is begin to save! 0: [2023-03-17 00:31:39,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:31:39,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:31:39,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:31:39,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:31:39,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:31:39,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:31:39,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:31:39,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:31:39,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:31:39,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 00:31:39,709] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:31:39,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:31:39,711] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step10000/mp_rank_00_model_states.pt 0: [2023-03-17 00:31:39,711] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:31:39,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:31:39,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:31:39,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:31:39,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:31:39,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:31:39,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:31:39,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:31:39,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 00:31:39,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:31:39,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:31:39,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 00:31:39,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:31:39,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:31:39,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
6: [2023-03-17 00:31:39,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:31:39,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 2: [2023-03-17 00:31:39,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:31:39,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:31:39,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:31:39,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
1: [2023-03-17 00:31:39,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:31:39,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:31:39,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:31:39,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:31:39,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:31:39,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:31:39,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:31:39,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:31:39,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:31:39,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:31:39,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:31:39,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 00:31:39,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:31:39,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:31:39,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:31:39,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:31:39,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:31:39,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:31:39,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
0: [2023-03-17 00:31:39,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:31:39,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:31:39,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:31:39,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:31:39,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:31:39,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
4: [2023-03-17 00:31:39,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 00:31:39,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:31:39,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:31:39,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:31:39,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 0: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2023-03-17 00:31:39,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:31:39,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2023-03-17 00:31:39,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:31:39,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:31:39,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
0: successfully saved checkpoint at iteration 10000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 79.08
7: iteration 10010/ 173500 | consumed samples: 2562560 | consumed tokens: 5248122880 | elapsed time per iteration (s): 0.09 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.812436E+00 | grad norm: 0.771 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.067 | TFLOPs: 10.04 |
7: iteration 10020/ 173500 | consumed samples: 2565120 | consumed tokens: 5253365760 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.805927E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.593 | TFLOPs: 11.96 |
7: iteration 10030/ 173500 | consumed samples: 2567680 | consumed tokens: 5258608640 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.802357E+00 | grad norm: 0.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.891 | TFLOPs: 11.96 |
7: iteration 10040/ 173500 | consumed samples: 2570240 | consumed tokens: 5263851520 | elapsed time per iteration (s): 0.10 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.803244E+00 | grad norm: 0.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2521.934 | TFLOPs: 9.38 |
7: iteration 10050/ 173500 | consumed samples: 2572800 | consumed tokens: 5269094400 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.793863E+00 | grad norm: 1.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.099 | TFLOPs: 11.93 |
7: iteration 10060/ 173500 | consumed samples: 2575360 | consumed tokens: 5274337280 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.800717E+00 | grad norm: 0.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.026 | TFLOPs: 11.91 |
7: iteration 10070/ 173500 | consumed samples: 2577920 | consumed tokens: 5279580160 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.809331E+00 | grad norm: 0.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.820 | TFLOPs: 11.83 |
7: iteration 10080/ 173500 | consumed samples: 2580480 | consumed tokens: 5284823040 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.797113E+00 | grad norm: 0.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.942 | TFLOPs: 11.97 |
7: iteration 10090/ 173500 | consumed samples: 2583040 | consumed tokens: 5290065920 | elapsed time per iteration (s): 0.08 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 4.802939E+00 | grad norm: 0.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.728 | TFLOPs: 11.92 |
7: iteration 10100/ 173500 | consumed samples: 2585600 | consumed tokens: 5295308800 | elapsed time per iteration (s): 0.09 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.800089E+00 | grad norm: 0.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2845.669 | TFLOPs: 10.58 |
7: iteration 10110/ 173500 | consumed samples: 2588160 | consumed tokens: 5300551680 | elapsed time per iteration (s): 0.12 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.799020E+00 | grad norm: 0.817 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2096.317 | TFLOPs: 7.80 |
7: iteration 10120/ 173500 | consumed samples: 2590720 | consumed tokens: 5305794560 | elapsed time per iteration (s): 0.12 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.804541E+00 | grad norm: 0.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2155.026 | TFLOPs: 8.02 |
7: iteration 10130/ 173500 | consumed samples: 2593280 | consumed tokens: 5311037440 | elapsed time per iteration (s): 0.13 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.806963E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.259 | TFLOPs: 7.58 |
7: iteration 10140/ 173500 | consumed samples: 2595840 | consumed tokens: 5316280320 | elapsed time per iteration (s): 0.13 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.798830E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.827 | TFLOPs: 7.61 |
7: iteration 10150/ 173500 | consumed samples: 2598400 | consumed tokens: 5321523200 | elapsed time per iteration (s): 0.11 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.805712E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.383 | TFLOPs: 9.05 |
7: iteration 10160/ 173500 | consumed samples: 2600960 | consumed tokens: 5326766080 | elapsed time per iteration (s): 0.10 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.797124E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.396 | TFLOPs: 9.12 |
7: iteration 10170/ 173500 | consumed samples: 2603520 | consumed tokens: 5332008960 | elapsed time per iteration (s): 0.10 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.797613E+00 | grad norm: 0.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.554 | TFLOPs: 9.16 |
7: iteration 10180/ 173500 | consumed samples: 2606080 | consumed tokens: 5337251840 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.803303E+00 | grad norm: 0.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.793 | TFLOPs: 11.21 |
7: iteration 10190/ 173500 | consumed samples: 2608640 | consumed tokens: 5342494720 | elapsed time per iteration (s): 0.12 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.808084E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2102.772 | TFLOPs: 7.82 |
7: iteration 10200/ 173500 | consumed samples: 2611200 | consumed tokens: 5347737600 | elapsed time per iteration (s): 0.13 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.805738E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.482 | TFLOPs: 7.24 |
7: iteration 10210/ 173500 | consumed samples: 2613760 | consumed tokens: 5352980480 | elapsed time per iteration (s): 0.09 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.792702E+00 | grad norm: 0.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2696.819 | TFLOPs: 10.03 |
7: iteration 10220/ 173500 | consumed samples: 2616320 | consumed tokens: 5358223360 | elapsed time per iteration (s): 0.12 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.807351E+00 | grad norm: 0.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2186.875 | TFLOPs: 8.13 |
7: iteration 10230/ 173500 | consumed samples: 2618880 | consumed tokens: 5363466240 | elapsed time per iteration (s): 0.13 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.794247E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.367 | TFLOPs: 7.37 |
7: iteration 10240/ 173500 | consumed samples: 2621440 | consumed tokens: 5368709120 | elapsed time per iteration (s): 0.13 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.800513E+00 | grad norm: 0.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.041 | TFLOPs: 7.30 |
7: iteration 10250/ 173500 | consumed samples: 2624000 | consumed tokens: 5373952000 | elapsed time per iteration (s): 0.09 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.805647E+00 | grad norm: 0.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2831.550 | TFLOPs: 10.53 |
7: iteration 10260/ 173500 | consumed samples: 2626560 | consumed tokens: 5379194880 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.789648E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.366 | TFLOPs: 11.90 |
7: iteration 10270/ 173500 | consumed samples: 2629120 | consumed tokens: 5384437760 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.798913E+00 | grad norm: 0.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.228 | TFLOPs: 11.55 |
7: iteration 10280/ 173500 | consumed samples: 2631680 | consumed tokens: 5389680640 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.803982E+00 | grad norm: 0.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.167 | TFLOPs: 12.01 |
7: iteration 10290/ 173500 | consumed samples: 2634240 | consumed tokens: 5394923520 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.794344E+00 | grad norm: 0.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.198 | TFLOPs: 11.97 |
7: iteration 10300/ 173500 | consumed samples: 2636800 | consumed tokens: 5400166400 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.797216E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.236 | TFLOPs: 12.05 |
7: iteration 10310/ 173500 | consumed samples: 2639360 | consumed tokens: 5405409280 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.799215E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.915 | TFLOPs: 11.91 |
7: iteration 10320/ 173500 | consumed samples: 2641920 | consumed tokens: 5410652160 | elapsed time per iteration (s): 0.22 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.801589E+00 | grad norm: 0.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1174.035 | TFLOPs: 4.37 |
7: iteration 10330/ 173500 | consumed samples: 2644480 | consumed tokens: 5415895040 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.798404E+00 | grad norm: 0.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.260 | TFLOPs: 11.44 |
7: iteration 10340/ 173500 | consumed samples: 2647040 | consumed tokens: 5421137920 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.808015E+00 | grad norm: 0.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.627 | TFLOPs: 11.74 |
7: iteration 10350/ 
173500 | consumed samples: 2649600 | consumed tokens: 5426380800 | elapsed time per iteration (s): 0.09 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.801512E+00 | grad norm: 0.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2759.696 | TFLOPs: 10.26 | 7: iteration 10360/ 173500 | consumed samples: 2652160 | consumed tokens: 5431623680 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.812896E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.932 | TFLOPs: 11.56 | 7: iteration 10370/ 173500 | consumed samples: 2654720 | consumed tokens: 5436866560 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.797833E+00 | grad norm: 0.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.804 | TFLOPs: 11.60 | 7: iteration 10380/ 173500 | consumed samples: 2657280 | consumed tokens: 5442109440 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.791988E+00 | grad norm: 0.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.626 | TFLOPs: 11.86 | 7: iteration 10390/ 173500 | consumed samples: 2659840 | consumed tokens: 5447352320 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.799435E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.401 | TFLOPs: 11.85 | 7: iteration 10400/ 173500 | consumed samples: 2662400 | consumed tokens: 5452595200 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.793418E+00 | grad norm: 0.597 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3195.001 | TFLOPs: 11.88 | 7: iteration 10410/ 173500 | consumed samples: 2664960 | consumed tokens: 5457838080 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.808692E+00 | grad norm: 0.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.536 | TFLOPs: 11.88 | 7: iteration 10420/ 173500 | consumed samples: 2667520 | consumed tokens: 5463080960 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.798032E+00 | grad norm: 0.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.194 | TFLOPs: 11.90 | 7: iteration 10430/ 173500 | consumed samples: 2670080 | consumed tokens: 5468323840 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.795762E+00 | grad norm: 0.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.506 | TFLOPs: 11.80 | 7: iteration 10440/ 173500 | consumed samples: 2672640 | consumed tokens: 5473566720 | elapsed time per iteration (s): 0.10 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.782052E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.464 | TFLOPs: 9.34 | 7: iteration 10450/ 173500 | consumed samples: 2675200 | consumed tokens: 5478809600 | elapsed time per iteration (s): 0.09 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.797879E+00 | grad norm: 0.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2994.339 | TFLOPs: 11.14 | 7: iteration 10460/ 173500 | consumed samples: 2677760 | consumed tokens: 5484052480 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | 
lm loss: 4.804980E+00 | grad norm: 0.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.253 | TFLOPs: 11.48 | 7: iteration 10470/ 173500 | consumed samples: 2680320 | consumed tokens: 5489295360 | elapsed time per iteration (s): 0.09 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.799740E+00 | grad norm: 0.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2855.613 | TFLOPs: 10.62 | 7: iteration 10480/ 173500 | consumed samples: 2682880 | consumed tokens: 5494538240 | elapsed time per iteration (s): 0.08 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 4.797373E+00 | grad norm: 0.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3051.691 | TFLOPs: 11.35 | 7: iteration 10490/ 173500 | consumed samples: 2685440 | consumed tokens: 5499781120 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.799508E+00 | grad norm: 0.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.582 | TFLOPs: 11.74 | 7: iteration 10500/ 173500 | consumed samples: 2688000 | consumed tokens: 5505024000 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.793415E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.189 | TFLOPs: 11.86 | 7: iteration 10510/ 173500 | consumed samples: 2690560 | consumed tokens: 5510266880 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.783382E+00 | grad norm: 0.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.544 | TFLOPs: 11.67 | 7: iteration 10520/ 173500 | consumed samples: 2693120 | consumed tokens: 
5515509760 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.799076E+00 | grad norm: 0.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.149 | TFLOPs: 11.58 | 7: iteration 10530/ 173500 | consumed samples: 2695680 | consumed tokens: 5520752640 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.793284E+00 | grad norm: 0.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.050 | TFLOPs: 11.75 | 7: iteration 10540/ 173500 | consumed samples: 2698240 | consumed tokens: 5525995520 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.801250E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.135 | TFLOPs: 11.87 | 7: iteration 10550/ 173500 | consumed samples: 2700800 | consumed tokens: 5531238400 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.793399E+00 | grad norm: 0.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.715 | TFLOPs: 11.84 | 7: iteration 10560/ 173500 | consumed samples: 2703360 | consumed tokens: 5536481280 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.794382E+00 | grad norm: 0.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.102 | TFLOPs: 11.59 | 7: iteration 10570/ 173500 | consumed samples: 2705920 | consumed tokens: 5541724160 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.797706E+00 | grad norm: 0.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3109.242 | TFLOPs: 11.57 | 7: iteration 10580/ 173500 | consumed samples: 2708480 | consumed tokens: 5546967040 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.809333E+00 | grad norm: 0.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.129 | TFLOPs: 11.84 | 7: iteration 10590/ 173500 | consumed samples: 2711040 | consumed tokens: 5552209920 | elapsed time per iteration (s): 0.09 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.796890E+00 | grad norm: 0.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2862.151 | TFLOPs: 10.65 | 7: iteration 10600/ 173500 | consumed samples: 2713600 | consumed tokens: 5557452800 | elapsed time per iteration (s): 0.11 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.798901E+00 | grad norm: 0.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2360.230 | TFLOPs: 8.78 | 7: iteration 10610/ 173500 | consumed samples: 2716160 | consumed tokens: 5562695680 | elapsed time per iteration (s): 0.11 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.786600E+00 | grad norm: 0.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2339.894 | TFLOPs: 8.70 | 7: iteration 10620/ 173500 | consumed samples: 2718720 | consumed tokens: 5567938560 | elapsed time per iteration (s): 0.11 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.789534E+00 | grad norm: 0.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2275.105 | TFLOPs: 8.46 | 7: iteration 10630/ 173500 | consumed samples: 2721280 | consumed tokens: 5573181440 | elapsed time per iteration (s): 0.12 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.794620E+00 | grad norm: 0.581 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.625 | TFLOPs: 8.18 | 7: iteration 10640/ 173500 | consumed samples: 2723840 | consumed tokens: 5578424320 | elapsed time per iteration (s): 0.11 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.794051E+00 | grad norm: 0.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2334.048 | TFLOPs: 8.68 | 7: iteration 10650/ 173500 | consumed samples: 2726400 | consumed tokens: 5583667200 | elapsed time per iteration (s): 0.11 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.778014E+00 | grad norm: 0.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.685 | TFLOPs: 8.88 | 7: iteration 10660/ 173500 | consumed samples: 2728960 | consumed tokens: 5588910080 | elapsed time per iteration (s): 0.12 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.792631E+00 | grad norm: 0.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2206.866 | TFLOPs: 8.21 | 7: iteration 10670/ 173500 | consumed samples: 2731520 | consumed tokens: 5594152960 | elapsed time per iteration (s): 0.10 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.793628E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2481.483 | TFLOPs: 9.23 | 7: iteration 10680/ 173500 | consumed samples: 2734080 | consumed tokens: 5599395840 | elapsed time per iteration (s): 0.09 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.798426E+00 | grad norm: 0.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.672 | TFLOPs: 10.61 | 7: iteration 10690/ 173500 | consumed samples: 2736640 | consumed tokens: 5604638720 | elapsed time per iteration (s): 0.08 | learning 
rate: 1.988E-04 | global batch size: 256 | lm loss: 4.798311E+00 | grad norm: 0.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.930 | TFLOPs: 11.86 | 7: iteration 10700/ 173500 | consumed samples: 2739200 | consumed tokens: 5609881600 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.794587E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.810 | TFLOPs: 11.88 | 7: iteration 10710/ 173500 | consumed samples: 2741760 | consumed tokens: 5615124480 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.785758E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.513 | TFLOPs: 11.28 | 7: iteration 10720/ 173500 | consumed samples: 2744320 | consumed tokens: 5620367360 | elapsed time per iteration (s): 0.10 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.788979E+00 | grad norm: 0.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2457.360 | TFLOPs: 9.14 | 7: iteration 10730/ 173500 | consumed samples: 2746880 | consumed tokens: 5625610240 | elapsed time per iteration (s): 0.09 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.797424E+00 | grad norm: 0.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3005.964 | TFLOPs: 11.18 | 7: iteration 10740/ 173500 | consumed samples: 2749440 | consumed tokens: 5630853120 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.793306E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.701 | TFLOPs: 11.91 | 7: iteration 10750/ 173500 | consumed 
samples: 2752000 | consumed tokens: 5636096000 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.796507E+00 | grad norm: 0.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.722 | TFLOPs: 11.90 | 7: iteration 10760/ 173500 | consumed samples: 2754560 | consumed tokens: 5641338880 | elapsed time per iteration (s): 0.09 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.795851E+00 | grad norm: 0.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2809.534 | TFLOPs: 10.45 | 7: iteration 10770/ 173500 | consumed samples: 2757120 | consumed tokens: 5646581760 | elapsed time per iteration (s): 0.11 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.803432E+00 | grad norm: 0.821 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.771 | TFLOPs: 8.88 | 7: iteration 10780/ 173500 | consumed samples: 2759680 | consumed tokens: 5651824640 | elapsed time per iteration (s): 0.09 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.804024E+00 | grad norm: 0.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.225 | TFLOPs: 10.48 | 7: iteration 10790/ 173500 | consumed samples: 2762240 | consumed tokens: 5657067520 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.787347E+00 | grad norm: 0.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.399 | TFLOPs: 11.68 | 7: iteration 10800/ 173500 | consumed samples: 2764800 | consumed tokens: 5662310400 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.780235E+00 | grad norm: 0.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3200.647 | TFLOPs: 11.91 | 7: iteration 10810/ 173500 | consumed samples: 2767360 | consumed tokens: 5667553280 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.796065E+00 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.616 | TFLOPs: 11.35 | 7: iteration 10820/ 173500 | consumed samples: 2769920 | consumed tokens: 5672796160 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.802723E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.375 | TFLOPs: 11.95 | 7: iteration 10830/ 173500 | consumed samples: 2772480 | consumed tokens: 5678039040 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.787140E+00 | grad norm: 0.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.040 | TFLOPs: 11.93 | 7: iteration 10840/ 173500 | consumed samples: 2775040 | consumed tokens: 5683281920 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.794764E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.745 | TFLOPs: 11.95 | 7: iteration 10850/ 173500 | consumed samples: 2777600 | consumed tokens: 5688524800 | elapsed time per iteration (s): 0.08 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 4.796837E+00 | grad norm: 0.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.184 | TFLOPs: 11.80 | 7: iteration 10860/ 173500 | consumed samples: 2780160 | consumed tokens: 5693767680 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 
4.801654E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2822.741 | TFLOPs: 10.50 | 7: iteration 10870/ 173500 | consumed samples: 2782720 | consumed tokens: 5699010560 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.792154E+00 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.669 | TFLOPs: 11.88 | 7: iteration 10880/ 173500 | consumed samples: 2785280 | consumed tokens: 5704253440 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.796058E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.407 | TFLOPs: 11.79 | 7: iteration 10890/ 173500 | consumed samples: 2787840 | consumed tokens: 5709496320 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.799134E+00 | grad norm: 0.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.223 | TFLOPs: 11.81 | 7: iteration 10900/ 173500 | consumed samples: 2790400 | consumed tokens: 5714739200 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.795727E+00 | grad norm: 0.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.456 | TFLOPs: 11.77 | 7: iteration 10910/ 173500 | consumed samples: 2792960 | consumed tokens: 5719982080 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.787847E+00 | grad norm: 0.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.466 | TFLOPs: 11.83 | 7: iteration 10920/ 173500 | consumed samples: 2795520 | consumed tokens: 5725224960 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.796282E+00 | grad norm: 0.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.968 | TFLOPs: 11.87 | 7: iteration 10930/ 173500 | consumed samples: 2798080 | consumed tokens: 5730467840 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.782590E+00 | grad norm: 0.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.815 | TFLOPs: 11.78 | 7: iteration 10940/ 173500 | consumed samples: 2800640 | consumed tokens: 5735710720 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.794525E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.474 | TFLOPs: 10.57 | 7: iteration 10950/ 173500 | consumed samples: 2803200 | consumed tokens: 5740953600 | elapsed time per iteration (s): 0.10 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.786547E+00 | grad norm: 0.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.029 | TFLOPs: 9.74 | 7: iteration 10960/ 173500 | consumed samples: 2805760 | consumed tokens: 5746196480 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.804412E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.224 | TFLOPs: 11.37 | 7: iteration 10970/ 173500 | consumed samples: 2808320 | consumed tokens: 5751439360 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.784036E+00 | grad norm: 0.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.781 | TFLOPs: 
11.82 | 7: iteration 10980/ 173500 | consumed samples: 2810880 | consumed tokens: 5756682240 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.795954E+00 | grad norm: 0.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.629 | TFLOPs: 11.78 | 7: iteration 10990/ 173500 | consumed samples: 2813440 | consumed tokens: 5761925120 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.794623E+00 | grad norm: 0.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.078 | TFLOPs: 11.39 | 7: iteration 11000/ 173500 | consumed samples: 2816000 | consumed tokens: 5767168000 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.792195E+00 | grad norm: 0.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.987 | TFLOPs: 10.73 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 11000 | lm loss value: 4.667966E+00 | lm loss PPL: 1.064810E+02 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 11000 to checkpoints_14m91b100m 0: [2023-03-17 00:33:11,440] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step11000 is begin to save! 0: [2023-03-17 00:33:11,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:33:11,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 00:33:11,468] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:33:11,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:33:11,471] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:33:11,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:33:11,474] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:33:11,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:33:11,477] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:33:11,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:33:11,480] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:33:11,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:33:11,481] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step11000/mp_rank_00_model_states.pt 0: [2023-03-17 00:33:11,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/mp_rank_00_model_states.pt... 
0: [2023-03-17 00:33:11,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:33:11,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:33:11,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:33:11,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:33:11,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:33:11,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2023-03-17 00:33:11,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:33:11,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:33:11,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2023-03-17 00:33:11,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:33:11,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:33:11,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2023-03-17 00:33:11,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:33:11,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
2: [2023-03-17 00:33:11,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:33:11,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:33:11,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 2: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:33:11,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2023-03-17 00:33:11,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:33:11,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
5: [2023-03-17 00:33:11,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:33:11,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:33:11,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:33:11,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 1: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:33:11,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:33:11,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:33:11,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:33:11,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:33:11,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
6: [2023-03-17 00:33:11,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:33:11,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:33:11,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
0: [2023-03-17 00:33:11,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2023-03-17 00:33:11,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 1: [2023-03-17 00:33:11,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2023-03-17 00:33:11,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:33:11,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:33:11,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2023-03-17 00:33:11,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:33:11,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:33:11,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2023-03-17 00:33:11,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:33:11,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:33:11,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:33:11,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:33:11,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 1: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:33:11,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2023-03-17 00:33:11,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:33:11,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2023-03-17 00:33:11,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:33:11,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
3: [2023-03-17 00:33:11,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:33:11,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:33:11,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2023-03-17 00:33:11,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:33:11,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 1: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:33:11,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:33:11,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2023-03-17 00:33:11,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:33:11,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:33:11,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2023-03-17 00:33:11,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:33:11,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:33:11,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2023-03-17 00:33:11,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:33:11,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2023-03-17 00:33:11,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:33:11,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:33:11,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 4: [2023-03-17 00:33:11,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:33:11,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:33:11,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 1: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:33:11,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:33:11,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2023-03-17 00:33:11,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2023-03-17 00:33:11,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:33:11,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:33:11,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:33:11,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:33:11,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 4: [2023-03-17 00:33:11,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:33:11,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:33:11,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 00:33:11,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
1: [2023-03-17 00:33:11,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
4: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 3: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 1: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 1: [2023-03-17 00:33:11,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:33:11,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
0: successfully saved checkpoint at iteration 11000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 76.85 7: iteration 11010/ 173500 | consumed samples: 2818560 | consumed tokens: 5772410880 | elapsed time per iteration (s): 0.11 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.798267E+00 | grad norm: 0.803 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2380.010 | TFLOPs: 8.85 | 7: iteration 11020/ 173500 | consumed samples: 2821120 | consumed tokens: 5777653760 | elapsed time per iteration (s): 0.10 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.798703E+00 | grad norm: 0.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2578.010 | TFLOPs: 9.59 | 7: iteration 11030/ 173500 | consumed samples: 2823680 | consumed tokens: 5782896640 | elapsed time per iteration (s): 0.10 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.785991E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2643.896 | TFLOPs: 9.83 | 7: iteration 11040/ 173500 | consumed samples: 2826240 | consumed tokens: 5788139520 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.790787E+00 | grad norm: 0.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2842.178 | TFLOPs: 10.57 | 7: iteration 11050/ 173500 | consumed samples: 2828800 | consumed tokens: 5793382400 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.790156E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.239 | TFLOPs: 11.81 | 7: iteration 11060/ 173500 | consumed samples: 2831360 | consumed tokens: 5798625280 | elapsed time per iteration (s): 0.08 | learning rate: 
1.987E-04 | global batch size: 256 | lm loss: 4.783694E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.758 | TFLOPs: 11.78 | 7: iteration 11070/ 173500 | consumed samples: 2833920 | consumed tokens: 5803868160 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.788737E+00 | grad norm: 0.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.925 | TFLOPs: 11.52 | 7: iteration 11080/ 173500 | consumed samples: 2836480 | consumed tokens: 5809111040 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.788546E+00 | grad norm: 0.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.562 | TFLOPs: 11.81 | 7: iteration 11090/ 173500 | consumed samples: 2839040 | consumed tokens: 5814353920 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.784032E+00 | grad norm: 0.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2768.131 | TFLOPs: 10.30 | 7: iteration 11100/ 173500 | consumed samples: 2841600 | consumed tokens: 5819596800 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.784846E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2814.409 | TFLOPs: 10.47 | 7: iteration 11110/ 173500 | consumed samples: 2844160 | consumed tokens: 5824839680 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.786671E+00 | grad norm: 0.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2725.616 | TFLOPs: 10.14 | 7: iteration 11120/ 173500 | consumed samples: 
2846720 | consumed tokens: 5830082560 | elapsed time per iteration (s): 0.10 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.790698E+00 | grad norm: 0.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2564.103 | TFLOPs: 9.54 | 7: iteration 11130/ 173500 | consumed samples: 2849280 | consumed tokens: 5835325440 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.778060E+00 | grad norm: 0.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2776.793 | TFLOPs: 10.33 | 7: iteration 11140/ 173500 | consumed samples: 2851840 | consumed tokens: 5840568320 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.784283E+00 | grad norm: 0.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2754.123 | TFLOPs: 10.24 | 7: iteration 11150/ 173500 | consumed samples: 2854400 | consumed tokens: 5845811200 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.787251E+00 | grad norm: 0.831 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2779.248 | TFLOPs: 10.34 | 7: iteration 11160/ 173500 | consumed samples: 2856960 | consumed tokens: 5851054080 | elapsed time per iteration (s): 0.10 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.771943E+00 | grad norm: 0.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.514 | TFLOPs: 9.56 | 7: iteration 11170/ 173500 | consumed samples: 2859520 | consumed tokens: 5856296960 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.787507E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3116.527 | TFLOPs: 11.59 | 7: iteration 11180/ 173500 | consumed samples: 2862080 | consumed tokens: 5861539840 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.799242E+00 | grad norm: 0.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2880.013 | TFLOPs: 10.71 | 7: iteration 11190/ 173500 | consumed samples: 2864640 | consumed tokens: 5866782720 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.790439E+00 | grad norm: 0.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.570 | TFLOPs: 11.94 | 7: iteration 11200/ 173500 | consumed samples: 2867200 | consumed tokens: 5872025600 | elapsed time per iteration (s): 0.09 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.796144E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2865.668 | TFLOPs: 10.66 | 7: iteration 11210/ 173500 | consumed samples: 2869760 | consumed tokens: 5877268480 | elapsed time per iteration (s): 0.08 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 4.782138E+00 | grad norm: 0.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.857 | TFLOPs: 11.97 | 7: iteration 11220/ 173500 | consumed samples: 2872320 | consumed tokens: 5882511360 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.782877E+00 | grad norm: 0.637 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.628 | TFLOPs: 11.44 | 7: iteration 11230/ 173500 | consumed samples: 2874880 | consumed tokens: 5887754240 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.788651E+00 | grad 
norm: 0.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.601 | TFLOPs: 11.43 | 7: iteration 11240/ 173500 | consumed samples: 2877440 | consumed tokens: 5892997120 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.781880E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.391 | TFLOPs: 11.92 | 7: iteration 11250/ 173500 | consumed samples: 2880000 | consumed tokens: 5898240000 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.787064E+00 | grad norm: 0.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.330 | TFLOPs: 11.91 | 7: iteration 11260/ 173500 | consumed samples: 2882560 | consumed tokens: 5903482880 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.790917E+00 | grad norm: 0.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.577 | TFLOPs: 11.95 | 7: iteration 11270/ 173500 | consumed samples: 2885120 | consumed tokens: 5908725760 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.785195E+00 | grad norm: 0.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.365 | TFLOPs: 11.94 | 7: iteration 11280/ 173500 | consumed samples: 2887680 | consumed tokens: 5913968640 | elapsed time per iteration (s): 0.11 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.778185E+00 | grad norm: 0.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2360.554 | TFLOPs: 8.78 | 7: iteration 11290/ 173500 | consumed samples: 2890240 | consumed tokens: 5919211520 | elapsed time per iteration 
(s): 0.09 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.780111E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.421 | TFLOPs: 11.15 | 7: iteration 11300/ 173500 | consumed samples: 2892800 | consumed tokens: 5924454400 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.780470E+00 | grad norm: 0.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.475 | TFLOPs: 11.70 | 7: iteration 11310/ 173500 | consumed samples: 2895360 | consumed tokens: 5929697280 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.790631E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.001 | TFLOPs: 11.97 | 7: iteration 11320/ 173500 | consumed samples: 2897920 | consumed tokens: 5934940160 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.787955E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.339 | TFLOPs: 11.79 | 7: iteration 11330/ 173500 | consumed samples: 2900480 | consumed tokens: 5940183040 | elapsed time per iteration (s): 0.09 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.781255E+00 | grad norm: 0.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2713.484 | TFLOPs: 10.09 | 7: iteration 11340/ 173500 | consumed samples: 2903040 | consumed tokens: 5945425920 | elapsed time per iteration (s): 0.12 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.784517E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2087.058 | TFLOPs: 7.76 | 7: iteration 11350/ 
173500 | consumed samples: 2905600 | consumed tokens: 5950668800 | elapsed time per iteration (s): 0.13 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.774120E+00 | grad norm: 0.762 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2010.627 | TFLOPs: 7.48 | 7: iteration 11360/ 173500 | consumed samples: 2908160 | consumed tokens: 5955911680 | elapsed time per iteration (s): 0.13 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.784976E+00 | grad norm: 0.839 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.488 | TFLOPs: 7.26 | 7: iteration 11370/ 173500 | consumed samples: 2910720 | consumed tokens: 5961154560 | elapsed time per iteration (s): 0.13 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.781771E+00 | grad norm: 0.778 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1930.917 | TFLOPs: 7.18 | 7: iteration 11380/ 173500 | consumed samples: 2913280 | consumed tokens: 5966397440 | elapsed time per iteration (s): 0.14 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.781738E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1859.215 | TFLOPs: 6.92 | 7: iteration 11390/ 173500 | consumed samples: 2915840 | consumed tokens: 5971640320 | elapsed time per iteration (s): 0.12 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.775568E+00 | grad norm: 0.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2147.018 | TFLOPs: 7.99 | 7: iteration 11400/ 173500 | consumed samples: 2918400 | consumed tokens: 5976883200 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.792527E+00 | grad norm: 0.648 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 3084.305 | TFLOPs: 11.47 | 7: iteration 11410/ 173500 | consumed samples: 2920960 | consumed tokens: 5982126080 | elapsed time per iteration (s): 0.12 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.779544E+00 | grad norm: 0.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2183.105 | TFLOPs: 8.12 | 7: iteration 11420/ 173500 | consumed samples: 2923520 | consumed tokens: 5987368960 | elapsed time per iteration (s): 0.12 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.785652E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2222.643 | TFLOPs: 8.27 | 7: iteration 11430/ 173500 | consumed samples: 2926080 | consumed tokens: 5992611840 | elapsed time per iteration (s): 0.13 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.781131E+00 | grad norm: 1.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.349 | TFLOPs: 7.21 | 7: iteration 11440/ 173500 | consumed samples: 2928640 | consumed tokens: 5997854720 | elapsed time per iteration (s): 0.13 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.771592E+00 | grad norm: 0.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.208 | TFLOPs: 7.53 | 7: iteration 11450/ 173500 | consumed samples: 2931200 | consumed tokens: 6003097600 | elapsed time per iteration (s): 0.12 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.772036E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.828 | TFLOPs: 8.18 | 7: iteration 11460/ 173500 | consumed samples: 2933760 | consumed tokens: 6008340480 | elapsed time per iteration (s): 0.11 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 
4.784465E+00 | grad norm: 0.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2275.200 | TFLOPs: 8.46 | 7: iteration 11470/ 173500 | consumed samples: 2936320 | consumed tokens: 6013583360 | elapsed time per iteration (s): 0.12 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.783952E+00 | grad norm: 0.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.388 | TFLOPs: 7.75 | 7: iteration 11480/ 173500 | consumed samples: 2938880 | consumed tokens: 6018826240 | elapsed time per iteration (s): 0.12 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.789352E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2126.616 | TFLOPs: 7.91 | 7: iteration 11490/ 173500 | consumed samples: 2941440 | consumed tokens: 6024069120 | elapsed time per iteration (s): 0.10 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.784529E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2498.144 | TFLOPs: 9.29 | 7: iteration 11500/ 173500 | consumed samples: 2944000 | consumed tokens: 6029312000 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.781827E+00 | grad norm: 0.715 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.263 | TFLOPs: 11.89 | 7: iteration 11510/ 173500 | consumed samples: 2946560 | consumed tokens: 6034554880 | elapsed time per iteration (s): 0.10 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.777785E+00 | grad norm: 0.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2641.672 | TFLOPs: 9.83 | 7: iteration 11520/ 173500 | consumed samples: 2949120 | consumed tokens: 6039797760 | elapsed 
time per iteration (s): 0.09 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.783575E+00 | grad norm: 0.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.370 | TFLOPs: 11.19 | 7: iteration 11530/ 173500 | consumed samples: 2951680 | consumed tokens: 6045040640 | elapsed time per iteration (s): 0.08 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.780065E+00 | grad norm: 0.676 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.142 | TFLOPs: 11.88 | 7: iteration 11540/ 173500 | consumed samples: 2954240 | consumed tokens: 6050283520 | elapsed time per iteration (s): 0.09 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.788427E+00 | grad norm: 0.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.206 | TFLOPs: 10.96 | 7: iteration 11550/ 173500 | consumed samples: 2956800 | consumed tokens: 6055526400 | elapsed time per iteration (s): 0.09 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.782311E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2763.814 | TFLOPs: 10.28 | 7: iteration 11560/ 173500 | consumed samples: 2959360 | consumed tokens: 6060769280 | elapsed time per iteration (s): 0.09 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 4.780665E+00 | grad norm: 0.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2769.365 | TFLOPs: 10.30 | 7: iteration 11570/ 173500 | consumed samples: 2961920 | consumed tokens: 6066012160 | elapsed time per iteration (s): 0.13 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.787481E+00 | grad norm: 0.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1959.070 | TFLOPs: 7.29 | 
7: iteration 11580/ 173500 | consumed samples: 2964480 | consumed tokens: 6071255040 | elapsed time per iteration (s): 0.12 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.771879E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2160.338 | TFLOPs: 8.04 | 7: iteration 11590/ 173500 | consumed samples: 2967040 | consumed tokens: 6076497920 | elapsed time per iteration (s): 0.10 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.782894E+00 | grad norm: 1.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2531.852 | TFLOPs: 9.42 | 7: iteration 11600/ 173500 | consumed samples: 2969600 | consumed tokens: 6081740800 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.789954E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.385 | TFLOPs: 11.93 | 7: iteration 11610/ 173500 | consumed samples: 2972160 | consumed tokens: 6086983680 | elapsed time per iteration (s): 0.10 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.778659E+00 | grad norm: 0.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2602.884 | TFLOPs: 9.68 | 7: iteration 11620/ 173500 | consumed samples: 2974720 | consumed tokens: 6092226560 | elapsed time per iteration (s): 0.09 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.779708E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2804.771 | TFLOPs: 10.43 | 7: iteration 11630/ 173500 | consumed samples: 2977280 | consumed tokens: 6097469440 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.781272E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3172.984 | TFLOPs: 11.80 | 7: iteration 11640/ 173500 | consumed samples: 2979840 | consumed tokens: 6102712320 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.778622E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.089 | TFLOPs: 11.42 | 7: iteration 11650/ 173500 | consumed samples: 2982400 | consumed tokens: 6107955200 | elapsed time per iteration (s): 0.09 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.777268E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2708.073 | TFLOPs: 10.07 | 7: iteration 11660/ 173500 | consumed samples: 2984960 | consumed tokens: 6113198080 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.778091E+00 | grad norm: 0.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.254 | TFLOPs: 11.36 | 7: iteration 11670/ 173500 | consumed samples: 2987520 | consumed tokens: 6118440960 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.773723E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.122 | TFLOPs: 11.88 | 7: iteration 11680/ 173500 | consumed samples: 2990080 | consumed tokens: 6123683840 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.783218E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.462 | TFLOPs: 11.90 | 7: iteration 11690/ 173500 | consumed samples: 2992640 | consumed tokens: 6128926720 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global 
batch size: 256 | lm loss: 4.777398E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.998 | TFLOPs: 11.44 | 7: iteration 11700/ 173500 | consumed samples: 2995200 | consumed tokens: 6134169600 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.792144E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.367 | TFLOPs: 11.94 | 7: iteration 11710/ 173500 | consumed samples: 2997760 | consumed tokens: 6139412480 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.774945E+00 | grad norm: 0.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.444 | TFLOPs: 11.88 | 7: iteration 11720/ 173500 | consumed samples: 3000320 | consumed tokens: 6144655360 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.784148E+00 | grad norm: 0.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.553 | TFLOPs: 11.45 | 7: iteration 11730/ 173500 | consumed samples: 3002880 | consumed tokens: 6149898240 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.771364E+00 | grad norm: 0.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.156 | TFLOPs: 11.99 | 7: iteration 11740/ 173500 | consumed samples: 3005440 | consumed tokens: 6155141120 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.778127E+00 | grad norm: 0.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.258 | TFLOPs: 11.97 | 7: iteration 11750/ 173500 | consumed samples: 3008000 | consumed 
tokens: 6160384000 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.772054E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.265 | TFLOPs: 11.85 |
7: iteration 11760/ 173500 | consumed samples: 3010560 | consumed tokens: 6165626880 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.785395E+00 | grad norm: 0.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.241 | TFLOPs: 11.88 |
7: iteration 11770/ 173500 | consumed samples: 3013120 | consumed tokens: 6170869760 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.790758E+00 | grad norm: 0.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.733 | TFLOPs: 11.90 |
7: iteration 11780/ 173500 | consumed samples: 3015680 | consumed tokens: 6176112640 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.773292E+00 | grad norm: 0.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.032 | TFLOPs: 11.89 |
7: iteration 11790/ 173500 | consumed samples: 3018240 | consumed tokens: 6181355520 | elapsed time per iteration (s): 0.10 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.776414E+00 | grad norm: 0.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2583.901 | TFLOPs: 9.61 |
7: iteration 11800/ 173500 | consumed samples: 3020800 | consumed tokens: 6186598400 | elapsed time per iteration (s): 0.10 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.779647E+00 | grad norm: 0.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2545.058 | TFLOPs: 9.47 |
7: iteration 11810/ 173500 | consumed samples: 3023360 | consumed tokens: 6191841280 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.765990E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.511 | TFLOPs: 11.57 |
7: iteration 11820/ 173500 | consumed samples: 3025920 | consumed tokens: 6197084160 | elapsed time per iteration (s): 0.09 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.778118E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2983.544 | TFLOPs: 11.10 |
7: iteration 11830/ 173500 | consumed samples: 3028480 | consumed tokens: 6202327040 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.778713E+00 | grad norm: 0.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.907 | TFLOPs: 11.34 |
7: iteration 11840/ 173500 | consumed samples: 3031040 | consumed tokens: 6207569920 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.790582E+00 | grad norm: 0.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.652 | TFLOPs: 11.23 |
7: iteration 11850/ 173500 | consumed samples: 3033600 | consumed tokens: 6212812800 | elapsed time per iteration (s): 0.08 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.770433E+00 | grad norm: 0.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.136 | TFLOPs: 11.81 |
7: iteration 11860/ 173500 | consumed samples: 3036160 | consumed tokens: 6218055680 | elapsed time per iteration (s): 0.09 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.765295E+00 | grad norm: 0.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.014 | TFLOPs: 10.32 |
7: iteration 11870/ 173500 | consumed samples: 3038720 | consumed tokens: 6223298560 | elapsed time per iteration (s): 0.12 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.781038E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2118.992 | TFLOPs: 7.88 |
7: iteration 11880/ 173500 | consumed samples: 3041280 | consumed tokens: 6228541440 | elapsed time per iteration (s): 0.10 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.771994E+00 | grad norm: 0.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2636.714 | TFLOPs: 9.81 |
7: iteration 11890/ 173500 | consumed samples: 3043840 | consumed tokens: 6233784320 | elapsed time per iteration (s): 0.09 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 4.766308E+00 | grad norm: 0.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.003 | TFLOPs: 10.78 |
7: iteration 11900/ 173500 | consumed samples: 3046400 | consumed tokens: 6239027200 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.773729E+00 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.619 | TFLOPs: 11.83 |
7: iteration 11910/ 173500 | consumed samples: 3048960 | consumed tokens: 6244270080 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.784535E+00 | grad norm: 0.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.118 | TFLOPs: 11.95 |
7: iteration 11920/ 173500 | consumed samples: 3051520 | consumed tokens: 6249512960 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.780546E+00 | grad norm: 0.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.988 | TFLOPs: 11.95 |
7: iteration 11930/ 173500 | consumed samples: 3054080 | consumed tokens: 6254755840 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.770232E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.583 | TFLOPs: 11.95 |
7: iteration 11940/ 173500 | consumed samples: 3056640 | consumed tokens: 6259998720 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.773572E+00 | grad norm: 0.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.455 | TFLOPs: 11.88 |
7: iteration 11950/ 173500 | consumed samples: 3059200 | consumed tokens: 6265241600 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.772702E+00 | grad norm: 0.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.296 | TFLOPs: 11.91 |
7: iteration 11960/ 173500 | consumed samples: 3061760 | consumed tokens: 6270484480 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.774086E+00 | grad norm: 0.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.388 | TFLOPs: 11.94 |
7: iteration 11970/ 173500 | consumed samples: 3064320 | consumed tokens: 6275727360 | elapsed time per iteration (s): 0.09 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.765643E+00 | grad norm: 0.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.692 | TFLOPs: 11.16 |
7: iteration 11980/ 173500 | consumed samples: 3066880 | consumed tokens: 6280970240 | elapsed time per iteration (s): 0.09 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.784557E+00 | grad norm: 0.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2834.041 | TFLOPs: 10.54 |
7: iteration 11990/ 173500 | consumed samples: 3069440 | consumed tokens: 6286213120 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.777990E+00 | grad norm: 0.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.923 | TFLOPs: 11.78 |
0: [2023-03-17 00:34:44,239] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=0, lr=[0.0001984184547955352, 0.0001984184547955352, 0.0001984184547955352], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 12000/ 173500 | consumed samples: 3072000 | consumed tokens: 6291456000 | elapsed time per iteration (s): 0.09 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.770351E+00 | grad norm: 0.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.952 | TFLOPs: 11.20 |
0: steps: 12000 loss: 4.7061 iter time (s): 0.091 samples/sec: 2815.887
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 12000 | lm loss value: 4.573766E+00 | lm loss PPL: 9.690836E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 12000 to checkpoints_14m91b100m
0: [2023-03-17 00:34:44,321] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step12000 is begin to save!
0: [2023-03-17 00:34:44,324] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:34:44,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:34:44,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/layer_03-model_00-model_states.pt...
0: [2023-03-17 00:34:44,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/layer_03-model_00-model_states.pt.
0: [2023-03-17 00:34:44,353] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/layer_04-model_00-model_states.pt...
0: [2023-03-17 00:34:44,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/layer_04-model_00-model_states.pt.
0: [2023-03-17 00:34:44,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/layer_05-model_00-model_states.pt...
0: [2023-03-17 00:34:44,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/layer_05-model_00-model_states.pt.
0: [2023-03-17 00:34:44,358] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/layer_06-model_00-model_states.pt...
0: [2023-03-17 00:34:44,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/layer_06-model_00-model_states.pt.
0: [2023-03-17 00:34:44,361] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/layer_08-model_00-model_states.pt...
0: [2023-03-17 00:34:44,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/layer_08-model_00-model_states.pt.
0: [2023-03-17 00:34:44,362] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step12000/mp_rank_00_model_states.pt
0: [2023-03-17 00:34:44,362] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/mp_rank_00_model_states.pt...
0: [2023-03-17 00:34:44,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/mp_rank_00_model_states.pt.
0: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:34:44,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:34:44,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:34:44,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2023-03-17 00:34:44,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:34:44,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:34:44,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:34:44,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
3: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
6: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2023-03-17 00:34:44,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:34:44,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 1: [2023-03-17 00:34:44,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2023-03-17 00:34:44,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:34:44,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:34:44,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:34:44,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2023-03-17 00:34:44,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
6: [2023-03-17 00:34:44,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:34:44,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:34:44,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:34:44,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2023-03-17 00:34:44,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:34:44,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:34:44,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:34:44,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2023-03-17 00:34:44,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:34:44,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:34:44,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:34:44,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:34:44,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2023-03-17 00:34:44,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:34:44,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:34:44,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2023-03-17 00:34:44,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:34:44,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
5: [2023-03-17 00:34:44,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2023-03-17 00:34:44,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:34:44,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:34:44,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:34:44,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:34:44,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:34:44,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2023-03-17 00:34:44,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 0: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
5: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:34:44,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:34:44,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:34:44,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2023-03-17 00:34:44,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 00:34:44,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2023-03-17 00:34:44,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2023-03-17 00:34:44,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:34:44,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step12000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:34:44,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
0: successfully saved checkpoint at iteration 12000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 77.84 7: iteration 12010/ 173500 | consumed samples: 3074560 | consumed tokens: 6296698880 | elapsed time per iteration (s): 0.12 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.766943E+00 | grad norm: 0.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2155.320 | TFLOPs: 8.02 | 7: iteration 12020/ 173500 | consumed samples: 3077120 | consumed tokens: 6301941760 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.779903E+00 | grad norm: 0.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.090 | TFLOPs: 11.72 | 7: iteration 12030/ 173500 | consumed samples: 3079680 | consumed tokens: 6307184640 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.780735E+00 | grad norm: 0.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.881 | TFLOPs: 11.77 | 7: iteration 12040/ 173500 | consumed samples: 3082240 | consumed tokens: 6312427520 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.780045E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.517 | TFLOPs: 11.95 | 7: iteration 12050/ 173500 | consumed samples: 3084800 | consumed tokens: 6317670400 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.784841E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.179 | TFLOPs: 11.66 | 7: iteration 12060/ 173500 | consumed samples: 3087360 | consumed tokens: 6322913280 | elapsed time per iteration (s): 0.13 | learning 
rate: 1.984E-04 | global batch size: 256 | lm loss: 4.780900E+00 | grad norm: 0.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1954.875 | TFLOPs: 7.27 | 7: iteration 12070/ 173500 | consumed samples: 3089920 | consumed tokens: 6328156160 | elapsed time per iteration (s): 0.11 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.773972E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2235.331 | TFLOPs: 8.31 | 7: iteration 12080/ 173500 | consumed samples: 3092480 | consumed tokens: 6333399040 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.777188E+00 | grad norm: 0.795 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.902 | TFLOPs: 11.78 | 7: iteration 12090/ 173500 | consumed samples: 3095040 | consumed tokens: 6338641920 | elapsed time per iteration (s): 0.12 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.786782E+00 | grad norm: 0.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2136.489 | TFLOPs: 7.95 | 7: iteration 12100/ 173500 | consumed samples: 3097600 | consumed tokens: 6343884800 | elapsed time per iteration (s): 0.12 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.770724E+00 | grad norm: 0.920 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2100.791 | TFLOPs: 7.81 | 7: iteration 12110/ 173500 | consumed samples: 3100160 | consumed tokens: 6349127680 | elapsed time per iteration (s): 0.11 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.775125E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2322.990 | TFLOPs: 8.64 | 7: iteration 12120/ 173500 | consumed 
samples: 3102720 | consumed tokens: 6354370560 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.780337E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.414 | TFLOPs: 11.80 | 7: iteration 12130/ 173500 | consumed samples: 3105280 | consumed tokens: 6359613440 | elapsed time per iteration (s): 0.11 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.775807E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2262.799 | TFLOPs: 8.42 | 7: iteration 12140/ 173500 | consumed samples: 3107840 | consumed tokens: 6364856320 | elapsed time per iteration (s): 0.12 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.781288E+00 | grad norm: 0.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2076.266 | TFLOPs: 7.72 | 7: iteration 12150/ 173500 | consumed samples: 3110400 | consumed tokens: 6370099200 | elapsed time per iteration (s): 0.12 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.768718E+00 | grad norm: 0.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.405 | TFLOPs: 7.63 | 7: iteration 12160/ 173500 | consumed samples: 3112960 | consumed tokens: 6375342080 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.768653E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3035.949 | TFLOPs: 11.29 | 7: iteration 12170/ 173500 | consumed samples: 3115520 | consumed tokens: 6380584960 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.773462E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3177.667 | TFLOPs: 11.82 | 7: iteration 12180/ 173500 | consumed samples: 3118080 | consumed tokens: 6385827840 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.782473E+00 | grad norm: 0.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.511 | TFLOPs: 11.83 | 7: iteration 12190/ 173500 | consumed samples: 3120640 | consumed tokens: 6391070720 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.763892E+00 | grad norm: 0.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.939 | TFLOPs: 11.88 | 7: iteration 12200/ 173500 | consumed samples: 3123200 | consumed tokens: 6396313600 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.765310E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.992 | TFLOPs: 11.65 | 7: iteration 12210/ 173500 | consumed samples: 3125760 | consumed tokens: 6401556480 | elapsed time per iteration (s): 0.10 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.777492E+00 | grad norm: 0.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2476.389 | TFLOPs: 9.21 | 7: iteration 12220/ 173500 | consumed samples: 3128320 | consumed tokens: 6406799360 | elapsed time per iteration (s): 0.08 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 4.768894E+00 | grad norm: 0.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.606 | TFLOPs: 11.88 | 7: iteration 12230/ 173500 | consumed samples: 3130880 | consumed tokens: 6412042240 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 
4.773377E+00 | grad norm: 0.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.826 | TFLOPs: 11.90 | 7: iteration 12240/ 173500 | consumed samples: 3133440 | consumed tokens: 6417285120 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.768666E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.964 | TFLOPs: 11.94 | 7: iteration 12250/ 173500 | consumed samples: 3136000 | consumed tokens: 6422528000 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.773776E+00 | grad norm: 0.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.883 | TFLOPs: 11.85 | 7: iteration 12260/ 173500 | consumed samples: 3138560 | consumed tokens: 6427770880 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.763668E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.265 | TFLOPs: 11.91 | 7: iteration 12270/ 173500 | consumed samples: 3141120 | consumed tokens: 6433013760 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.784040E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.856 | TFLOPs: 11.92 | 7: iteration 12280/ 173500 | consumed samples: 3143680 | consumed tokens: 6438256640 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.762877E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.483 | TFLOPs: 11.64 | 7: iteration 12290/ 173500 | consumed samples: 3146240 | consumed tokens: 6443499520 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.770586E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.784 | TFLOPs: 11.37 | 7: iteration 12300/ 173500 | consumed samples: 3148800 | consumed tokens: 6448742400 | elapsed time per iteration (s): 0.09 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.772527E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2945.328 | TFLOPs: 10.96 | 7: iteration 12310/ 173500 | consumed samples: 3151360 | consumed tokens: 6453985280 | elapsed time per iteration (s): 0.10 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.770726E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.646 | TFLOPs: 9.26 | 7: iteration 12320/ 173500 | consumed samples: 3153920 | consumed tokens: 6459228160 | elapsed time per iteration (s): 0.09 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.761354E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2815.153 | TFLOPs: 10.47 | 7: iteration 12330/ 173500 | consumed samples: 3156480 | consumed tokens: 6464471040 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.779845E+00 | grad norm: 0.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.179 | TFLOPs: 11.90 | 7: iteration 12340/ 173500 | consumed samples: 3159040 | consumed tokens: 6469713920 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.781612E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.072 | TFLOPs: 
11.91 | 7: iteration 12350/ 173500 | consumed samples: 3161600 | consumed tokens: 6474956800 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.770828E+00 | grad norm: 0.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.937 | TFLOPs: 11.80 | 7: iteration 12360/ 173500 | consumed samples: 3164160 | consumed tokens: 6480199680 | elapsed time per iteration (s): 0.12 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.771739E+00 | grad norm: 0.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2209.652 | TFLOPs: 8.22 | 7: iteration 12370/ 173500 | consumed samples: 3166720 | consumed tokens: 6485442560 | elapsed time per iteration (s): 0.13 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.762555E+00 | grad norm: 0.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.607 | TFLOPs: 7.24 | 7: iteration 12380/ 173500 | consumed samples: 3169280 | consumed tokens: 6490685440 | elapsed time per iteration (s): 0.12 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.759471E+00 | grad norm: 0.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2080.190 | TFLOPs: 7.74 | 7: iteration 12390/ 173500 | consumed samples: 3171840 | consumed tokens: 6495928320 | elapsed time per iteration (s): 0.11 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.770895E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2347.281 | TFLOPs: 8.73 | 7: iteration 12400/ 173500 | consumed samples: 3174400 | consumed tokens: 6501171200 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.762880E+00 | grad norm: 0.705 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.975 | TFLOPs: 11.87 | 7: iteration 12410/ 173500 | consumed samples: 3176960 | consumed tokens: 6506414080 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.768077E+00 | grad norm: 0.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.355 | TFLOPs: 11.64 | 7: iteration 12420/ 173500 | consumed samples: 3179520 | consumed tokens: 6511656960 | elapsed time per iteration (s): 0.09 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.774552E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2922.729 | TFLOPs: 10.87 | 7: iteration 12430/ 173500 | consumed samples: 3182080 | consumed tokens: 6516899840 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.763787E+00 | grad norm: 0.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.603 | TFLOPs: 11.69 | 7: iteration 12440/ 173500 | consumed samples: 3184640 | consumed tokens: 6522142720 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.767432E+00 | grad norm: 0.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.598 | TFLOPs: 11.97 | 7: iteration 12450/ 173500 | consumed samples: 3187200 | consumed tokens: 6527385600 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.768845E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.861 | TFLOPs: 11.93 | 7: iteration 12460/ 173500 | consumed samples: 3189760 | consumed tokens: 6532628480 | elapsed time per iteration (s): 0.09 | learning rate: 1.983E-04 | 
global batch size: 256 | lm loss: 4.763710E+00 | grad norm: 0.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3004.695 | TFLOPs: 11.18 | 7: iteration 12470/ 173500 | consumed samples: 3192320 | consumed tokens: 6537871360 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.763302E+00 | grad norm: 0.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.784 | TFLOPs: 11.66 | 7: iteration 12480/ 173500 | consumed samples: 3194880 | consumed tokens: 6543114240 | elapsed time per iteration (s): 0.10 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.762656E+00 | grad norm: 0.854 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2542.278 | TFLOPs: 9.46 | 7: iteration 12490/ 173500 | consumed samples: 3197440 | consumed tokens: 6548357120 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.768327E+00 | grad norm: 0.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.041 | TFLOPs: 11.82 | 7: iteration 12500/ 173500 | consumed samples: 3200000 | consumed tokens: 6553600000 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.766740E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.276 | TFLOPs: 11.94 | 7: iteration 12510/ 173500 | consumed samples: 3202560 | consumed tokens: 6558842880 | elapsed time per iteration (s): 0.08 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.789498E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.516 | TFLOPs: 11.85 | 7: iteration 12520/ 173500 | consumed samples: 3205120 | 
consumed tokens: 6564085760 | elapsed time per iteration (s): 0.10 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.761985E+00 | grad norm: 0.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2658.761 | TFLOPs: 9.89 | 7: iteration 12530/ 173500 | consumed samples: 3207680 | consumed tokens: 6569328640 | elapsed time per iteration (s): 0.12 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 4.762110E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.072 | TFLOPs: 8.18 | 7: iteration 12540/ 173500 | consumed samples: 3210240 | consumed tokens: 6574571520 | elapsed time per iteration (s): 0.09 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.772112E+00 | grad norm: 0.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.036 | TFLOPs: 10.48 | 7: iteration 12550/ 173500 | consumed samples: 3212800 | consumed tokens: 6579814400 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.761994E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.854 | TFLOPs: 11.74 | 7: iteration 12560/ 173500 | consumed samples: 3215360 | consumed tokens: 6585057280 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.762385E+00 | grad norm: 0.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.724 | TFLOPs: 11.75 | 7: iteration 12570/ 173500 | consumed samples: 3217920 | consumed tokens: 6590300160 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.765282E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3221.595 | TFLOPs: 11.98 | 7: iteration 12580/ 173500 | consumed samples: 3220480 | consumed tokens: 6595543040 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.771175E+00 | grad norm: 0.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.200 | TFLOPs: 12.02 | 7: iteration 12590/ 173500 | consumed samples: 3223040 | consumed tokens: 6600785920 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.773021E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.949 | TFLOPs: 11.68 | 7: iteration 12600/ 173500 | consumed samples: 3225600 | consumed tokens: 6606028800 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.765330E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.584 | TFLOPs: 11.55 | 7: iteration 12610/ 173500 | consumed samples: 3228160 | consumed tokens: 6611271680 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.770831E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.232 | TFLOPs: 11.37 | 7: iteration 12620/ 173500 | consumed samples: 3230720 | consumed tokens: 6616514560 | elapsed time per iteration (s): 0.09 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.766494E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2864.099 | TFLOPs: 10.65 | 7: iteration 12630/ 173500 | consumed samples: 3233280 | consumed tokens: 6621757440 | elapsed time per iteration (s): 0.09 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.765587E+00 | grad norm: 0.614 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.232 | TFLOPs: 11.15 | 7: iteration 12640/ 173500 | consumed samples: 3235840 | consumed tokens: 6627000320 | elapsed time per iteration (s): 0.09 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.768695E+00 | grad norm: 0.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2737.224 | TFLOPs: 10.18 | 7: iteration 12650/ 173500 | consumed samples: 3238400 | consumed tokens: 6632243200 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.762422E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.444 | TFLOPs: 11.92 | 7: iteration 12660/ 173500 | consumed samples: 3240960 | consumed tokens: 6637486080 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.776391E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.823 | TFLOPs: 11.95 | 7: iteration 12670/ 173500 | consumed samples: 3243520 | consumed tokens: 6642728960 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.766211E+00 | grad norm: 0.769 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.096 | TFLOPs: 11.86 | 7: iteration 12680/ 173500 | consumed samples: 3246080 | consumed tokens: 6647971840 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.761882E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.894 | TFLOPs: 11.87 | 7: iteration 12690/ 173500 | consumed samples: 3248640 | consumed tokens: 6653214720 | elapsed time per iteration (s): 0.08 
| learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.770497E+00 | grad norm: 0.745 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.045 | TFLOPs: 11.89 | 7: iteration 12700/ 173500 | consumed samples: 3251200 | consumed tokens: 6658457600 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.776403E+00 | grad norm: 0.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.516 | TFLOPs: 11.95 | 7: iteration 12710/ 173500 | consumed samples: 3253760 | consumed tokens: 6663700480 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.766782E+00 | grad norm: 0.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.500 | TFLOPs: 11.94 | 7: iteration 12720/ 173500 | consumed samples: 3256320 | consumed tokens: 6668943360 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.773936E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.757 | TFLOPs: 11.96 | 7: iteration 12730/ 173500 | consumed samples: 3258880 | consumed tokens: 6674186240 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.766116E+00 | grad norm: 0.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.536 | TFLOPs: 11.92 | 7: iteration 12740/ 173500 | consumed samples: 3261440 | consumed tokens: 6679429120 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.759518E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.842 | TFLOPs: 11.87 | 7: iteration 12750/ 173500 | 
consumed samples: 3264000 | consumed tokens: 6684672000 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.764907E+00 | grad norm: 0.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.999 | TFLOPs: 11.71 | 7: iteration 12760/ 173500 | consumed samples: 3266560 | consumed tokens: 6689914880 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.765814E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.960 | TFLOPs: 11.58 | 7: iteration 12770/ 173500 | consumed samples: 3269120 | consumed tokens: 6695157760 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.766343E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.507 | TFLOPs: 11.90 | 7: iteration 12780/ 173500 | consumed samples: 3271680 | consumed tokens: 6700400640 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.761153E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.295 | TFLOPs: 11.61 | 7: iteration 12790/ 173500 | consumed samples: 3274240 | consumed tokens: 6705643520 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.771283E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.712 | TFLOPs: 11.90 | 7: iteration 12800/ 173500 | consumed samples: 3276800 | consumed tokens: 6710886400 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.760194E+00 | grad norm: 0.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3194.138 | TFLOPs: 11.88 | 7: iteration 12810/ 173500 | consumed samples: 3279360 | consumed tokens: 6716129280 | elapsed time per iteration (s): 0.09 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.766381E+00 | grad norm: 0.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2942.103 | TFLOPs: 10.94 | 7: iteration 12820/ 173500 | consumed samples: 3281920 | consumed tokens: 6721372160 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.763372E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.602 | TFLOPs: 11.83 | 7: iteration 12830/ 173500 | consumed samples: 3284480 | consumed tokens: 6726615040 | elapsed time per iteration (s): 0.08 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 4.754584E+00 | grad norm: 0.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.620 | TFLOPs: 11.89 | 7: iteration 12840/ 173500 | consumed samples: 3287040 | consumed tokens: 6731857920 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.766668E+00 | grad norm: 0.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.336 | TFLOPs: 11.92 | 7: iteration 12850/ 173500 | consumed samples: 3289600 | consumed tokens: 6737100800 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.763196E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.287 | TFLOPs: 11.89 | 7: iteration 12860/ 173500 | consumed samples: 3292160 | consumed tokens: 6742343680 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 
4.779538E+00 | grad norm: 0.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.812 | TFLOPs: 11.91 | 7: iteration 12870/ 173500 | consumed samples: 3294720 | consumed tokens: 6747586560 | elapsed time per iteration (s): 0.09 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.778185E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2970.682 | TFLOPs: 11.05 | 7: iteration 12880/ 173500 | consumed samples: 3297280 | consumed tokens: 6752829440 | elapsed time per iteration (s): 0.09 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.769961E+00 | grad norm: 0.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2727.860 | TFLOPs: 10.15 | 7: iteration 12890/ 173500 | consumed samples: 3299840 | consumed tokens: 6758072320 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.769563E+00 | grad norm: 0.726 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.816 | TFLOPs: 11.55 | 7: iteration 12900/ 173500 | consumed samples: 3302400 | consumed tokens: 6763315200 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.778942E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.530 | TFLOPs: 11.80 | 7: iteration 12910/ 173500 | consumed samples: 3304960 | consumed tokens: 6768558080 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.758099E+00 | grad norm: 0.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.088 | TFLOPs: 11.81 | 7: iteration 12920/ 173500 | consumed samples: 3307520 | consumed tokens: 6773800960 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.764707E+00 | grad norm: 0.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.223 | TFLOPs: 11.80 | 7: iteration 12930/ 173500 | consumed samples: 3310080 | consumed tokens: 6779043840 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.747545E+00 | grad norm: 0.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.614 | TFLOPs: 11.51 | 7: iteration 12940/ 173500 | consumed samples: 3312640 | consumed tokens: 6784286720 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.762616E+00 | grad norm: 0.764 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.304 | TFLOPs: 11.94 | 7: iteration 12950/ 173500 | consumed samples: 3315200 | consumed tokens: 6789529600 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.767750E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.093 | TFLOPs: 11.67 | 7: iteration 12960/ 173500 | consumed samples: 3317760 | consumed tokens: 6794772480 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.761929E+00 | grad norm: 0.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.071 | TFLOPs: 11.98 | 7: iteration 12970/ 173500 | consumed samples: 3320320 | consumed tokens: 6800015360 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.767766E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.431 | TFLOPs: 
11.97 | 7: iteration 12980/ 173500 | consumed samples: 3322880 | consumed tokens: 6805258240 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.759760E+00 | grad norm: 0.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.123 | TFLOPs: 11.97 | 7: iteration 12990/ 173500 | consumed samples: 3325440 | consumed tokens: 6810501120 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.769557E+00 | grad norm: 0.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.500 | TFLOPs: 11.97 | 7: iteration 13000/ 173500 | consumed samples: 3328000 | consumed tokens: 6815744000 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.752732E+00 | grad norm: 0.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.761 | TFLOPs: 11.93 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 13000 | lm loss value: 4.627243E+00 | lm loss PPL: 1.022318E+02 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 13000 to checkpoints_14m91b100m 0: [2023-03-17 00:36:12,081] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step13000 is begin to save! 0: [2023-03-17 00:36:12,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:36:12,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 00:36:12,122] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:36:12,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:36:12,125] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:36:12,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:36:12,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:36:12,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:36:12,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:36:12,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:36:12,133] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:36:12,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:36:12,135] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step13000/mp_rank_00_model_states.pt 0: [2023-03-17 00:36:12,135] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/mp_rank_00_model_states.pt... 
0: [2023-03-17 00:36:12,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:36:12,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:36:12,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:36:12,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:36:12,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:36:12,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2023-03-17 00:36:12,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
6: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:36:12,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:36:12,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2023-03-17 00:36:12,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:36:12,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:36:12,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
7: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:36:12,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:36:12,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:36:12,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:36:12,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:36:12,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2023-03-17 00:36:12,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2023-03-17 00:36:12,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2023-03-17 00:36:12,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:36:12,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:36:12,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2023-03-17 00:36:12,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:36:12,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:36:12,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:36:12,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 00:36:12,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:36:12,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:36:12,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:36:12,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:36:12,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:36:12,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
6: [2023-03-17 00:36:12,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:36:12,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:36:12,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
1: [2023-03-17 00:36:12,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:36:12,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:36:12,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:36:12,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:36:12,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2023-03-17 00:36:12,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:36:12,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:36:12,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
1: [2023-03-17 00:36:12,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2023-03-17 00:36:12,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:36:12,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:36:12,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:36:12,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
3: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:36:12,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:36:12,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:36:12,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2023-03-17 00:36:12,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:36:12,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:36:12,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 2: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:36:12,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 00:36:12,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
3: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 00:36:12,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
6: [2023-03-17 00:36:12,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2023-03-17 00:36:12,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:36:12,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:36:12,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:36:12,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2023-03-17 00:36:12,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step13000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:36:12,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
0: successfully saved checkpoint at iteration 13000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 90.94 7: iteration 13010/ 173500 | consumed samples: 3330560 | consumed tokens: 6820986880 | elapsed time per iteration (s): 0.09 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.761287E+00 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2739.126 | TFLOPs: 10.19 | 7: iteration 13020/ 173500 | consumed samples: 3333120 | consumed tokens: 6826229760 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.772222E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.480 | TFLOPs: 11.61 | 7: iteration 13030/ 173500 | consumed samples: 3335680 | consumed tokens: 6831472640 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.767220E+00 | grad norm: 0.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.723 | TFLOPs: 11.86 | 7: iteration 13040/ 173500 | consumed samples: 3338240 | consumed tokens: 6836715520 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.766763E+00 | grad norm: 0.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.577 | TFLOPs: 11.95 | 7: iteration 13050/ 173500 | consumed samples: 3340800 | consumed tokens: 6841958400 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.769848E+00 | grad norm: 0.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.554 | TFLOPs: 11.94 | 7: iteration 13060/ 173500 | consumed samples: 3343360 | consumed tokens: 6847201280 | elapsed time per iteration (s): 0.11 | learning 
rate: 1.981E-04 | global batch size: 256 | lm loss: 4.765887E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2260.906 | TFLOPs: 8.41 | 7: iteration 13070/ 173500 | consumed samples: 3345920 | consumed tokens: 6852444160 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.766986E+00 | grad norm: 0.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.214 | TFLOPs: 11.78 | 7: iteration 13080/ 173500 | consumed samples: 3348480 | consumed tokens: 6857687040 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.776929E+00 | grad norm: 1.046 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.849 | TFLOPs: 11.91 | 7: iteration 13090/ 173500 | consumed samples: 3351040 | consumed tokens: 6862929920 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.770456E+00 | grad norm: 0.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.180 | TFLOPs: 12.00 | 7: iteration 13100/ 173500 | consumed samples: 3353600 | consumed tokens: 6868172800 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.753607E+00 | grad norm: 0.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.555 | TFLOPs: 12.03 | 7: iteration 13110/ 173500 | consumed samples: 3356160 | consumed tokens: 6873415680 | elapsed time per iteration (s): 0.11 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.760305E+00 | grad norm: 0.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2254.144 | TFLOPs: 8.38 | 7: iteration 13120/ 173500 | consumed 
samples: 3358720 | consumed tokens: 6878658560 | elapsed time per iteration (s): 0.09 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.768616E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2779.828 | TFLOPs: 10.34 | 7: iteration 13130/ 173500 | consumed samples: 3361280 | consumed tokens: 6883901440 | elapsed time per iteration (s): 0.08 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 4.763352E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.997 | TFLOPs: 11.81 | 7: iteration 13140/ 173500 | consumed samples: 3363840 | consumed tokens: 6889144320 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.771342E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.465 | TFLOPs: 11.95 | 7: iteration 13150/ 173500 | consumed samples: 3366400 | consumed tokens: 6894387200 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.765837E+00 | grad norm: 0.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.573 | TFLOPs: 12.01 | 7: iteration 13160/ 173500 | consumed samples: 3368960 | consumed tokens: 6899630080 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.754036E+00 | grad norm: 0.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.590 | TFLOPs: 12.01 | 7: iteration 13170/ 173500 | consumed samples: 3371520 | consumed tokens: 6904872960 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.761448E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3143.935 | TFLOPs: 11.69 | 7: iteration 13180/ 173500 | consumed samples: 3374080 | consumed tokens: 6910115840 | elapsed time per iteration (s): 0.11 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.759140E+00 | grad norm: 0.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.771 | TFLOPs: 8.95 | 7: iteration 13190/ 173500 | consumed samples: 3376640 | consumed tokens: 6915358720 | elapsed time per iteration (s): 0.09 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.763583E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2810.776 | TFLOPs: 10.45 | 7: iteration 13200/ 173500 | consumed samples: 3379200 | consumed tokens: 6920601600 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.762316E+00 | grad norm: 0.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.197 | TFLOPs: 11.81 | 7: iteration 13210/ 173500 | consumed samples: 3381760 | consumed tokens: 6925844480 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.754098E+00 | grad norm: 0.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.186 | TFLOPs: 11.81 | 7: iteration 13220/ 173500 | consumed samples: 3384320 | consumed tokens: 6931087360 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.767599E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.662 | TFLOPs: 11.98 | 7: iteration 13230/ 173500 | consumed samples: 3386880 | consumed tokens: 6936330240 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 
4.763414E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.987 | TFLOPs: 12.05 | 7: iteration 13240/ 173500 | consumed samples: 3389440 | consumed tokens: 6941573120 | elapsed time per iteration (s): 0.09 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.768373E+00 | grad norm: 0.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.454 | TFLOPs: 10.23 | 7: iteration 13250/ 173500 | consumed samples: 3392000 | consumed tokens: 6946816000 | elapsed time per iteration (s): 0.11 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.768356E+00 | grad norm: 0.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.298 | TFLOPs: 8.82 | 7: iteration 13260/ 173500 | consumed samples: 3394560 | consumed tokens: 6952058880 | elapsed time per iteration (s): 0.12 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.759805E+00 | grad norm: 0.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2077.979 | TFLOPs: 7.73 | 7: iteration 13270/ 173500 | consumed samples: 3397120 | consumed tokens: 6957301760 | elapsed time per iteration (s): 0.12 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.762930E+00 | grad norm: 0.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2148.604 | TFLOPs: 7.99 | 7: iteration 13280/ 173500 | consumed samples: 3399680 | consumed tokens: 6962544640 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.758837E+00 | grad norm: 0.746 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.625 | TFLOPs: 11.37 | 7: iteration 13290/ 173500 | consumed samples: 3402240 | consumed tokens: 6967787520 | elapsed 
time per iteration (s): 0.09 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.761433E+00 | grad norm: 0.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2966.527 | TFLOPs: 11.03 | 7: iteration 13300/ 173500 | consumed samples: 3404800 | consumed tokens: 6973030400 | elapsed time per iteration (s): 0.13 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.759159E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1971.708 | TFLOPs: 7.33 | 7: iteration 13310/ 173500 | consumed samples: 3407360 | consumed tokens: 6978273280 | elapsed time per iteration (s): 0.11 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.761150E+00 | grad norm: 0.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.724 | TFLOPs: 8.72 | 7: iteration 13320/ 173500 | consumed samples: 3409920 | consumed tokens: 6983516160 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.760709E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.960 | TFLOPs: 11.57 | 7: iteration 13330/ 173500 | consumed samples: 3412480 | consumed tokens: 6988759040 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.747460E+00 | grad norm: 0.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.468 | TFLOPs: 11.76 | 7: iteration 13340/ 173500 | consumed samples: 3415040 | consumed tokens: 6994001920 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.750178E+00 | grad norm: 0.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.580 | TFLOPs: 11.56 | 7: 
iteration 13350/ 173500 | consumed samples: 3417600 | consumed tokens: 6999244800 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.770979E+00 | grad norm: 0.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.324 | TFLOPs: 11.99 | 7: iteration 13360/ 173500 | consumed samples: 3420160 | consumed tokens: 7004487680 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.761013E+00 | grad norm: 0.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.319 | TFLOPs: 11.95 | 7: iteration 13370/ 173500 | consumed samples: 3422720 | consumed tokens: 7009730560 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.751106E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.039 | TFLOPs: 11.90 | 7: iteration 13380/ 173500 | consumed samples: 3425280 | consumed tokens: 7014973440 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.759778E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.146 | TFLOPs: 11.97 | 7: iteration 13390/ 173500 | consumed samples: 3427840 | consumed tokens: 7020216320 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.765360E+00 | grad norm: 0.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.626 | TFLOPs: 12.02 | 7: iteration 13400/ 173500 | consumed samples: 3430400 | consumed tokens: 7025459200 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.756904E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3227.739 | TFLOPs: 12.01 | 7: iteration 13410/ 173500 | consumed samples: 3432960 | consumed tokens: 7030702080 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.767630E+00 | grad norm: 0.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.981 | TFLOPs: 11.68 | 7: iteration 13420/ 173500 | consumed samples: 3435520 | consumed tokens: 7035944960 | elapsed time per iteration (s): 0.08 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 4.759426E+00 | grad norm: 0.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.984 | TFLOPs: 11.88 | 7: iteration 13430/ 173500 | consumed samples: 3438080 | consumed tokens: 7041187840 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.763782E+00 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.216 | TFLOPs: 11.72 | 7: iteration 13440/ 173500 | consumed samples: 3440640 | consumed tokens: 7046430720 | elapsed time per iteration (s): 0.11 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.763581E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2368.719 | TFLOPs: 8.81 | 7: iteration 13450/ 173500 | consumed samples: 3443200 | consumed tokens: 7051673600 | elapsed time per iteration (s): 0.12 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.767297E+00 | grad norm: 0.637 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2222.168 | TFLOPs: 8.27 | 7: iteration 13460/ 173500 | consumed samples: 3445760 | consumed tokens: 7056916480 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global 
batch size: 256 | lm loss: 4.755783E+00 | grad norm: 0.729 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3080.693 | TFLOPs: 11.46 | 7: iteration 13470/ 173500 | consumed samples: 3448320 | consumed tokens: 7062159360 | elapsed time per iteration (s): 0.10 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.765672E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2537.323 | TFLOPs: 9.44 | 7: iteration 13480/ 173500 | consumed samples: 3450880 | consumed tokens: 7067402240 | elapsed time per iteration (s): 0.11 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.762156E+00 | grad norm: 0.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2304.527 | TFLOPs: 8.57 | 7: iteration 13490/ 173500 | consumed samples: 3453440 | consumed tokens: 7072645120 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.754420E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.545 | TFLOPs: 11.84 | 7: iteration 13500/ 173500 | consumed samples: 3456000 | consumed tokens: 7077888000 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.763424E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.069 | TFLOPs: 11.57 | 7: iteration 13510/ 173500 | consumed samples: 3458560 | consumed tokens: 7083130880 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.756254E+00 | grad norm: 0.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.414 | TFLOPs: 11.45 | 7: iteration 13520/ 173500 | consumed samples: 3461120 | consumed 
tokens: 7088373760 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.754816E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.283 | TFLOPs: 12.02 | 7: iteration 13530/ 173500 | consumed samples: 3463680 | consumed tokens: 7093616640 | elapsed time per iteration (s): 0.09 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.758396E+00 | grad norm: 0.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2903.026 | TFLOPs: 10.80 | 7: iteration 13540/ 173500 | consumed samples: 3466240 | consumed tokens: 7098859520 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.765614E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.658 | TFLOPs: 11.94 | 7: iteration 13550/ 173500 | consumed samples: 3468800 | consumed tokens: 7104102400 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.756968E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.915 | TFLOPs: 11.99 | 7: iteration 13560/ 173500 | consumed samples: 3471360 | consumed tokens: 7109345280 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.755197E+00 | grad norm: 0.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.546 | TFLOPs: 11.42 | 7: iteration 13570/ 173500 | consumed samples: 3473920 | consumed tokens: 7114588160 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.762689E+00 | grad norm: 0.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3086.808 | TFLOPs: 11.48 | 7: iteration 13580/ 173500 | consumed samples: 3476480 | consumed tokens: 7119831040 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.739294E+00 | grad norm: 0.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.528 | TFLOPs: 11.99 | 7: iteration 13590/ 173500 | consumed samples: 3479040 | consumed tokens: 7125073920 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.756072E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.459 | TFLOPs: 11.96 | 7: iteration 13600/ 173500 | consumed samples: 3481600 | consumed tokens: 7130316800 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.756903E+00 | grad norm: 0.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.148 | TFLOPs: 11.45 | 7: iteration 13610/ 173500 | consumed samples: 3484160 | consumed tokens: 7135559680 | elapsed time per iteration (s): 0.09 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.764144E+00 | grad norm: 0.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.512 | TFLOPs: 11.20 | 7: iteration 13620/ 173500 | consumed samples: 3486720 | consumed tokens: 7140802560 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.760791E+00 | grad norm: 0.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.610 | TFLOPs: 11.38 | 7: iteration 13630/ 173500 | consumed samples: 3489280 | consumed tokens: 7146045440 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.762386E+00 | grad norm: 0.628 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.718 | TFLOPs: 11.69 | 7: iteration 13640/ 173500 | consumed samples: 3491840 | consumed tokens: 7151288320 | elapsed time per iteration (s): 0.09 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.770115E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3009.693 | TFLOPs: 11.19 | 7: iteration 13650/ 173500 | consumed samples: 3494400 | consumed tokens: 7156531200 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.741408E+00 | grad norm: 0.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.936 | TFLOPs: 11.68 | 7: iteration 13660/ 173500 | consumed samples: 3496960 | consumed tokens: 7161774080 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.769771E+00 | grad norm: 0.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.231 | TFLOPs: 11.41 | 7: iteration 13670/ 173500 | consumed samples: 3499520 | consumed tokens: 7167016960 | elapsed time per iteration (s): 0.08 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.749472E+00 | grad norm: 0.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.943 | TFLOPs: 11.97 | 7: iteration 13680/ 173500 | consumed samples: 3502080 | consumed tokens: 7172259840 | elapsed time per iteration (s): 0.09 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.752819E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.635 | TFLOPs: 11.13 | 7: iteration 13690/ 173500 | consumed samples: 3504640 | consumed tokens: 7177502720 | elapsed time per iteration (s): 0.11 
| learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.749081E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2233.977 | TFLOPs: 8.31 | 7: iteration 13700/ 173500 | consumed samples: 3507200 | consumed tokens: 7182745600 | elapsed time per iteration (s): 0.11 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 4.746816E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2427.263 | TFLOPs: 9.03 | 7: iteration 13710/ 173500 | consumed samples: 3509760 | consumed tokens: 7187988480 | elapsed time per iteration (s): 0.10 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.778023E+00 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2453.562 | TFLOPs: 9.13 | 7: iteration 13720/ 173500 | consumed samples: 3512320 | consumed tokens: 7193231360 | elapsed time per iteration (s): 0.09 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.764134E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.938 | TFLOPs: 11.00 | 7: iteration 13730/ 173500 | consumed samples: 3514880 | consumed tokens: 7198474240 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.749463E+00 | grad norm: 0.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.230 | TFLOPs: 11.66 | 7: iteration 13740/ 173500 | consumed samples: 3517440 | consumed tokens: 7203717120 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.771311E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.810 | TFLOPs: 11.89 | 7: iteration 13750/ 173500 | 
consumed samples: 3520000 | consumed tokens: 7208960000 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.755556E+00 | grad norm: 0.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.496 | TFLOPs: 11.87 | 7: iteration 13760/ 173500 | consumed samples: 3522560 | consumed tokens: 7214202880 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.739652E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.093 | TFLOPs: 11.29 | 7: iteration 13770/ 173500 | consumed samples: 3525120 | consumed tokens: 7219445760 | elapsed time per iteration (s): 0.12 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.748878E+00 | grad norm: 0.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2161.353 | TFLOPs: 8.04 | 7: iteration 13780/ 173500 | consumed samples: 3527680 | consumed tokens: 7224688640 | elapsed time per iteration (s): 0.11 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.756959E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2292.554 | TFLOPs: 8.53 | 7: iteration 13790/ 173500 | consumed samples: 3530240 | consumed tokens: 7229931520 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.762569E+00 | grad norm: 0.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.992 | TFLOPs: 11.84 | 7: iteration 13800/ 173500 | consumed samples: 3532800 | consumed tokens: 7235174400 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.758022E+00 | grad norm: 0.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3199.044 | TFLOPs: 11.90 | 7: iteration 13810/ 173500 | consumed samples: 3535360 | consumed tokens: 7240417280 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.764622E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.138 | TFLOPs: 11.84 | 7: iteration 13820/ 173500 | consumed samples: 3537920 | consumed tokens: 7245660160 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.748413E+00 | grad norm: 0.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.880 | TFLOPs: 11.62 | 7: iteration 13830/ 173500 | consumed samples: 3540480 | consumed tokens: 7250903040 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.762844E+00 | grad norm: 0.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.835 | TFLOPs: 11.92 | 7: iteration 13840/ 173500 | consumed samples: 3543040 | consumed tokens: 7256145920 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.752657E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.906 | TFLOPs: 11.89 | 7: iteration 13850/ 173500 | consumed samples: 3545600 | consumed tokens: 7261388800 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.761978E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.139 | TFLOPs: 11.85 | 7: iteration 13860/ 173500 | consumed samples: 3548160 | consumed tokens: 7266631680 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 
4.764756E+00 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.364 | TFLOPs: 11.84 | 7: iteration 13870/ 173500 | consumed samples: 3550720 | consumed tokens: 7271874560 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.762138E+00 | grad norm: 0.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.607 | TFLOPs: 11.96 | 7: iteration 13880/ 173500 | consumed samples: 3553280 | consumed tokens: 7277117440 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.746869E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.521 | TFLOPs: 11.97 | 7: iteration 13890/ 173500 | consumed samples: 3555840 | consumed tokens: 7282360320 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.752684E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.011 | TFLOPs: 11.97 | 7: iteration 13900/ 173500 | consumed samples: 3558400 | consumed tokens: 7287603200 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.750060E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.181 | TFLOPs: 11.96 | 7: iteration 13910/ 173500 | consumed samples: 3560960 | consumed tokens: 7292846080 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.756639E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.334 | TFLOPs: 11.98 | 7: iteration 13920/ 173500 | consumed samples: 3563520 | consumed tokens: 7298088960 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.746028E+00 | grad norm: 0.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.866 | TFLOPs: 11.96 | 7: iteration 13930/ 173500 | consumed samples: 3566080 | consumed tokens: 7303331840 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.762907E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.304 | TFLOPs: 11.66 | 7: iteration 13940/ 173500 | consumed samples: 3568640 | consumed tokens: 7308574720 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.750048E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.398 | TFLOPs: 11.96 | 7: iteration 13950/ 173500 | consumed samples: 3571200 | consumed tokens: 7313817600 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.751100E+00 | grad norm: 0.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.337 | TFLOPs: 11.45 | 7: iteration 13960/ 173500 | consumed samples: 3573760 | consumed tokens: 7319060480 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.756840E+00 | grad norm: 0.775 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.940 | TFLOPs: 11.95 | 7: iteration 13970/ 173500 | consumed samples: 3576320 | consumed tokens: 7324303360 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.758713E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.197 | TFLOPs: 
11.93 |
7: iteration 13980/ 173500 | consumed samples: 3578880 | consumed tokens: 7329546240 | elapsed time per iteration (s): 0.08 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 4.756603E+00 | grad norm: 0.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.387 | TFLOPs: 11.91 |
7: iteration 13990/ 173500 | consumed samples: 3581440 | consumed tokens: 7334789120 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.761699E+00 | grad norm: 0.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.042 | TFLOPs: 11.84 |
0: [2023-03-17 00:37:38,907] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=0, lr=[0.00019774496681175836, 0.00019774496681175836, 0.00019774496681175836], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 14000/ 173500 | consumed samples: 3584000 | consumed tokens: 7340032000 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.749470E+00 | grad norm: 0.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.164 | TFLOPs: 11.96 |
0: steps: 14000 loss: 4.7388 iter time (s): 0.086 samples/sec: 2977.902
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 14000 | lm loss value: 4.587405E+00 | lm loss PPL: 9.823919E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 14000 to checkpoints_14m91b100m
0: [2023-03-17 00:37:38,965] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step14000 is begin to save!
0: [2023-03-17 00:37:38,968] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:37:38,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:37:38,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/layer_03-model_00-model_states.pt...
0: [2023-03-17 00:37:38,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/layer_03-model_00-model_states.pt.
0: [2023-03-17 00:37:38,996] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/layer_04-model_00-model_states.pt...
0: [2023-03-17 00:37:39,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/layer_04-model_00-model_states.pt.
0: [2023-03-17 00:37:39,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/layer_05-model_00-model_states.pt...
0: [2023-03-17 00:37:39,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/layer_05-model_00-model_states.pt.
0: [2023-03-17 00:37:39,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/layer_06-model_00-model_states.pt...
0: [2023-03-17 00:37:39,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/layer_06-model_00-model_states.pt.
0: [2023-03-17 00:37:39,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/layer_08-model_00-model_states.pt...
0: [2023-03-17 00:37:39,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/layer_08-model_00-model_states.pt.
0: [2023-03-17 00:37:39,007] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step14000/mp_rank_00_model_states.pt
0: [2023-03-17 00:37:39,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/mp_rank_00_model_states.pt...
0: [2023-03-17 00:37:39,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/mp_rank_00_model_states.pt.
0: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:37:39,026] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:37:39,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:37:39,026] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:37:39,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:37:39,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2023-03-17 00:37:39,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:37:39,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:37:39,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2023-03-17 00:37:39,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:37:39,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 
0: [2023-03-17 00:37:39,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:37:39,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:37:39,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:37:39,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 
7: [2023-03-17 00:37:39,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:37:39,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:37:39,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:37:39,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 6: [2023-03-17 00:37:39,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 00:37:39,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 
2: [2023-03-17 00:37:39,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:37:39,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2023-03-17 00:37:39,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:37:39,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:37:39,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:37:39,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:37:39,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:37:39,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 6: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:37:39,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:37:39,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:37:39,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:37:39,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:37:39,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:37:39,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 6: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:37:39,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2023-03-17 00:37:39,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:37:39,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2023-03-17 00:37:39,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:37:39,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:37:39,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 00:37:39,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 6: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:37:39,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:37:39,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:37:39,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:37:39,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2023-03-17 00:37:39,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2023-03-17 00:37:39,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:37:39,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:37:39,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:37:39,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2023-03-17 00:37:39,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:37:39,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:37:39,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:37:39,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:37:39,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 6: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:37:39,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:37:39,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2023-03-17 00:37:39,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:37:39,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:37:39,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:37:39,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 6: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 6: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 
5: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 00:37:39,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step14000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2023-03-17 00:37:39,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 
0: successfully saved checkpoint at iteration 14000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 78.97
7: iteration 14010/ 173500 | consumed samples: 3586560 | consumed tokens: 7345274880 | elapsed time per iteration (s): 0.09 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.757590E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2751.008 | TFLOPs: 10.23 |
7: iteration 14020/ 173500 | consumed samples: 3589120 | consumed tokens: 7350517760 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.760726E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.235 | TFLOPs: 12.02 |
7: iteration 14030/ 173500 | consumed samples: 3591680 | consumed tokens: 7355760640 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.751685E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.235 | TFLOPs: 11.74 |
7: iteration 14040/ 173500 | consumed samples: 3594240 | consumed tokens: 7361003520 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.759177E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.383 | TFLOPs: 12.02 |
7: iteration 14050/ 173500 | consumed samples: 3596800 | consumed tokens: 7366246400 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.757157E+00 | grad norm: 0.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.663 | TFLOPs: 12.02 |
7: iteration 14060/ 173500 | consumed samples: 3599360 | consumed tokens: 7371489280 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.761547E+00 | grad norm: 0.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.672 | TFLOPs: 12.02 |
7: iteration 14070/ 173500 | consumed samples: 3601920 | consumed tokens: 7376732160 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.740977E+00 | grad norm: 0.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.694 | TFLOPs: 11.93 |
7: iteration 14080/ 173500 | consumed samples: 3604480 | consumed tokens: 7381975040 | elapsed time per iteration (s): 0.12 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.748244E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2104.304 | TFLOPs: 7.83 |
7: iteration 14090/ 173500 | consumed samples: 3607040 | consumed tokens: 7387217920 | elapsed time per iteration (s): 0.13 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.749965E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2026.868 | TFLOPs: 7.54 |
7: iteration 14100/ 173500 | consumed samples: 3609600 | consumed tokens: 7392460800 | elapsed time per iteration (s): 0.10 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.760513E+00 | grad norm: 0.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2511.466 | TFLOPs: 9.34 |
7: iteration 14110/ 173500 | consumed samples: 3612160 | consumed tokens: 7397703680 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.764656E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.400 | TFLOPs: 11.74 |
7: iteration 14120/ 173500 | consumed samples: 3614720 | consumed tokens: 7402946560 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.755806E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3250.953 | TFLOPs: 12.09 |
7: iteration 14130/ 173500 | consumed samples: 3617280 | consumed tokens: 7408189440 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.754651E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3252.204 | TFLOPs: 12.10 |
7: iteration 14140/ 173500 | consumed samples: 3619840 | consumed tokens: 7413432320 | elapsed time per iteration (s): 0.10 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.757139E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2653.077 | TFLOPs: 9.87 |
7: iteration 14150/ 173500 | consumed samples: 3622400 | consumed tokens: 7418675200 | elapsed time per iteration (s): 0.09 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.743660E+00 | grad norm: 0.757 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2928.639 | TFLOPs: 10.89 |
7: iteration 14160/ 173500 | consumed samples: 3624960 | consumed tokens: 7423918080 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.752664E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3258.091 | TFLOPs: 12.12 |
7: iteration 14170/ 173500 | consumed samples: 3627520 | consumed tokens: 7429160960 | elapsed time per iteration (s): 0.09 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.762765E+00 | grad norm: 0.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.131 | TFLOPs: 11.16 |
7: iteration 14180/ 173500 | consumed samples: 3630080 | consumed tokens: 7434403840 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.757290E+00 | grad norm: 0.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3256.496 | TFLOPs: 12.11 |
7: iteration 14190/ 173500 | consumed samples: 3632640 | consumed tokens: 7439646720 | elapsed time per iteration (s): 0.11 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.752725E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2245.878 | TFLOPs: 8.35 |
7: iteration 14200/ 173500 | consumed samples: 3635200 | consumed tokens: 7444889600 | elapsed time per iteration (s): 0.13 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.747410E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1969.264 | TFLOPs: 7.32 |
7: iteration 14210/ 173500 | consumed samples: 3637760 | consumed tokens: 7450132480 | elapsed time per iteration (s): 0.13 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.755501E+00 | grad norm: 0.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.912 | TFLOPs: 7.37 |
7: iteration 14220/ 173500 | consumed samples: 3640320 | consumed tokens: 7455375360 | elapsed time per iteration (s): 0.12 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.751136E+00 | grad norm: 0.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2073.148 | TFLOPs: 7.71 |
7: iteration 14230/ 173500 | consumed samples: 3642880 | consumed tokens: 7460618240 | elapsed time per iteration (s): 0.13 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.744727E+00 | grad norm: 0.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1997.976 | TFLOPs: 7.43 |
7: iteration 14240/ 173500 | consumed samples: 3645440 | consumed tokens: 7465861120 | elapsed time per iteration (s): 0.09 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.749223E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.282 | TFLOPs: 10.51 |
7: iteration 14250/ 173500 | consumed samples: 3648000 | consumed tokens: 7471104000 | elapsed time per iteration (s): 0.08 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 4.763768E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.790 | TFLOPs: 11.85 |
7: iteration 14260/ 173500 | consumed samples: 3650560 | consumed tokens: 7476346880 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.740718E+00 | grad norm: 0.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.764 | TFLOPs: 11.82 |
7: iteration 14270/ 173500 | consumed samples: 3653120 | consumed tokens: 7481589760 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.748497E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.458 | TFLOPs: 11.81 |
7: iteration 14280/ 173500 | consumed samples: 3655680 | consumed tokens: 7486832640 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.742035E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.759 | TFLOPs: 11.85 |
7: iteration 14290/ 173500 | consumed samples: 3658240 | consumed tokens: 7492075520 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.760392E+00 | grad norm: 0.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.277 | TFLOPs: 11.79 |
7: iteration 14300/ 173500 | consumed samples: 3660800 | consumed tokens: 7497318400 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.743563E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.761 | TFLOPs: 11.56 |
7: iteration 14310/ 173500 | consumed samples: 3663360 | consumed tokens: 7502561280 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.768419E+00 | grad norm: 0.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.728 | TFLOPs: 11.56 |
7: iteration 14320/ 173500 | consumed samples: 3665920 | consumed tokens: 7507804160 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.738164E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.568 | TFLOPs: 11.36 |
7: iteration 14330/ 173500 | consumed samples: 3668480 | consumed tokens: 7513047040 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.755018E+00 | grad norm: 0.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.846 | TFLOPs: 11.95 |
7: iteration 14340/ 173500 | consumed samples: 3671040 | consumed tokens: 7518289920 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.756335E+00 | grad norm: 0.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.397 | TFLOPs: 11.68 |
7: iteration 14350/ 173500 | consumed samples: 3673600 | consumed tokens: 7523532800 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.757645E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.183 | TFLOPs: 11.71 |
7: iteration 14360/ 173500 | consumed samples: 3676160 | consumed tokens: 7528775680 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.751981E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.476 | TFLOPs: 11.66 |
7: iteration 14370/ 173500 | consumed samples: 3678720 | consumed tokens: 7534018560 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.752565E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.879 | TFLOPs: 11.85 |
7: iteration 14380/ 173500 | consumed samples: 3681280 | consumed tokens: 7539261440 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.747494E+00 | grad norm: 0.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.450 | TFLOPs: 11.61 |
7: iteration 14390/ 173500 | consumed samples: 3683840 | consumed tokens: 7544504320 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.751318E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.032 | TFLOPs: 11.63 |
7: iteration 14400/ 173500 | consumed samples: 3686400 | consumed tokens: 7549747200 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.745687E+00 | grad norm: 0.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.965 | TFLOPs: 11.51 |
7: iteration 14410/ 173500 | consumed samples: 3688960 | consumed tokens: 7554990080 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.751892E+00 | grad norm: 0.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.084 | TFLOPs: 11.30 |
7: iteration 14420/ 173500 | consumed samples: 3691520 | consumed tokens: 7560232960 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.760635E+00 | grad norm: 0.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.882 | TFLOPs: 11.71 |
7: iteration 14430/ 173500 | consumed samples: 3694080 | consumed tokens: 7565475840 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.747545E+00 | grad norm: 0.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.980 | TFLOPs: 11.64 |
7: iteration 14440/ 173500 | consumed samples: 3696640 | consumed tokens: 7570718720 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.744377E+00 | grad norm: 0.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.355 | TFLOPs: 11.89 |
7: iteration 14450/ 173500 | consumed samples: 3699200 | consumed tokens: 7575961600 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.735179E+00 | grad norm: 0.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.460 | TFLOPs: 11.94 |
7: iteration 14460/ 173500 | consumed samples: 3701760 | consumed tokens: 7581204480 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.759315E+00 | grad norm: 0.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.230 | TFLOPs: 11.93 |
7: iteration 14470/ 173500 | consumed samples: 3704320 | consumed tokens: 7586447360 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.760376E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.972 | TFLOPs: 11.90 |
7: iteration 14480/ 173500 | consumed samples: 3706880 | consumed tokens: 7591690240 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.743141E+00 | grad norm: 0.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.371 | TFLOPs: 11.93 |
7: iteration 14490/ 173500 | consumed samples: 3709440 | consumed tokens: 7596933120 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.749090E+00 | grad norm: 0.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.334 | TFLOPs: 11.95 |
7: iteration 14500/ 173500 | consumed samples: 3712000 | consumed tokens: 7602176000 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.747586E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.567 | TFLOPs: 11.93 |
7: iteration 14510/ 173500 | consumed samples: 3714560 | consumed tokens: 7607418880 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.759100E+00 | grad norm: 0.697 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.516 | TFLOPs: 11.97 |
7: iteration 14520/ 173500 | consumed samples: 3717120 |
consumed tokens: 7612661760 | elapsed time per iteration (s): 0.08 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 4.754360E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.648 | TFLOPs: 11.92 | 7: iteration 14530/ 173500 | consumed samples: 3719680 | consumed tokens: 7617904640 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.752618E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.025 | TFLOPs: 11.34 | 7: iteration 14540/ 173500 | consumed samples: 3722240 | consumed tokens: 7623147520 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.740148E+00 | grad norm: 0.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.866 | TFLOPs: 11.52 | 7: iteration 14550/ 173500 | consumed samples: 3724800 | consumed tokens: 7628390400 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.751526E+00 | grad norm: 0.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.263 | TFLOPs: 11.51 | 7: iteration 14560/ 173500 | consumed samples: 3727360 | consumed tokens: 7633633280 | elapsed time per iteration (s): 0.09 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.755767E+00 | grad norm: 0.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2927.883 | TFLOPs: 10.89 | 7: iteration 14570/ 173500 | consumed samples: 3729920 | consumed tokens: 7638876160 | elapsed time per iteration (s): 0.09 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.769162E+00 | grad norm: 0.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 2798.877 | TFLOPs: 10.41 | 7: iteration 14580/ 173500 | consumed samples: 3732480 | consumed tokens: 7644119040 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.747457E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.616 | TFLOPs: 11.23 | 7: iteration 14590/ 173500 | consumed samples: 3735040 | consumed tokens: 7649361920 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.770530E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.568 | TFLOPs: 11.89 | 7: iteration 14600/ 173500 | consumed samples: 3737600 | consumed tokens: 7654604800 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.755480E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.078 | TFLOPs: 11.52 | 7: iteration 14610/ 173500 | consumed samples: 3740160 | consumed tokens: 7659847680 | elapsed time per iteration (s): 0.10 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.752576E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2598.394 | TFLOPs: 9.66 | 7: iteration 14620/ 173500 | consumed samples: 3742720 | consumed tokens: 7665090560 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.744603E+00 | grad norm: 0.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.074 | TFLOPs: 11.55 | 7: iteration 14630/ 173500 | consumed samples: 3745280 | consumed tokens: 7670333440 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.757789E+00 | grad norm: 
0.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.653 | TFLOPs: 11.79 | 7: iteration 14640/ 173500 | consumed samples: 3747840 | consumed tokens: 7675576320 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.746953E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.813 | TFLOPs: 11.52 | 7: iteration 14650/ 173500 | consumed samples: 3750400 | consumed tokens: 7680819200 | elapsed time per iteration (s): 0.10 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.763450E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.737 | TFLOPs: 9.16 | 7: iteration 14660/ 173500 | consumed samples: 3752960 | consumed tokens: 7686062080 | elapsed time per iteration (s): 0.12 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.753226E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2110.063 | TFLOPs: 7.85 | 7: iteration 14670/ 173500 | consumed samples: 3755520 | consumed tokens: 7691304960 | elapsed time per iteration (s): 0.11 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.746346E+00 | grad norm: 0.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2241.666 | TFLOPs: 8.34 | 7: iteration 14680/ 173500 | consumed samples: 3758080 | consumed tokens: 7696547840 | elapsed time per iteration (s): 0.10 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.758167E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.722 | TFLOPs: 9.16 | 7: iteration 14690/ 173500 | consumed samples: 3760640 | consumed tokens: 7701790720 | elapsed time per iteration (s): 
0.12 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.745569E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2198.313 | TFLOPs: 8.18 | 7: iteration 14700/ 173500 | consumed samples: 3763200 | consumed tokens: 7707033600 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.754549E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.130 | TFLOPs: 11.28 | 7: iteration 14710/ 173500 | consumed samples: 3765760 | consumed tokens: 7712276480 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.750676E+00 | grad norm: 0.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.074 | TFLOPs: 11.87 | 7: iteration 14720/ 173500 | consumed samples: 3768320 | consumed tokens: 7717519360 | elapsed time per iteration (s): 0.09 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.752990E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2859.533 | TFLOPs: 10.64 | 7: iteration 14730/ 173500 | consumed samples: 3770880 | consumed tokens: 7722762240 | elapsed time per iteration (s): 0.11 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.754752E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.580 | TFLOPs: 8.44 | 7: iteration 14740/ 173500 | consumed samples: 3773440 | consumed tokens: 7728005120 | elapsed time per iteration (s): 0.13 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.747533E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.773 | TFLOPs: 7.42 | 7: iteration 14750/ 173500 
| consumed samples: 3776000 | consumed tokens: 7733248000 | elapsed time per iteration (s): 0.13 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.741353E+00 | grad norm: 0.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.148 | TFLOPs: 7.46 | 7: iteration 14760/ 173500 | consumed samples: 3778560 | consumed tokens: 7738490880 | elapsed time per iteration (s): 0.13 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.750651E+00 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2000.086 | TFLOPs: 7.44 | 7: iteration 14770/ 173500 | consumed samples: 3781120 | consumed tokens: 7743733760 | elapsed time per iteration (s): 0.11 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.748528E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2311.162 | TFLOPs: 8.60 | 7: iteration 14780/ 173500 | consumed samples: 3783680 | consumed tokens: 7748976640 | elapsed time per iteration (s): 0.08 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 4.744160E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.263 | TFLOPs: 11.86 | 7: iteration 14790/ 173500 | consumed samples: 3786240 | consumed tokens: 7754219520 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.743652E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.290 | TFLOPs: 11.77 | 7: iteration 14800/ 173500 | consumed samples: 3788800 | consumed tokens: 7759462400 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.759208E+00 | grad norm: 0.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3150.263 | TFLOPs: 11.72 | 7: iteration 14810/ 173500 | consumed samples: 3791360 | consumed tokens: 7764705280 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.741268E+00 | grad norm: 0.763 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.910 | TFLOPs: 11.82 | 7: iteration 14820/ 173500 | consumed samples: 3793920 | consumed tokens: 7769948160 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.760269E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.479 | TFLOPs: 11.79 | 7: iteration 14830/ 173500 | consumed samples: 3796480 | consumed tokens: 7775191040 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.745154E+00 | grad norm: 0.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.942 | TFLOPs: 11.56 | 7: iteration 14840/ 173500 | consumed samples: 3799040 | consumed tokens: 7780433920 | elapsed time per iteration (s): 0.09 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.750661E+00 | grad norm: 0.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2879.985 | TFLOPs: 10.71 | 7: iteration 14850/ 173500 | consumed samples: 3801600 | consumed tokens: 7785676800 | elapsed time per iteration (s): 0.10 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.757359E+00 | grad norm: 0.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2449.686 | TFLOPs: 9.11 | 7: iteration 14860/ 173500 | consumed samples: 3804160 | consumed tokens: 7790919680 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 
4.755846E+00 | grad norm: 0.753 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.793 | TFLOPs: 11.83 | 7: iteration 14870/ 173500 | consumed samples: 3806720 | consumed tokens: 7796162560 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.738400E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.682 | TFLOPs: 11.25 | 7: iteration 14880/ 173500 | consumed samples: 3809280 | consumed tokens: 7801405440 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.750445E+00 | grad norm: 0.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.537 | TFLOPs: 11.79 | 7: iteration 14890/ 173500 | consumed samples: 3811840 | consumed tokens: 7806648320 | elapsed time per iteration (s): 0.09 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.754737E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2818.576 | TFLOPs: 10.48 | 7: iteration 14900/ 173500 | consumed samples: 3814400 | consumed tokens: 7811891200 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.762193E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.146 | TFLOPs: 11.49 | 7: iteration 14910/ 173500 | consumed samples: 3816960 | consumed tokens: 7817134080 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.757810E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.564 | TFLOPs: 11.80 | 7: iteration 14920/ 173500 | consumed samples: 3819520 | consumed tokens: 7822376960 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.751470E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.691 | TFLOPs: 11.51 | 7: iteration 14930/ 173500 | consumed samples: 3822080 | consumed tokens: 7827619840 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.739462E+00 | grad norm: 0.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.846 | TFLOPs: 11.79 | 7: iteration 14940/ 173500 | consumed samples: 3824640 | consumed tokens: 7832862720 | elapsed time per iteration (s): 0.09 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.752210E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.434 | TFLOPs: 10.21 | 7: iteration 14950/ 173500 | consumed samples: 3827200 | consumed tokens: 7838105600 | elapsed time per iteration (s): 0.13 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.747318E+00 | grad norm: 0.667 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.105 | TFLOPs: 7.30 | 7: iteration 14960/ 173500 | consumed samples: 3829760 | consumed tokens: 7843348480 | elapsed time per iteration (s): 0.11 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.750359E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2243.169 | TFLOPs: 8.34 | 7: iteration 14970/ 173500 | consumed samples: 3832320 | consumed tokens: 7848591360 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.736999E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.104 | TFLOPs: 
11.80 | 7: iteration 14980/ 173500 | consumed samples: 3834880 | consumed tokens: 7853834240 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.741195E+00 | grad norm: 0.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.007 | TFLOPs: 11.74 | 7: iteration 14990/ 173500 | consumed samples: 3837440 | consumed tokens: 7859077120 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.750796E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.257 | TFLOPs: 11.55 | 7: iteration 15000/ 173500 | consumed samples: 3840000 | consumed tokens: 7864320000 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.740351E+00 | grad norm: 0.748 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.511 | TFLOPs: 11.80 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 15000 | lm loss value: 4.580945E+00 | lm loss PPL: 9.760659E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 15000 to checkpoints_14m91b100m 0: [2023-03-17 00:39:09,122] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step15000 is begin to save! 0: [2023-03-17 00:39:09,125] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:39:09,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 00:39:09,148] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:39:09,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:39:09,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:39:09,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:39:09,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:39:09,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:39:09,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:39:09,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:39:09,162] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:39:09,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:39:09,163] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step15000/mp_rank_00_model_states.pt 0: [2023-03-17 00:39:09,163] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/mp_rank_00_model_states.pt... 
0: [2023-03-17 00:39:09,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:39:09,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:39:09,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:39:09,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:39:09,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:39:09,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2023-03-17 00:39:09,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2023-03-17 00:39:09,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:39:09,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:39:09,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2023-03-17 00:39:09,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:39:09,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2023-03-17 00:39:09,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:39:09,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:39:09,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2023-03-17 00:39:09,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:39:09,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2023-03-17 00:39:09,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:39:09,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2023-03-17 00:39:09,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:39:09,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:39:09,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:39:09,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2023-03-17 00:39:09,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:39:09,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2023-03-17 00:39:09,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:39:09,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2023-03-17 00:39:09,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:39:09,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:39:09,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2023-03-17 00:39:09,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:39:09,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:39:09,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:39:09,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:39:09,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2023-03-17 00:39:09,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 3: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:39:09,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2023-03-17 00:39:09,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:39:09,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:39:09,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:39:09,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 00:39:09,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:39:09,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:39:09,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:39:09,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2023-03-17 00:39:09,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:39:09,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:39:09,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:39:09,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:39:09,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:39:09,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:39:09,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:39:09,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
7: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:39:09,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 00:39:09,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 00:39:09,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2023-03-17 00:39:09,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:39:09,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:39:09,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 00:39:09,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:39:09,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:39:09,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2023-03-17 00:39:09,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2023-03-17 00:39:09,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:39:09,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:39:09,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 00:39:09,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2023-03-17 00:39:09,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 00:39:09,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2023-03-17 00:39:09,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2023-03-17 00:39:09,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2023-03-17 00:39:09,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:39:09,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step15000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt
3: [2023-03-17 00:39:09,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now!
0: successfully saved checkpoint at iteration 15000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 76.96
7: iteration 15010/ 173500 | consumed samples: 3842560 | consumed tokens: 7869562880 | elapsed time per iteration (s): 0.09 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.758219E+00 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2835.151 | TFLOPs: 10.55 |
7: iteration 15020/ 173500 | consumed samples: 3845120 | consumed tokens: 7874805760 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.745610E+00 | grad norm: 0.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.766 | TFLOPs: 11.98 |
7: iteration 15030/ 173500 | consumed samples: 3847680 | consumed tokens: 7880048640 | elapsed time per iteration (s): 0.08 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 4.749657E+00 | grad norm: 0.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.933 | TFLOPs: 12.03 |
7: iteration 15040/ 173500 | consumed samples: 3850240 | consumed tokens: 7885291520 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.748357E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.514 | TFLOPs: 11.99 |
7: iteration 15050/ 173500 | consumed samples: 3852800 | consumed tokens: 7890534400 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.737846E+00 | grad norm: 0.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.980 | TFLOPs: 11.58 |
7: iteration 15060/ 173500 | consumed samples: 3855360 | consumed tokens: 7895777280 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.747934E+00 | grad norm: 0.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.370 | TFLOPs: 11.86 |
7: iteration 15070/ 173500 | consumed samples: 3857920 | consumed tokens: 7901020160 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.742793E+00 | grad norm: 0.754 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.720 | TFLOPs: 11.59 |
7: iteration 15080/ 173500 | consumed samples: 3860480 | consumed tokens: 7906263040 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.742394E+00 | grad norm: 0.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.168 | TFLOPs: 11.89 |
7: iteration 15090/ 173500 | consumed samples: 3863040 | consumed tokens: 7911505920 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.739915E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.830 | TFLOPs: 11.92 |
7: iteration 15100/ 173500 | consumed samples: 3865600 | consumed tokens: 7916748800 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.750135E+00 | grad norm: 0.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.855 | TFLOPs: 11.95 |
7: iteration 15110/ 173500 | consumed samples: 3868160 | consumed tokens: 7921991680 | elapsed time per iteration (s): 0.09 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.749107E+00 | grad norm: 0.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2694.914 | TFLOPs: 10.02 |
7: iteration 15120/ 173500 | consumed samples: 3870720 | consumed tokens: 7927234560 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.744287E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2521.925 | TFLOPs: 9.38 |
7: iteration 15130/ 173500 | consumed samples: 3873280 | consumed tokens: 7932477440 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.755400E+00 | grad norm: 0.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2513.142 | TFLOPs: 9.35 |
7: iteration 15140/ 173500 | consumed samples: 3875840 | consumed tokens: 7937720320 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.738146E+00 | grad norm: 0.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.711 | TFLOPs: 9.34 |
7: iteration 15150/ 173500 | consumed samples: 3878400 | consumed tokens: 7942963200 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.737745E+00 | grad norm: 0.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2648.310 | TFLOPs: 9.85 |
7: iteration 15160/ 173500 | consumed samples: 3880960 | consumed tokens: 7948206080 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.736608E+00 | grad norm: 0.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.424 | 
TFLOPs: 11.87 | 7: iteration 15170/ 173500 | consumed samples: 3883520 | consumed tokens: 7953448960 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.742962E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.936 | TFLOPs: 11.84 | 7: iteration 15180/ 173500 | consumed samples: 3886080 | consumed tokens: 7958691840 | elapsed time per iteration (s): 0.08 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.745831E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.104 | TFLOPs: 12.01 | 7: iteration 15190/ 173500 | consumed samples: 3888640 | consumed tokens: 7963934720 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.736876E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.991 | TFLOPs: 9.30 | 7: iteration 15200/ 173500 | consumed samples: 3891200 | consumed tokens: 7969177600 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.754988E+00 | grad norm: 0.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.902 | TFLOPs: 9.09 | 7: iteration 15210/ 173500 | consumed samples: 3893760 | consumed tokens: 7974420480 | elapsed time per iteration (s): 0.11 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.748780E+00 | grad norm: 0.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2416.807 | TFLOPs: 8.99 | 7: iteration 15220/ 173500 | consumed samples: 3896320 | consumed tokens: 7979663360 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.741672E+00 | grad norm: 0.720 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2516.911 | TFLOPs: 9.36 | 7: iteration 15230/ 173500 | consumed samples: 3898880 | consumed tokens: 7984906240 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.744168E+00 | grad norm: 0.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.272 | TFLOPs: 9.08 | 7: iteration 15240/ 173500 | consumed samples: 3901440 | consumed tokens: 7990149120 | elapsed time per iteration (s): 0.10 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.734597E+00 | grad norm: 0.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.321 | TFLOPs: 9.34 | 7: iteration 15250/ 173500 | consumed samples: 3904000 | consumed tokens: 7995392000 | elapsed time per iteration (s): 0.11 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.740105E+00 | grad norm: 0.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.444 | TFLOPs: 8.88 | 7: iteration 15260/ 173500 | consumed samples: 3906560 | consumed tokens: 8000634880 | elapsed time per iteration (s): 0.11 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.742986E+00 | grad norm: 0.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2364.911 | TFLOPs: 8.80 | 7: iteration 15270/ 173500 | consumed samples: 3909120 | consumed tokens: 8005877760 | elapsed time per iteration (s): 0.12 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 4.745307E+00 | grad norm: 0.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2205.172 | TFLOPs: 8.20 | 7: iteration 15280/ 173500 | consumed samples: 3911680 | consumed tokens: 8011120640 | elapsed time per iteration (s): 0.08 | learning rate: 
1.973E-04 | global batch size: 256 | lm loss: 4.742244E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.343 | TFLOPs: 11.71 | 7: iteration 15290/ 173500 | consumed samples: 3914240 | consumed tokens: 8016363520 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.748118E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2965.063 | TFLOPs: 11.03 | 7: iteration 15300/ 173500 | consumed samples: 3916800 | consumed tokens: 8021606400 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.765044E+00 | grad norm: 0.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.120 | TFLOPs: 11.27 | 7: iteration 15310/ 173500 | consumed samples: 3919360 | consumed tokens: 8026849280 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.735865E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2965.243 | TFLOPs: 11.03 | 7: iteration 15320/ 173500 | consumed samples: 3921920 | consumed tokens: 8032092160 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.750204E+00 | grad norm: 0.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.336 | TFLOPs: 11.78 | 7: iteration 15330/ 173500 | consumed samples: 3924480 | consumed tokens: 8037335040 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.753801E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.363 | TFLOPs: 11.65 | 7: iteration 15340/ 173500 | consumed samples: 
3927040 | consumed tokens: 8042577920 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.743189E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.768 | TFLOPs: 11.45 | 7: iteration 15350/ 173500 | consumed samples: 3929600 | consumed tokens: 8047820800 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.738363E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2823.255 | TFLOPs: 10.50 | 7: iteration 15360/ 173500 | consumed samples: 3932160 | consumed tokens: 8053063680 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.739207E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.465 | TFLOPs: 10.74 | 7: iteration 15370/ 173500 | consumed samples: 3934720 | consumed tokens: 8058306560 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.741211E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2890.672 | TFLOPs: 10.75 | 7: iteration 15380/ 173500 | consumed samples: 3937280 | consumed tokens: 8063549440 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.753078E+00 | grad norm: 0.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2855.868 | TFLOPs: 10.62 | 7: iteration 15390/ 173500 | consumed samples: 3939840 | consumed tokens: 8068792320 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.744833E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2880.325 | TFLOPs: 10.71 | 7: iteration 15400/ 173500 | consumed samples: 3942400 | consumed tokens: 8074035200 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.739835E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2806.114 | TFLOPs: 10.44 | 7: iteration 15410/ 173500 | consumed samples: 3944960 | consumed tokens: 8079278080 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.744566E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3025.816 | TFLOPs: 11.25 | 7: iteration 15420/ 173500 | consumed samples: 3947520 | consumed tokens: 8084520960 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.740546E+00 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.849 | TFLOPs: 12.01 | 7: iteration 15430/ 173500 | consumed samples: 3950080 | consumed tokens: 8089763840 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.745856E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2939.095 | TFLOPs: 10.93 | 7: iteration 15440/ 173500 | consumed samples: 3952640 | consumed tokens: 8095006720 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.736814E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2989.148 | TFLOPs: 11.12 | 7: iteration 15450/ 173500 | consumed samples: 3955200 | consumed tokens: 8100249600 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.746484E+00 | grad 
norm: 0.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.567 | TFLOPs: 10.92 | 7: iteration 15460/ 173500 | consumed samples: 3957760 | consumed tokens: 8105492480 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.734708E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.950 | TFLOPs: 11.45 | 7: iteration 15470/ 173500 | consumed samples: 3960320 | consumed tokens: 8110735360 | elapsed time per iteration (s): 0.09 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.743320E+00 | grad norm: 0.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2988.876 | TFLOPs: 11.12 | 7: iteration 15480/ 173500 | consumed samples: 3962880 | consumed tokens: 8115978240 | elapsed time per iteration (s): 0.10 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.736382E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2680.070 | TFLOPs: 9.97 | 7: iteration 15490/ 173500 | consumed samples: 3965440 | consumed tokens: 8121221120 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.741647E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.318 | TFLOPs: 12.01 | 7: iteration 15500/ 173500 | consumed samples: 3968000 | consumed tokens: 8126464000 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.737567E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.293 | TFLOPs: 12.00 | 7: iteration 15510/ 173500 | consumed samples: 3970560 | consumed tokens: 8131706880 | elapsed time per iteration 
(s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.743465E+00 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.921 | TFLOPs: 12.03 | 7: iteration 15520/ 173500 | consumed samples: 3973120 | consumed tokens: 8136949760 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.730503E+00 | grad norm: 0.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.343 | TFLOPs: 11.75 | 7: iteration 15530/ 173500 | consumed samples: 3975680 | consumed tokens: 8142192640 | elapsed time per iteration (s): 0.08 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 4.747773E+00 | grad norm: 0.768 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.608 | TFLOPs: 11.74 | 7: iteration 15540/ 173500 | consumed samples: 3978240 | consumed tokens: 8147435520 | elapsed time per iteration (s): 0.09 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.754347E+00 | grad norm: 0.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3004.369 | TFLOPs: 11.17 | 7: iteration 15550/ 173500 | consumed samples: 3980800 | consumed tokens: 8152678400 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.733447E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.136 | TFLOPs: 11.65 | 7: iteration 15560/ 173500 | consumed samples: 3983360 | consumed tokens: 8157921280 | elapsed time per iteration (s): 0.09 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.743618E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.897 | TFLOPs: 10.32 | 7: iteration 15570/ 
173500 | consumed samples: 3985920 | consumed tokens: 8163164160 | elapsed time per iteration (s): 0.09 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.751380E+00 | grad norm: 0.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.102 | TFLOPs: 10.13 | 7: iteration 15580/ 173500 | consumed samples: 3988480 | consumed tokens: 8168407040 | elapsed time per iteration (s): 0.10 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.751961E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2679.196 | TFLOPs: 9.97 | 7: iteration 15590/ 173500 | consumed samples: 3991040 | consumed tokens: 8173649920 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.742470E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3014.530 | TFLOPs: 11.21 | 7: iteration 15600/ 173500 | consumed samples: 3993600 | consumed tokens: 8178892800 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.739124E+00 | grad norm: 0.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.170 | TFLOPs: 11.89 | 7: iteration 15610/ 173500 | consumed samples: 3996160 | consumed tokens: 8184135680 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.747517E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.903 | TFLOPs: 11.90 | 7: iteration 15620/ 173500 | consumed samples: 3998720 | consumed tokens: 8189378560 | elapsed time per iteration (s): 0.09 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.736076E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2838.630 | TFLOPs: 10.56 | 7: iteration 15630/ 173500 | consumed samples: 4001280 | consumed tokens: 8194621440 | elapsed time per iteration (s): 0.10 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.735423E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2609.694 | TFLOPs: 9.71 | 7: iteration 15640/ 173500 | consumed samples: 4003840 | consumed tokens: 8199864320 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.748656E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.773 | TFLOPs: 11.89 | 7: iteration 15650/ 173500 | consumed samples: 4006400 | consumed tokens: 8205107200 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.727682E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.897 | TFLOPs: 11.91 | 7: iteration 15660/ 173500 | consumed samples: 4008960 | consumed tokens: 8210350080 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.749938E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.063 | TFLOPs: 11.51 | 7: iteration 15670/ 173500 | consumed samples: 4011520 | consumed tokens: 8215592960 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.756609E+00 | grad norm: 0.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.548 | TFLOPs: 11.89 | 7: iteration 15680/ 173500 | consumed samples: 4014080 | consumed tokens: 8220835840 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | 
lm loss: 4.724710E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.841 | TFLOPs: 11.84 | 7: iteration 15690/ 173500 | consumed samples: 4016640 | consumed tokens: 8226078720 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.768233E+00 | grad norm: 0.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.454 | TFLOPs: 11.69 | 7: iteration 15700/ 173500 | consumed samples: 4019200 | consumed tokens: 8231321600 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.736720E+00 | grad norm: 0.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.741 | TFLOPs: 11.97 | 7: iteration 15710/ 173500 | consumed samples: 4021760 | consumed tokens: 8236564480 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.743885E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.599 | TFLOPs: 11.96 | 7: iteration 15720/ 173500 | consumed samples: 4024320 | consumed tokens: 8241807360 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.731131E+00 | grad norm: 0.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.200 | TFLOPs: 11.97 | 7: iteration 15730/ 173500 | consumed samples: 4026880 | consumed tokens: 8247050240 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.753141E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.396 | TFLOPs: 11.54 | 7: iteration 15740/ 173500 | consumed samples: 4029440 | consumed tokens: 
8252293120 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.745810E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.293 | TFLOPs: 11.39 | 7: iteration 15750/ 173500 | consumed samples: 4032000 | consumed tokens: 8257536000 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.737907E+00 | grad norm: 0.616 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.193 | TFLOPs: 11.93 | 7: iteration 15760/ 173500 | consumed samples: 4034560 | consumed tokens: 8262778880 | elapsed time per iteration (s): 0.08 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.739135E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.479 | TFLOPs: 11.83 | 7: iteration 15770/ 173500 | consumed samples: 4037120 | consumed tokens: 8268021760 | elapsed time per iteration (s): 0.11 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 4.740922E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2337.372 | TFLOPs: 8.69 | 7: iteration 15780/ 173500 | consumed samples: 4039680 | consumed tokens: 8273264640 | elapsed time per iteration (s): 0.09 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.751888E+00 | grad norm: 0.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2835.547 | TFLOPs: 10.55 | 7: iteration 15790/ 173500 | consumed samples: 4042240 | consumed tokens: 8278507520 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.742583E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3211.246 | TFLOPs: 11.94 | 7: iteration 15800/ 173500 | consumed samples: 4044800 | consumed tokens: 8283750400 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.731842E+00 | grad norm: 0.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.485 | TFLOPs: 11.90 | 7: iteration 15810/ 173500 | consumed samples: 4047360 | consumed tokens: 8288993280 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.742763E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.931 | TFLOPs: 11.87 | 7: iteration 15820/ 173500 | consumed samples: 4049920 | consumed tokens: 8294236160 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.742778E+00 | grad norm: 0.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.789 | TFLOPs: 11.93 | 7: iteration 15830/ 173500 | consumed samples: 4052480 | consumed tokens: 8299479040 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.737753E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.421 | TFLOPs: 11.95 | 7: iteration 15840/ 173500 | consumed samples: 4055040 | consumed tokens: 8304721920 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.746570E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.273 | TFLOPs: 11.94 | 7: iteration 15850/ 173500 | consumed samples: 4057600 | consumed tokens: 8309964800 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.743098E+00 | grad norm: 0.646 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.961 | TFLOPs: 11.96 | 7: iteration 15860/ 173500 | consumed samples: 4060160 | consumed tokens: 8315207680 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.745710E+00 | grad norm: 0.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.695 | TFLOPs: 11.97 | 7: iteration 15870/ 173500 | consumed samples: 4062720 | consumed tokens: 8320450560 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.731569E+00 | grad norm: 0.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.164 | TFLOPs: 11.96 | 7: iteration 15880/ 173500 | consumed samples: 4065280 | consumed tokens: 8325693440 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.746288E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.151 | TFLOPs: 11.93 | 7: iteration 15890/ 173500 | consumed samples: 4067840 | consumed tokens: 8330936320 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.749852E+00 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.108 | TFLOPs: 11.90 | 7: iteration 15900/ 173500 | consumed samples: 4070400 | consumed tokens: 8336179200 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.733746E+00 | grad norm: 0.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.703 | TFLOPs: 11.67 | 7: iteration 15910/ 173500 | consumed samples: 4072960 | consumed tokens: 8341422080 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.731996E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.938 | TFLOPs: 11.96 | 7: iteration 15920/ 173500 | consumed samples: 4075520 | consumed tokens: 8346664960 | elapsed time per iteration (s): 0.10 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.718642E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2664.190 | TFLOPs: 9.91 | 7: iteration 15930/ 173500 | consumed samples: 4078080 | consumed tokens: 8351907840 | elapsed time per iteration (s): 0.10 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.753304E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.804 | TFLOPs: 9.30 | 7: iteration 15940/ 173500 | consumed samples: 4080640 | consumed tokens: 8357150720 | elapsed time per iteration (s): 0.10 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.750483E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2540.014 | TFLOPs: 9.45 | 7: iteration 15950/ 173500 | consumed samples: 4083200 | consumed tokens: 8362393600 | elapsed time per iteration (s): 0.11 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.734244E+00 | grad norm: 0.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2397.026 | TFLOPs: 8.92 | 7: iteration 15960/ 173500 | consumed samples: 4085760 | consumed tokens: 8367636480 | elapsed time per iteration (s): 0.10 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.740753E+00 | grad norm: 0.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2693.521 | TFLOPs: 10.02 | 7: iteration 15970/ 173500 | 
consumed samples: 4088320 | consumed tokens: 8372879360 | elapsed time per iteration (s): 0.09 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.734717E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.495 | TFLOPs: 10.56 | 7: iteration 15980/ 173500 | consumed samples: 4090880 | consumed tokens: 8378122240 | elapsed time per iteration (s): 0.09 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.727843E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2858.821 | TFLOPs: 10.63 | 7: iteration 15990/ 173500 | consumed samples: 4093440 | consumed tokens: 8383365120 | elapsed time per iteration (s): 0.09 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.730696E+00 | grad norm: 0.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.209 | TFLOPs: 11.20 | 0: [2023-03-17 00:40:36,378] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=0, lr=[0.00019695408064628468, 0.00019695408064628468, 0.00019695408064628468], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 16000/ 173500 | consumed samples: 4096000 | consumed tokens: 8388608000 | elapsed time per iteration (s): 0.08 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 4.745570E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.367 | TFLOPs: 11.90 | 0: steps: 16000 loss: 4.7642 iter time (s): 0.088 samples/sec: 2924.488 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 16000 | lm loss value: 4.610323E+00 | lm loss PPL: 1.005166E+02 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 16000 to 
checkpoints_14m91b100m 0: [2023-03-17 00:40:36,436] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step16000 is begin to save! 0: [2023-03-17 00:40:36,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:40:36,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:40:36,465] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:40:36,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:40:36,468] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:40:36,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:40:36,472] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:40:36,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:40:36,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:40:36,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:40:36,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 00:40:36,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:40:36,479] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step16000/mp_rank_00_model_states.pt 0: [2023-03-17 00:40:36,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:40:36,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:40:36,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:40:36,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:40:36,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:40:36,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:40:36,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2023-03-17 00:40:36,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:40:36,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2023-03-17 00:40:36,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
6: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
4: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2023-03-17 00:40:36,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:40:36,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:40:36,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:40:36,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:40:36,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2023-03-17 00:40:36,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:40:36,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2023-03-17 00:40:36,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:40:36,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:40:36,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2023-03-17 00:40:36,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:40:36,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:40:36,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2023-03-17 00:40:36,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:40:36,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:40:36,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:40:36,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 6: [2023-03-17 00:40:36,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:40:36,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2023-03-17 00:40:36,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:40:36,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:40:36,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2023-03-17 00:40:36,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:40:36,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:40:36,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2023-03-17 00:40:36,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:40:36,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2023-03-17 00:40:36,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:40:36,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:40:36,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2023-03-17 00:40:36,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:40:36,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:40:36,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:40:36,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2023-03-17 00:40:36,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:40:36,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 00:40:36,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:40:36,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2023-03-17 00:40:36,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:40:36,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2023-03-17 00:40:36,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:40:36,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:40:36,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:40:36,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 00:40:36,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2023-03-17 00:40:36,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
5: [2023-03-17 00:40:36,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:40:36,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:40:36,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 00:40:36,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:40:36,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:40:36,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2023-03-17 00:40:36,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
4: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 6: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
2: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2023-03-17 00:40:36,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step16000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2023-03-17 00:40:36,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
0: successfully saved checkpoint at iteration 16000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.50 7: iteration 16010/ 173500 | consumed samples: 4098560 | consumed tokens: 8393850880 | elapsed time per iteration (s): 0.10 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.736258E+00 | grad norm: 0.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2680.268 | TFLOPs: 9.97 | 7: iteration 16020/ 173500 | consumed samples: 4101120 | consumed tokens: 8399093760 | elapsed time per iteration (s): 0.09 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.742518E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2942.186 | TFLOPs: 10.94 | 7: iteration 16030/ 173500 | consumed samples: 4103680 | consumed tokens: 8404336640 | elapsed time per iteration (s): 0.09 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.735631E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.400 | TFLOPs: 11.00 | 7: iteration 16040/ 173500 | consumed samples: 4106240 | consumed tokens: 8409579520 | elapsed time per iteration (s): 0.12 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.748302E+00 | grad norm: 0.693 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2195.836 | TFLOPs: 8.17 | 7: iteration 16050/ 173500 | consumed samples: 4108800 | consumed tokens: 8414822400 | elapsed time per iteration (s): 0.09 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.744677E+00 | grad norm: 0.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2906.339 | TFLOPs: 10.81 | 7: iteration 16060/ 173500 | consumed samples: 4111360 | consumed tokens: 8420065280 | elapsed time per iteration (s): 0.08 | learning rate: 
1.969E-04 | global batch size: 256 | lm loss: 4.732918E+00 | grad norm: 0.731 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.531 | TFLOPs: 11.54 | 7: iteration 16070/ 173500 | consumed samples: 4113920 | consumed tokens: 8425308160 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.733820E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.116 | TFLOPs: 11.52 | 7: iteration 16080/ 173500 | consumed samples: 4116480 | consumed tokens: 8430551040 | elapsed time per iteration (s): 0.09 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.738203E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.219 | TFLOPs: 11.16 | 7: iteration 16090/ 173500 | consumed samples: 4119040 | consumed tokens: 8435793920 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.731409E+00 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.952 | TFLOPs: 11.87 | 7: iteration 16100/ 173500 | consumed samples: 4121600 | consumed tokens: 8441036800 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.747223E+00 | grad norm: 0.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.371 | TFLOPs: 11.81 | 7: iteration 16110/ 173500 | consumed samples: 4124160 | consumed tokens: 8446279680 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.736386E+00 | grad norm: 0.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.374 | TFLOPs: 11.74 | 7: iteration 16120/ 173500 | consumed samples: 
4126720 | consumed tokens: 8451522560 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.735186E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.692 | TFLOPs: 11.86 | 7: iteration 16130/ 173500 | consumed samples: 4129280 | consumed tokens: 8456765440 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.728549E+00 | grad norm: 0.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.475 | TFLOPs: 11.51 | 7: iteration 16140/ 173500 | consumed samples: 4131840 | consumed tokens: 8462008320 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.745367E+00 | grad norm: 0.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.639 | TFLOPs: 11.32 | 7: iteration 16150/ 173500 | consumed samples: 4134400 | consumed tokens: 8467251200 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.738432E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.341 | TFLOPs: 11.57 | 7: iteration 16160/ 173500 | consumed samples: 4136960 | consumed tokens: 8472494080 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.729152E+00 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.013 | TFLOPs: 11.85 | 7: iteration 16170/ 173500 | consumed samples: 4139520 | consumed tokens: 8477736960 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.736089E+00 | grad norm: 0.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3171.829 | TFLOPs: 11.80 | 7: iteration 16180/ 173500 | consumed samples: 4142080 | consumed tokens: 8482979840 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.735646E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.811 | TFLOPs: 11.56 | 7: iteration 16190/ 173500 | consumed samples: 4144640 | consumed tokens: 8488222720 | elapsed time per iteration (s): 0.09 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.731590E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.682 | TFLOPs: 11.13 | 7: iteration 16200/ 173500 | consumed samples: 4147200 | consumed tokens: 8493465600 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.728544E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.933 | TFLOPs: 11.28 | 7: iteration 16210/ 173500 | consumed samples: 4149760 | consumed tokens: 8498708480 | elapsed time per iteration (s): 0.09 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.735426E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.731 | TFLOPs: 11.06 | 7: iteration 16220/ 173500 | consumed samples: 4152320 | consumed tokens: 8503951360 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.731486E+00 | grad norm: 0.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.203 | TFLOPs: 11.59 | 7: iteration 16230/ 173500 | consumed samples: 4154880 | consumed tokens: 8509194240 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.737348E+00 | grad 
norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.508 | TFLOPs: 11.83 | 7: iteration 16240/ 173500 | consumed samples: 4157440 | consumed tokens: 8514437120 | elapsed time per iteration (s): 0.08 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 4.745990E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.537 | TFLOPs: 11.28 | 7: iteration 16250/ 173500 | consumed samples: 4160000 | consumed tokens: 8519680000 | elapsed time per iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.737326E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.837 | TFLOPs: 11.31 | 7: iteration 16260/ 173500 | consumed samples: 4162560 | consumed tokens: 8524922880 | elapsed time per iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.734287E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.470 | TFLOPs: 11.54 | 7: iteration 16270/ 173500 | consumed samples: 4165120 | consumed tokens: 8530165760 | elapsed time per iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.742578E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3051.805 | TFLOPs: 11.35 | 7: iteration 16280/ 173500 | consumed samples: 4167680 | consumed tokens: 8535408640 | elapsed time per iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.729546E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3025.307 | TFLOPs: 11.25 | 7: iteration 16290/ 173500 | consumed samples: 4170240 | consumed tokens: 8540651520 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.725489E+00 | grad norm: 0.668 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3027.722 | TFLOPs: 11.26 | 7: iteration 16300/ 173500 | consumed samples: 4172800 | consumed tokens: 8545894400 | elapsed time per iteration (s): 0.09 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.748658E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2928.256 | TFLOPs: 10.89 | 7: iteration 16310/ 173500 | consumed samples: 4175360 | consumed tokens: 8551137280 | elapsed time per iteration (s): 0.10 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.738449E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2474.537 | TFLOPs: 9.20 | 7: iteration 16320/ 173500 | consumed samples: 4177920 | consumed tokens: 8556380160 | elapsed time per iteration (s): 0.10 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.738011E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2492.811 | TFLOPs: 9.27 | 7: iteration 16330/ 173500 | consumed samples: 4180480 | consumed tokens: 8561623040 | elapsed time per iteration (s): 0.11 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.741013E+00 | grad norm: 0.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2371.004 | TFLOPs: 8.82 | 7: iteration 16340/ 173500 | consumed samples: 4183040 | consumed tokens: 8566865920 | elapsed time per iteration (s): 0.11 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.742196E+00 | grad norm: 0.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2374.334 | TFLOPs: 8.83 | 7: iteration 
16350/ 173500 | consumed samples: 4185600 | consumed tokens: 8572108800 | elapsed time per iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.736943E+00 | grad norm: 0.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.712 | TFLOPs: 11.82 | 7: iteration 16360/ 173500 | consumed samples: 4188160 | consumed tokens: 8577351680 | elapsed time per iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.732944E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.363 | TFLOPs: 11.86 | 7: iteration 16370/ 173500 | consumed samples: 4190720 | consumed tokens: 8582594560 | elapsed time per iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.733414E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.303 | TFLOPs: 11.56 | 7: iteration 16380/ 173500 | consumed samples: 4193280 | consumed tokens: 8587837440 | elapsed time per iteration (s): 0.08 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.741898E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.944 | TFLOPs: 11.82 | 7: iteration 16390/ 173500 | consumed samples: 4195840 | consumed tokens: 8593080320 | elapsed time per iteration (s): 0.12 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.742938E+00 | grad norm: 0.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2131.105 | TFLOPs: 7.93 | 7: iteration 16400/ 173500 | consumed samples: 4198400 | consumed tokens: 8598323200 | elapsed time per iteration (s): 0.10 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.747704E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 2511.846 | TFLOPs: 9.34 | 7: iteration 16410/ 173500 | consumed samples: 4200960 | consumed tokens: 8603566080 | elapsed time per iteration (s): 0.10 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.731803E+00 | grad norm: 0.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2534.519 | TFLOPs: 9.43 | 7: iteration 16420/ 173500 | consumed samples: 4203520 | consumed tokens: 8608808960 | elapsed time per iteration (s): 0.10 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.717410E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2595.965 | TFLOPs: 9.66 | 7: iteration 16430/ 173500 | consumed samples: 4206080 | consumed tokens: 8614051840 | elapsed time per iteration (s): 0.10 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.736951E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2486.491 | TFLOPs: 9.25 | 7: iteration 16440/ 173500 | consumed samples: 4208640 | consumed tokens: 8619294720 | elapsed time per iteration (s): 0.10 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.727318E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.143 | TFLOPs: 9.72 | 7: iteration 16450/ 173500 | consumed samples: 4211200 | consumed tokens: 8624537600 | elapsed time per iteration (s): 0.12 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.727740E+00 | grad norm: 0.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2120.414 | TFLOPs: 7.89 | 7: iteration 16460/ 173500 | consumed samples: 4213760 | consumed tokens: 8629780480 | elapsed time per iteration (s): 0.12 | learning rate: 1.968E-04 | global batch size: 256 | 
lm loss: 4.739841E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2096.559 | TFLOPs: 7.80 | 7: iteration 16470/ 173500 | consumed samples: 4216320 | consumed tokens: 8635023360 | elapsed time per iteration (s): 0.10 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 4.734912E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2516.127 | TFLOPs: 9.36 | 7: iteration 16480/ 173500 | consumed samples: 4218880 | consumed tokens: 8640266240 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.737391E+00 | grad norm: 0.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.513 | TFLOPs: 11.93 | 7: iteration 16490/ 173500 | consumed samples: 4221440 | consumed tokens: 8645509120 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.729711E+00 | grad norm: 0.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.454 | TFLOPs: 11.99 | 7: iteration 16500/ 173500 | consumed samples: 4224000 | consumed tokens: 8650752000 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.742448E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.545 | TFLOPs: 11.70 | 7: iteration 16510/ 173500 | consumed samples: 4226560 | consumed tokens: 8655994880 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.729927E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.693 | TFLOPs: 11.97 | 7: iteration 16520/ 173500 | consumed samples: 4229120 | consumed tokens: 8661237760 
| elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.730485E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.294 | TFLOPs: 12.05 | 7: iteration 16530/ 173500 | consumed samples: 4231680 | consumed tokens: 8666480640 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.733653E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.654 | TFLOPs: 11.70 | 7: iteration 16540/ 173500 | consumed samples: 4234240 | consumed tokens: 8671723520 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.727755E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.607 | TFLOPs: 12.04 | 7: iteration 16550/ 173500 | consumed samples: 4236800 | consumed tokens: 8676966400 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.730290E+00 | grad norm: 0.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.861 | TFLOPs: 11.43 | 7: iteration 16560/ 173500 | consumed samples: 4239360 | consumed tokens: 8682209280 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.735514E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.923 | TFLOPs: 11.74 | 7: iteration 16570/ 173500 | consumed samples: 4241920 | consumed tokens: 8687452160 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.742369E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.206 | 
TFLOPs: 11.99 | 7: iteration 16580/ 173500 | consumed samples: 4244480 | consumed tokens: 8692695040 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.743882E+00 | grad norm: 0.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.845 | TFLOPs: 11.93 | 7: iteration 16590/ 173500 | consumed samples: 4247040 | consumed tokens: 8697937920 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.735873E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.066 | TFLOPs: 11.97 | 7: iteration 16600/ 173500 | consumed samples: 4249600 | consumed tokens: 8703180800 | elapsed time per iteration (s): 0.09 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.729608E+00 | grad norm: 0.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.186 | TFLOPs: 11.20 | 7: iteration 16610/ 173500 | consumed samples: 4252160 | consumed tokens: 8708423680 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.742241E+00 | grad norm: 0.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.887 | TFLOPs: 11.83 | 7: iteration 16620/ 173500 | consumed samples: 4254720 | consumed tokens: 8713666560 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.726167E+00 | grad norm: 0.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.543 | TFLOPs: 11.98 | 7: iteration 16630/ 173500 | consumed samples: 4257280 | consumed tokens: 8718909440 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.739051E+00 | grad norm: 0.625 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.507 | TFLOPs: 11.96 | 7: iteration 16640/ 173500 | consumed samples: 4259840 | consumed tokens: 8724152320 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.736583E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.622 | TFLOPs: 12.01 | 7: iteration 16650/ 173500 | consumed samples: 4262400 | consumed tokens: 8729395200 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.737466E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.436 | TFLOPs: 11.92 | 7: iteration 16660/ 173500 | consumed samples: 4264960 | consumed tokens: 8734638080 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.740626E+00 | grad norm: 0.692 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.360 | TFLOPs: 11.92 | 7: iteration 16670/ 173500 | consumed samples: 4267520 | consumed tokens: 8739880960 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.736779E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.761 | TFLOPs: 11.61 | 7: iteration 16680/ 173500 | consumed samples: 4270080 | consumed tokens: 8745123840 | elapsed time per iteration (s): 0.08 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 4.734203E+00 | grad norm: 0.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.631 | TFLOPs: 11.63 | 7: iteration 16690/ 173500 | consumed samples: 4272640 | consumed tokens: 8750366720 | elapsed time per iteration (s): 0.08 | learning rate: 
1.967E-04 | global batch size: 256 | lm loss: 4.724381E+00 | grad norm: 0.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.206 | TFLOPs: 11.71 | 7: iteration 16700/ 173500 | consumed samples: 4275200 | consumed tokens: 8755609600 | elapsed time per iteration (s): 0.10 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.741106E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.553 | TFLOPs: 9.30 | 7: iteration 16710/ 173500 | consumed samples: 4277760 | consumed tokens: 8760852480 | elapsed time per iteration (s): 0.09 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.723040E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2853.960 | TFLOPs: 10.62 | 7: iteration 16720/ 173500 | consumed samples: 4280320 | consumed tokens: 8766095360 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.742204E+00 | grad norm: 0.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.691 | TFLOPs: 11.40 | 7: iteration 16730/ 173500 | consumed samples: 4282880 | consumed tokens: 8771338240 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.735534E+00 | grad norm: 0.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.476 | TFLOPs: 11.38 | 7: iteration 16740/ 173500 | consumed samples: 4285440 | consumed tokens: 8776581120 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.739617E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.089 | TFLOPs: 11.66 | 7: iteration 16750/ 173500 | consumed samples: 
4288000 | consumed tokens: 8781824000 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.728731E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.189 | TFLOPs: 11.55 | 7: iteration 16760/ 173500 | consumed samples: 4290560 | consumed tokens: 8787066880 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.730712E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.121 | TFLOPs: 11.65 | 7: iteration 16770/ 173500 | consumed samples: 4293120 | consumed tokens: 8792309760 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.737104E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.267 | TFLOPs: 11.78 | 7: iteration 16780/ 173500 | consumed samples: 4295680 | consumed tokens: 8797552640 | elapsed time per iteration (s): 0.09 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.736774E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2971.616 | TFLOPs: 11.05 | 7: iteration 16790/ 173500 | consumed samples: 4298240 | consumed tokens: 8802795520 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.738336E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.041 | TFLOPs: 11.29 | 7: iteration 16800/ 173500 | consumed samples: 4300800 | consumed tokens: 8808038400 | elapsed time per iteration (s): 0.13 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.744478E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 1987.486 | TFLOPs: 7.39 | 7: iteration 16810/ 173500 | consumed samples: 4303360 | consumed tokens: 8813281280 | elapsed time per iteration (s): 0.14 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.732309E+00 | grad norm: 0.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1813.725 | TFLOPs: 6.75 | 7: iteration 16820/ 173500 | consumed samples: 4305920 | consumed tokens: 8818524160 | elapsed time per iteration (s): 0.12 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.722001E+00 | grad norm: 0.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2221.080 | TFLOPs: 8.26 | 7: iteration 16830/ 173500 | consumed samples: 4308480 | consumed tokens: 8823767040 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.729017E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.730 | TFLOPs: 11.83 | 7: iteration 16840/ 173500 | consumed samples: 4311040 | consumed tokens: 8829009920 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.738292E+00 | grad norm: 0.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.987 | TFLOPs: 11.49 | 7: iteration 16850/ 173500 | consumed samples: 4313600 | consumed tokens: 8834252800 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.723677E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.472 | TFLOPs: 11.21 | 7: iteration 16860/ 173500 | consumed samples: 4316160 | consumed tokens: 8839495680 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.727932E+00 | grad 
norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.202 | TFLOPs: 11.85 | 7: iteration 16870/ 173500 | consumed samples: 4318720 | consumed tokens: 8844738560 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.741777E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.794 | TFLOPs: 11.48 | 7: iteration 16880/ 173500 | consumed samples: 4321280 | consumed tokens: 8849981440 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.728148E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.080 | TFLOPs: 11.60 | 7: iteration 16890/ 173500 | consumed samples: 4323840 | consumed tokens: 8855224320 | elapsed time per iteration (s): 0.09 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.729996E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2891.870 | TFLOPs: 10.76 | 7: iteration 16900/ 173500 | consumed samples: 4326400 | consumed tokens: 8860467200 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.731897E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.219 | TFLOPs: 11.82 | 7: iteration 16910/ 173500 | consumed samples: 4328960 | consumed tokens: 8865710080 | elapsed time per iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.732245E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.117 | TFLOPs: 11.55 | 7: iteration 16920/ 173500 | consumed samples: 4331520 | consumed tokens: 8870952960 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 4.728633E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.870 | TFLOPs: 11.42 | 7: iteration 16930/ 173500 | consumed samples: 4334080 | consumed tokens: 8876195840 | elapsed time per iteration (s): 0.13 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.732475E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.188 | TFLOPs: 7.26 | 7: iteration 16940/ 173500 | consumed samples: 4336640 | consumed tokens: 8881438720 | elapsed time per iteration (s): 0.13 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.727425E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1999.814 | TFLOPs: 7.44 | 7: iteration 16950/ 173500 | consumed samples: 4339200 | consumed tokens: 8886681600 | elapsed time per iteration (s): 0.13 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.740202E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2012.636 | TFLOPs: 7.49 | 7: iteration 16960/ 173500 | consumed samples: 4341760 | consumed tokens: 8891924480 | elapsed time per iteration (s): 0.13 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.736266E+00 | grad norm: 0.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.904 | TFLOPs: 7.58 | 7: iteration 16970/ 173500 | consumed samples: 4344320 | consumed tokens: 8897167360 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.726968E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.038 | TFLOPs: 11.93 | 7: iteration 
16980/ 173500 | consumed samples: 4346880 | consumed tokens: 8902410240 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.732336E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.728 | TFLOPs: 11.86 |
7: iteration 16990/ 173500 | consumed samples: 4349440 | consumed tokens: 8907653120 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.738626E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.996 | TFLOPs: 11.93 |
7: iteration 17000/ 173500 | consumed samples: 4352000 | consumed tokens: 8912896000 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.726704E+00 | grad norm: 0.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.106 | TFLOPs: 11.92 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 17000 | lm loss value: 4.552587E+00 | lm loss PPL: 9.487750E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 17000 to checkpoints_14m91b100m
0: [2023-03-17 00:42:05,780] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step17000 is begin to save!
0: [2023-03-17 00:42:05,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:42:05,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:42:05,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:42:05,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:42:05,813] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:42:05,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:42:05,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:42:05,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:42:05,819] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:42:05,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:42:05,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:42:05,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:42:05,823] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step17000/mp_rank_00_model_states.pt 0: [2023-03-17 00:42:05,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/mp_rank_00_model_states.pt... 
0: [2023-03-17 00:42:05,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:42:05,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:42:05,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:42:05,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:42:05,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2023-03-17 00:42:05,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:42:05,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:42:05,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2023-03-17 00:42:05,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:42:05,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2023-03-17 00:42:05,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:42:05,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
0: [2023-03-17 00:42:05,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2023-03-17 00:42:05,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:42:05,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:42:05,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:42:05,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:42:05,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2023-03-17 00:42:05,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2023-03-17 00:42:05,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:42:05,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:42:05,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:42:05,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 00:42:05,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2023-03-17 00:42:05,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:42:05,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2023-03-17 00:42:05,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
7: [2023-03-17 00:42:05,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:42:05,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:42:05,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:42:05,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:42:05,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
4: [2023-03-17 00:42:05,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:42:05,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:42:05,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:42:05,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:42:05,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:42:05,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:42:05,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:42:05,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:42:05,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:42:05,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
7: [2023-03-17 00:42:05,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:42:05,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:42:05,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:42:05,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:42:05,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:42:05,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:42:05,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:42:05,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:42:05,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2023-03-17 00:42:05,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:42:05,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:42:05,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2023-03-17 00:42:05,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:42:05,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
5: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
1: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step17000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2023-03-17 00:42:05,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: successfully saved checkpoint at iteration 17000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.53 7: iteration 17010/ 173500 | consumed samples: 4354560 | consumed tokens: 8918138880 | elapsed time per iteration (s): 0.14 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.727783E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1891.397 | TFLOPs: 7.04 | 7: iteration 17020/ 173500 | consumed samples: 4357120 | consumed tokens: 8923381760 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.725827E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.045 | TFLOPs: 11.89 | 7: iteration 17030/ 173500 | consumed samples: 4359680 | consumed tokens: 8928624640 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.728158E+00 | grad norm: 0.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.910 | TFLOPs: 11.50 | 7: iteration 17040/ 173500 | consumed samples: 4362240 | consumed tokens: 
8933867520 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.732535E+00 | grad norm: 0.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.130 | TFLOPs: 11.88 | 7: iteration 17050/ 173500 | consumed samples: 4364800 | consumed tokens: 8939110400 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.730658E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.065 | TFLOPs: 11.64 | 7: iteration 17060/ 173500 | consumed samples: 4367360 | consumed tokens: 8944353280 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.736243E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.715 | TFLOPs: 11.88 | 7: iteration 17070/ 173500 | consumed samples: 4369920 | consumed tokens: 8949596160 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.728413E+00 | grad norm: 0.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.773 | TFLOPs: 11.86 | 7: iteration 17080/ 173500 | consumed samples: 4372480 | consumed tokens: 8954839040 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.732853E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.981 | TFLOPs: 11.65 | 7: iteration 17090/ 173500 | consumed samples: 4375040 | consumed tokens: 8960081920 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.731618E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3196.106 | TFLOPs: 11.89 | 7: iteration 17100/ 173500 | consumed samples: 4377600 | consumed tokens: 8965324800 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.731337E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.912 | TFLOPs: 11.88 | 7: iteration 17110/ 173500 | consumed samples: 4380160 | consumed tokens: 8970567680 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.721761E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.284 | TFLOPs: 11.89 | 7: iteration 17120/ 173500 | consumed samples: 4382720 | consumed tokens: 8975810560 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.727079E+00 | grad norm: 0.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.852 | TFLOPs: 11.44 | 7: iteration 17130/ 173500 | consumed samples: 4385280 | consumed tokens: 8981053440 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.735075E+00 | grad norm: 0.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.357 | TFLOPs: 11.61 | 7: iteration 17140/ 173500 | consumed samples: 4387840 | consumed tokens: 8986296320 | elapsed time per iteration (s): 0.08 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 4.733232E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.195 | TFLOPs: 11.79 | 7: iteration 17150/ 173500 | consumed samples: 4390400 | consumed tokens: 8991539200 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.734781E+00 | grad norm: 0.603 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.406 | TFLOPs: 11.84 | 7: iteration 17160/ 173500 | consumed samples: 4392960 | consumed tokens: 8996782080 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.731882E+00 | grad norm: 0.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.698 | TFLOPs: 11.88 | 7: iteration 17170/ 173500 | consumed samples: 4395520 | consumed tokens: 9002024960 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.740422E+00 | grad norm: 0.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.522 | TFLOPs: 11.82 | 7: iteration 17180/ 173500 | consumed samples: 4398080 | consumed tokens: 9007267840 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.725756E+00 | grad norm: 0.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.972 | TFLOPs: 11.53 | 7: iteration 17190/ 173500 | consumed samples: 4400640 | consumed tokens: 9012510720 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.726103E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.921 | TFLOPs: 11.89 | 7: iteration 17200/ 173500 | consumed samples: 4403200 | consumed tokens: 9017753600 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.733143E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.752 | TFLOPs: 11.88 | 7: iteration 17210/ 173500 | consumed samples: 4405760 | consumed tokens: 9022996480 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.734178E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.313 | TFLOPs: 11.84 | 7: iteration 17220/ 173500 | consumed samples: 4408320 | consumed tokens: 9028239360 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.735031E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.942 | TFLOPs: 11.85 | 7: iteration 17230/ 173500 | consumed samples: 4410880 | consumed tokens: 9033482240 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.737167E+00 | grad norm: 0.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.837 | TFLOPs: 11.88 | 7: iteration 17240/ 173500 | consumed samples: 4413440 | consumed tokens: 9038725120 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.723432E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.107 | TFLOPs: 11.85 | 7: iteration 17250/ 173500 | consumed samples: 4416000 | consumed tokens: 9043968000 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.740153E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.354 | TFLOPs: 11.83 | 7: iteration 17260/ 173500 | consumed samples: 4418560 | consumed tokens: 9049210880 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.729994E+00 | grad norm: 0.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.191 | TFLOPs: 11.84 | 7: iteration 17270/ 173500 | 
consumed samples: 4421120 | consumed tokens: 9054453760 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.720728E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.766 | TFLOPs: 11.57 | 7: iteration 17280/ 173500 | consumed samples: 4423680 | consumed tokens: 9059696640 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.729337E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.422 | TFLOPs: 11.86 | 7: iteration 17290/ 173500 | consumed samples: 4426240 | consumed tokens: 9064939520 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.733609E+00 | grad norm: 0.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.138 | TFLOPs: 11.48 | 7: iteration 17300/ 173500 | consumed samples: 4428800 | consumed tokens: 9070182400 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.717905E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.414 | TFLOPs: 11.22 | 7: iteration 17310/ 173500 | consumed samples: 4431360 | consumed tokens: 9075425280 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.714293E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.031 | TFLOPs: 11.97 | 7: iteration 17320/ 173500 | consumed samples: 4433920 | consumed tokens: 9080668160 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.736209E+00 | grad norm: 0.705 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3150.512 | TFLOPs: 11.72 | 7: iteration 17330/ 173500 | consumed samples: 4436480 | consumed tokens: 9085911040 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.729223E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.592 | TFLOPs: 11.82 | 7: iteration 17340/ 173500 | consumed samples: 4439040 | consumed tokens: 9091153920 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.726336E+00 | grad norm: 0.646 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.689 | TFLOPs: 11.84 | 7: iteration 17350/ 173500 | consumed samples: 4441600 | consumed tokens: 9096396800 | elapsed time per iteration (s): 0.08 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 4.732267E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.204 | TFLOPs: 12.00 | 7: iteration 17360/ 173500 | consumed samples: 4444160 | consumed tokens: 9101639680 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.727232E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.441 | TFLOPs: 11.98 | 7: iteration 17370/ 173500 | consumed samples: 4446720 | consumed tokens: 9106882560 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.740015E+00 | grad norm: 0.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.008 | TFLOPs: 12.02 | 7: iteration 17380/ 173500 | consumed samples: 4449280 | consumed tokens: 9112125440 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 
4.721306E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.932 | TFLOPs: 11.95 | 7: iteration 17390/ 173500 | consumed samples: 4451840 | consumed tokens: 9117368320 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.736417E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.773 | TFLOPs: 12.02 | 7: iteration 17400/ 173500 | consumed samples: 4454400 | consumed tokens: 9122611200 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.723691E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.022 | TFLOPs: 11.73 | 7: iteration 17410/ 173500 | consumed samples: 4456960 | consumed tokens: 9127854080 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.720718E+00 | grad norm: 0.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.203 | TFLOPs: 11.69 | 7: iteration 17420/ 173500 | consumed samples: 4459520 | consumed tokens: 9133096960 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.719917E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.378 | TFLOPs: 11.98 | 7: iteration 17430/ 173500 | consumed samples: 4462080 | consumed tokens: 9138339840 | elapsed time per iteration (s): 0.09 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.717799E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2880.887 | TFLOPs: 10.72 | 7: iteration 17440/ 173500 | consumed samples: 4464640 | consumed tokens: 9143582720 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.727323E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.667 | TFLOPs: 11.71 | 7: iteration 17450/ 173500 | consumed samples: 4467200 | consumed tokens: 9148825600 | elapsed time per iteration (s): 0.09 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.724853E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.396 | TFLOPs: 10.56 | 7: iteration 17460/ 173500 | consumed samples: 4469760 | consumed tokens: 9154068480 | elapsed time per iteration (s): 0.11 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.723661E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.897 | TFLOPs: 9.05 | 7: iteration 17470/ 173500 | consumed samples: 4472320 | consumed tokens: 9159311360 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.740976E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.178 | TFLOPs: 11.66 | 7: iteration 17480/ 173500 | consumed samples: 4474880 | consumed tokens: 9164554240 | elapsed time per iteration (s): 0.09 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.720211E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3006.116 | TFLOPs: 11.18 | 7: iteration 17490/ 173500 | consumed samples: 4477440 | consumed tokens: 9169797120 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.743933E+00 | grad norm: 0.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.298 | TFLOPs: 
11.52 | 7: iteration 17500/ 173500 | consumed samples: 4480000 | consumed tokens: 9175040000 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.729993E+00 | grad norm: 0.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3082.145 | TFLOPs: 11.46 | 7: iteration 17510/ 173500 | consumed samples: 4482560 | consumed tokens: 9180282880 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.733997E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.068 | TFLOPs: 11.48 | 7: iteration 17520/ 173500 | consumed samples: 4485120 | consumed tokens: 9185525760 | elapsed time per iteration (s): 0.09 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.725372E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.423 | TFLOPs: 10.78 | 7: iteration 17530/ 173500 | consumed samples: 4487680 | consumed tokens: 9190768640 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.738977E+00 | grad norm: 0.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3022.087 | TFLOPs: 11.24 | 7: iteration 17540/ 173500 | consumed samples: 4490240 | consumed tokens: 9196011520 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.727628E+00 | grad norm: 0.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.840 | TFLOPs: 11.75 | 7: iteration 17550/ 173500 | consumed samples: 4492800 | consumed tokens: 9201254400 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.740295E+00 | grad norm: 0.643 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.766 | TFLOPs: 11.70 | 7: iteration 17560/ 173500 | consumed samples: 4495360 | consumed tokens: 9206497280 | elapsed time per iteration (s): 0.09 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.725918E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2953.011 | TFLOPs: 10.98 | 7: iteration 17570/ 173500 | consumed samples: 4497920 | consumed tokens: 9211740160 | elapsed time per iteration (s): 0.08 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 4.723011E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.522 | TFLOPs: 11.73 | 7: iteration 17580/ 173500 | consumed samples: 4500480 | consumed tokens: 9216983040 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.714775E+00 | grad norm: 0.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.941 | TFLOPs: 11.72 | 7: iteration 17590/ 173500 | consumed samples: 4503040 | consumed tokens: 9222225920 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.723930E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.717 | TFLOPs: 11.37 | 7: iteration 17600/ 173500 | consumed samples: 4505600 | consumed tokens: 9227468800 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.723334E+00 | grad norm: 0.637 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.709 | TFLOPs: 11.85 | 7: iteration 17610/ 173500 | consumed samples: 4508160 | consumed tokens: 9232711680 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 
| global batch size: 256 | lm loss: 4.733562E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.806 | TFLOPs: 11.61 | 7: iteration 17620/ 173500 | consumed samples: 4510720 | consumed tokens: 9237954560 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.722696E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.890 | TFLOPs: 11.76 | 7: iteration 17630/ 173500 | consumed samples: 4513280 | consumed tokens: 9243197440 | elapsed time per iteration (s): 0.09 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.726051E+00 | grad norm: 0.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2993.441 | TFLOPs: 11.13 | 7: iteration 17640/ 173500 | consumed samples: 4515840 | consumed tokens: 9248440320 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.732781E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.649 | TFLOPs: 11.34 | 7: iteration 17650/ 173500 | consumed samples: 4518400 | consumed tokens: 9253683200 | elapsed time per iteration (s): 0.09 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.733274E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2966.926 | TFLOPs: 11.04 | 7: iteration 17660/ 173500 | consumed samples: 4520960 | consumed tokens: 9258926080 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.729102E+00 | grad norm: 0.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.805 | TFLOPs: 11.74 | 7: iteration 17670/ 173500 | consumed samples: 4523520 | 
consumed tokens: 9264168960 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.724022E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.607 | TFLOPs: 11.88 | 7: iteration 17680/ 173500 | consumed samples: 4526080 | consumed tokens: 9269411840 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.732195E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.201 | TFLOPs: 11.84 | 7: iteration 17690/ 173500 | consumed samples: 4528640 | consumed tokens: 9274654720 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.726722E+00 | grad norm: 0.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.173 | TFLOPs: 11.83 | 7: iteration 17700/ 173500 | consumed samples: 4531200 | consumed tokens: 9279897600 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.726032E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.836 | TFLOPs: 11.81 | 7: iteration 17710/ 173500 | consumed samples: 4533760 | consumed tokens: 9285140480 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.709351E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.642 | TFLOPs: 11.60 | 7: iteration 17720/ 173500 | consumed samples: 4536320 | consumed tokens: 9290383360 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.729857E+00 | grad norm: 0.696 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3122.406 | TFLOPs: 11.61 | 7: iteration 17730/ 173500 | consumed samples: 4538880 | consumed tokens: 9295626240 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.724472E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.698 | TFLOPs: 11.57 | 7: iteration 17740/ 173500 | consumed samples: 4541440 | consumed tokens: 9300869120 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.724551E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.297 | TFLOPs: 11.77 | 7: iteration 17750/ 173500 | consumed samples: 4544000 | consumed tokens: 9306112000 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.731329E+00 | grad norm: 0.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.804 | TFLOPs: 11.32 | 7: iteration 17760/ 173500 | consumed samples: 4546560 | consumed tokens: 9311354880 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.713824E+00 | grad norm: 0.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.592 | TFLOPs: 11.85 | 7: iteration 17770/ 173500 | consumed samples: 4549120 | consumed tokens: 9316597760 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.716899E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.606 | TFLOPs: 11.81 | 7: iteration 17780/ 173500 | consumed samples: 4551680 | consumed tokens: 9321840640 | elapsed time per iteration (s): 0.08 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 4.733551E+00 | grad norm: 
0.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3038.984 | TFLOPs: 11.30 | 7: iteration 17790/ 173500 | consumed samples: 4554240 | consumed tokens: 9327083520 | elapsed time per iteration (s): 0.09 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.731180E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2862.902 | TFLOPs: 10.65 | 7: iteration 17800/ 173500 | consumed samples: 4556800 | consumed tokens: 9332326400 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.713189E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.153 | TFLOPs: 11.62 | 7: iteration 17810/ 173500 | consumed samples: 4559360 | consumed tokens: 9337569280 | elapsed time per iteration (s): 0.09 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.721193E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2937.366 | TFLOPs: 10.93 | 7: iteration 17820/ 173500 | consumed samples: 4561920 | consumed tokens: 9342812160 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.736356E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.985 | TFLOPs: 11.92 | 7: iteration 17830/ 173500 | consumed samples: 4564480 | consumed tokens: 9348055040 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.733385E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.155 | TFLOPs: 11.90 | 7: iteration 17840/ 173500 | consumed samples: 4567040 | consumed tokens: 9353297920 | elapsed time per iteration (s): 
0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.735283E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.422 | TFLOPs: 11.35 | 7: iteration 17850/ 173500 | consumed samples: 4569600 | consumed tokens: 9358540800 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.718204E+00 | grad norm: 0.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.085 | TFLOPs: 11.60 | 7: iteration 17860/ 173500 | consumed samples: 4572160 | consumed tokens: 9363783680 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.733259E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.883 | TFLOPs: 11.86 | 7: iteration 17870/ 173500 | consumed samples: 4574720 | consumed tokens: 9369026560 | elapsed time per iteration (s): 0.09 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.716018E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2982.513 | TFLOPs: 11.09 | 7: iteration 17880/ 173500 | consumed samples: 4577280 | consumed tokens: 9374269440 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.724313E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.207 | TFLOPs: 12.03 | 7: iteration 17890/ 173500 | consumed samples: 4579840 | consumed tokens: 9379512320 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.730024E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.813 | TFLOPs: 12.00 | 7: iteration 17900/ 
173500 | consumed samples: 4582400 | consumed tokens: 9384755200 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.738469E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.580 | TFLOPs: 11.48 | 7: iteration 17910/ 173500 | consumed samples: 4584960 | consumed tokens: 9389998080 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.722931E+00 | grad norm: 0.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.756 | TFLOPs: 11.97 | 7: iteration 17920/ 173500 | consumed samples: 4587520 | consumed tokens: 9395240960 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.716817E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.312 | TFLOPs: 11.97 | 7: iteration 17930/ 173500 | consumed samples: 4590080 | consumed tokens: 9400483840 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.726849E+00 | grad norm: 0.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.281 | TFLOPs: 11.73 | 7: iteration 17940/ 173500 | consumed samples: 4592640 | consumed tokens: 9405726720 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.731965E+00 | grad norm: 0.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.198 | TFLOPs: 11.36 | 7: iteration 17950/ 173500 | consumed samples: 4595200 | consumed tokens: 9410969600 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.717874E+00 | grad norm: 0.599 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3208.538 | TFLOPs: 11.93 | 7: iteration 17960/ 173500 | consumed samples: 4597760 | consumed tokens: 9416212480 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.730783E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.160 | TFLOPs: 11.60 | 7: iteration 17970/ 173500 | consumed samples: 4600320 | consumed tokens: 9421455360 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.729795E+00 | grad norm: 0.637 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.096 | TFLOPs: 11.93 | 7: iteration 17980/ 173500 | consumed samples: 4602880 | consumed tokens: 9426698240 | elapsed time per iteration (s): 0.08 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.735647E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.995 | TFLOPs: 11.96 | 7: iteration 17990/ 173500 | consumed samples: 4605440 | consumed tokens: 9431941120 | elapsed time per iteration (s): 0.09 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 4.721794E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2779.508 | TFLOPs: 10.34 | 0: [2023-03-17 00:43:28,379] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=0, lr=[0.00019604685446348677, 0.00019604685446348677, 0.00019604685446348677], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 18000/ 173500 | consumed samples: 4608000 | consumed tokens: 9437184000 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.706371E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3197.983 | TFLOPs: 11.90 | 0: steps: 18000 loss: 4.6855 iter time (s): 0.085 samples/sec: 3017.595 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 18000 | lm loss value: 4.552152E+00 | lm loss PPL: 9.483625E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 18000 to checkpoints_14m91b100m 0: [2023-03-17 00:43:28,437] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step18000 is begin to save! 0: [2023-03-17 00:43:28,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:43:28,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:43:28,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:43:28,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:43:28,469] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:43:28,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:43:28,473] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:43:28,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 00:43:28,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:43:28,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:43:28,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:43:28,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:43:28,479] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step18000/mp_rank_00_model_states.pt 0: [2023-03-17 00:43:28,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:43:28,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:43:28,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:43:28,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:43:28,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:43:28,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:43:28,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2023-03-17 00:43:28,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:43:28,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2023-03-17 00:43:28,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:43:28,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2023-03-17 00:43:28,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:43:28,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2023-03-17 00:43:28,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:43:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 00:43:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:43:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:43:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:43:28,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:43:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:43:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:43:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:43:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:43:28,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:43:28,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 00:43:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 00:43:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2023-03-17 00:43:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2023-03-17 00:43:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:43:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:43:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 00:43:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 2: [2023-03-17 00:43:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2023-03-17 00:43:28,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:43:28,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2023-03-17 00:43:28,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:43:28,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:43:28,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:43:28,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2023-03-17 00:43:28,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:43:28,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:43:28,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:43:28,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:43:28,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 00:43:28,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:43:28,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:43:28,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:43:28,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2023-03-17 00:43:28,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2023-03-17 00:43:28,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:43:28,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:43:28,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:43:28,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 00:43:28,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2023-03-17 00:43:28,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:43:28,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:43:28,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:43:28,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 00:43:28,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:43:28,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:43:28,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2023-03-17 00:43:28,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:43:28,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:43:28,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:43:28,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
4: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:43:28,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:43:28,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
2: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 7: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 5: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:43:28,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2023-03-17 00:43:28,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2023-03-17 00:43:28,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:43:28,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:43:28,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2023-03-17 00:43:28,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:43:28,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step18000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:43:28,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: successfully saved checkpoint at iteration 18000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.54 7: iteration 18010/ 173500 | consumed samples: 4610560 | consumed tokens: 9442426880 | elapsed time per iteration (s): 0.10 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.713690E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.228 | TFLOPs: 9.70 | 7: iteration 18020/ 173500 | consumed samples: 4613120 | consumed tokens: 9447669760 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.730091E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3040.868 | TFLOPs: 11.31 | 7: iteration 18030/ 173500 | consumed samples: 4615680 | consumed tokens: 9452912640 | elapsed time per iteration (s): 0.09 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.724490E+00 | grad norm: 0.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.339 | TFLOPs: 11.07 | 7: iteration 18040/ 173500 | consumed samples: 4618240 | consumed tokens: 9458155520 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.719540E+00 | grad norm: 0.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.814 | TFLOPs: 11.25 | 7: iteration 18050/ 173500 | consumed samples: 4620800 | consumed tokens: 9463398400 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 
256 | lm loss: 4.724604E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.193 | TFLOPs: 11.35 | 7: iteration 18060/ 173500 | consumed samples: 4623360 | consumed tokens: 9468641280 | elapsed time per iteration (s): 0.09 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.714084E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.293 | TFLOPs: 10.45 | 7: iteration 18070/ 173500 | consumed samples: 4625920 | consumed tokens: 9473884160 | elapsed time per iteration (s): 0.09 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.723425E+00 | grad norm: 0.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2903.377 | TFLOPs: 10.80 | 7: iteration 18080/ 173500 | consumed samples: 4628480 | consumed tokens: 9479127040 | elapsed time per iteration (s): 0.09 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.724469E+00 | grad norm: 0.641 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.007 | TFLOPs: 10.97 | 7: iteration 18090/ 173500 | consumed samples: 4631040 | consumed tokens: 9484369920 | elapsed time per iteration (s): 0.10 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.716082E+00 | grad norm: 0.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2577.333 | TFLOPs: 9.59 | 7: iteration 18100/ 173500 | consumed samples: 4633600 | consumed tokens: 9489612800 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.727206E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.634 | TFLOPs: 11.37 | 7: iteration 18110/ 173500 | consumed samples: 4636160 | consumed tokens: 
9494855680 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.738503E+00 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.087 | TFLOPs: 11.62 | 7: iteration 18120/ 173500 | consumed samples: 4638720 | consumed tokens: 9500098560 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.737193E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.033 | TFLOPs: 11.64 | 7: iteration 18130/ 173500 | consumed samples: 4641280 | consumed tokens: 9505341440 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.721788E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.670 | TFLOPs: 11.84 | 7: iteration 18140/ 173500 | consumed samples: 4643840 | consumed tokens: 9510584320 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.715352E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.020 | TFLOPs: 11.66 | 7: iteration 18150/ 173500 | consumed samples: 4646400 | consumed tokens: 9515827200 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.709976E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.042 | TFLOPs: 11.40 | 7: iteration 18160/ 173500 | consumed samples: 4648960 | consumed tokens: 9521070080 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.715787E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3194.689 | TFLOPs: 11.88 | 7: iteration 18170/ 173500 | consumed samples: 4651520 | consumed tokens: 9526312960 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.723357E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.080 | TFLOPs: 11.62 | 7: iteration 18180/ 173500 | consumed samples: 4654080 | consumed tokens: 9531555840 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.725431E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.477 | TFLOPs: 11.87 | 7: iteration 18190/ 173500 | consumed samples: 4656640 | consumed tokens: 9536798720 | elapsed time per iteration (s): 0.08 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 4.725628E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.718 | TFLOPs: 11.31 | 7: iteration 18200/ 173500 | consumed samples: 4659200 | consumed tokens: 9542041600 | elapsed time per iteration (s): 0.08 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.726817E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.043 | TFLOPs: 11.49 | 7: iteration 18210/ 173500 | consumed samples: 4661760 | consumed tokens: 9547284480 | elapsed time per iteration (s): 0.08 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.714521E+00 | grad norm: 0.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.798 | TFLOPs: 11.53 | 7: iteration 18220/ 173500 | consumed samples: 4664320 | consumed tokens: 9552527360 | elapsed time per iteration (s): 0.08 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.713194E+00 | grad norm: 0.651 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.274 | TFLOPs: 11.39 | 7: iteration 18230/ 173500 | consumed samples: 4666880 | consumed tokens: 9557770240 | elapsed time per iteration (s): 0.09 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.719306E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2969.319 | TFLOPs: 11.04 | 7: iteration 18240/ 173500 | consumed samples: 4669440 | consumed tokens: 9563013120 | elapsed time per iteration (s): 0.11 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.726672E+00 | grad norm: 0.647 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2431.571 | TFLOPs: 9.04 | 7: iteration 18250/ 173500 | consumed samples: 4672000 | consumed tokens: 9568256000 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.728301E+00 | grad norm: 0.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2540.296 | TFLOPs: 9.45 | 7: iteration 18260/ 173500 | consumed samples: 4674560 | consumed tokens: 9573498880 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.722262E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.544 | TFLOPs: 9.26 | 7: iteration 18270/ 173500 | consumed samples: 4677120 | consumed tokens: 9578741760 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.729128E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.700 | TFLOPs: 9.37 | 7: iteration 18280/ 173500 | consumed samples: 4679680 | consumed tokens: 9583984640 | elapsed time per iteration (s): 0.10 | learning 
rate: 1.959E-04 | global batch size: 256 | lm loss: 4.725760E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.672 | TFLOPs: 9.48 | 7: iteration 18290/ 173500 | consumed samples: 4682240 | consumed tokens: 9589227520 | elapsed time per iteration (s): 0.11 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.723343E+00 | grad norm: 0.637 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2397.123 | TFLOPs: 8.92 | 7: iteration 18300/ 173500 | consumed samples: 4684800 | consumed tokens: 9594470400 | elapsed time per iteration (s): 0.12 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.711588E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.365 | TFLOPs: 7.78 | 7: iteration 18310/ 173500 | consumed samples: 4687360 | consumed tokens: 9599713280 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.720316E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.972 | TFLOPs: 9.37 | 7: iteration 18320/ 173500 | consumed samples: 4689920 | consumed tokens: 9604956160 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.710884E+00 | grad norm: 0.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.436 | TFLOPs: 9.26 | 7: iteration 18330/ 173500 | consumed samples: 4692480 | consumed tokens: 9610199040 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.730424E+00 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2560.200 | TFLOPs: 9.52 | 7: iteration 18340/ 173500 | consumed samples: 
4695040 | consumed tokens: 9615441920 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.713236E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2531.633 | TFLOPs: 9.42 | 7: iteration 18350/ 173500 | consumed samples: 4697600 | consumed tokens: 9620684800 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.728049E+00 | grad norm: 0.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2507.758 | TFLOPs: 9.33 | 7: iteration 18360/ 173500 | consumed samples: 4700160 | consumed tokens: 9625927680 | elapsed time per iteration (s): 0.10 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.723547E+00 | grad norm: 0.638 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2624.226 | TFLOPs: 9.76 | 7: iteration 18370/ 173500 | consumed samples: 4702720 | consumed tokens: 9631170560 | elapsed time per iteration (s): 0.08 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.719809E+00 | grad norm: 0.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.856 | TFLOPs: 12.03 | 7: iteration 18380/ 173500 | consumed samples: 4705280 | consumed tokens: 9636413440 | elapsed time per iteration (s): 0.08 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.728481E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.765 | TFLOPs: 11.74 | 7: iteration 18390/ 173500 | consumed samples: 4707840 | consumed tokens: 9641656320 | elapsed time per iteration (s): 0.08 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.719521E+00 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3090.062 | TFLOPs: 11.49 | 7: iteration 18400/ 173500 | consumed samples: 4710400 | consumed tokens: 9646899200 | elapsed time per iteration (s): 0.08 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 4.722257E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.413 | TFLOPs: 11.75 | 7: iteration 18410/ 173500 | consumed samples: 4712960 | consumed tokens: 9652142080 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.711484E+00 | grad norm: 0.605 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.575 | TFLOPs: 12.05 | 7: iteration 18420/ 173500 | consumed samples: 4715520 | consumed tokens: 9657384960 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.714280E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.761 | TFLOPs: 12.04 | 7: iteration 18430/ 173500 | consumed samples: 4718080 | consumed tokens: 9662627840 | elapsed time per iteration (s): 0.09 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.725325E+00 | grad norm: 0.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2991.196 | TFLOPs: 11.13 | 7: iteration 18440/ 173500 | consumed samples: 4720640 | consumed tokens: 9667870720 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.723863E+00 | grad norm: 0.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.778 | TFLOPs: 12.01 | 7: iteration 18450/ 173500 | consumed samples: 4723200 | consumed tokens: 9673113600 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.724266E+00 | grad 
norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.374 | TFLOPs: 12.03 | 7: iteration 18460/ 173500 | consumed samples: 4725760 | consumed tokens: 9678356480 | elapsed time per iteration (s): 0.09 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.733822E+00 | grad norm: 0.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2922.859 | TFLOPs: 10.87 | 7: iteration 18470/ 173500 | consumed samples: 4728320 | consumed tokens: 9683599360 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.718839E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.241 | TFLOPs: 11.70 | 7: iteration 18480/ 173500 | consumed samples: 4730880 | consumed tokens: 9688842240 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.715334E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.687 | TFLOPs: 11.95 | 7: iteration 18490/ 173500 | consumed samples: 4733440 | consumed tokens: 9694085120 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.721130E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.532 | TFLOPs: 11.58 | 7: iteration 18500/ 173500 | consumed samples: 4736000 | consumed tokens: 9699328000 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.717376E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.055 | TFLOPs: 11.59 | 7: iteration 18510/ 173500 | consumed samples: 4738560 | consumed tokens: 9704570880 | elapsed time per 
iteration (s): 0.09 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.716403E+00 | grad norm: 0.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2886.908 | TFLOPs: 10.74 | 7: iteration 18520/ 173500 | consumed samples: 4741120 | consumed tokens: 9709813760 | elapsed time per iteration (s): 0.09 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.739849E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.341 | TFLOPs: 11.00 | 7: iteration 18530/ 173500 | consumed samples: 4743680 | consumed tokens: 9715056640 | elapsed time per iteration (s): 0.10 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.725999E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2614.557 | TFLOPs: 9.73 | 7: iteration 18540/ 173500 | consumed samples: 4746240 | consumed tokens: 9720299520 | elapsed time per iteration (s): 0.11 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.721552E+00 | grad norm: 0.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2435.592 | TFLOPs: 9.06 | 7: iteration 18550/ 173500 | consumed samples: 4748800 | consumed tokens: 9725542400 | elapsed time per iteration (s): 0.09 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.719956E+00 | grad norm: 0.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.155 | TFLOPs: 10.51 | 7: iteration 18560/ 173500 | consumed samples: 4751360 | consumed tokens: 9730785280 | elapsed time per iteration (s): 0.09 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.725312E+00 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2869.504 | TFLOPs: 10.67 | 7: 
iteration 18570/ 173500 | consumed samples: 4753920 | consumed tokens: 9736028160 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.718583E+00 | grad norm: 0.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.523 | TFLOPs: 11.61 | 7: iteration 18580/ 173500 | consumed samples: 4756480 | consumed tokens: 9741271040 | elapsed time per iteration (s): 0.09 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.709473E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2957.215 | TFLOPs: 11.00 | 7: iteration 18590/ 173500 | consumed samples: 4759040 | consumed tokens: 9746513920 | elapsed time per iteration (s): 0.09 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.717139E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.354 | TFLOPs: 10.38 | 7: iteration 18600/ 173500 | consumed samples: 4761600 | consumed tokens: 9751756800 | elapsed time per iteration (s): 0.08 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 4.720226E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.542 | TFLOPs: 11.89 | 7: iteration 18610/ 173500 | consumed samples: 4764160 | consumed tokens: 9756999680 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.727612E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.428 | TFLOPs: 11.84 | 7: iteration 18620/ 173500 | consumed samples: 4766720 | consumed tokens: 9762242560 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.717794E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3191.297 | TFLOPs: 11.87 | 7: iteration 18630/ 173500 | consumed samples: 4769280 | consumed tokens: 9767485440 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.723833E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.954 | TFLOPs: 11.88 | 7: iteration 18640/ 173500 | consumed samples: 4771840 | consumed tokens: 9772728320 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.709247E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.656 | TFLOPs: 11.86 | 7: iteration 18650/ 173500 | consumed samples: 4774400 | consumed tokens: 9777971200 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.725405E+00 | grad norm: 0.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.162 | TFLOPs: 11.85 | 7: iteration 18660/ 173500 | consumed samples: 4776960 | consumed tokens: 9783214080 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.720290E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.031 | TFLOPs: 11.85 | 7: iteration 18670/ 173500 | consumed samples: 4779520 | consumed tokens: 9788456960 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.723574E+00 | grad norm: 0.626 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.027 | TFLOPs: 11.87 | 7: iteration 18680/ 173500 | consumed samples: 4782080 | consumed tokens: 9793699840 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global 
batch size: 256 | lm loss: 4.715679E+00 | grad norm: 0.689 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.665 | TFLOPs: 11.38 | 7: iteration 18690/ 173500 | consumed samples: 4784640 | consumed tokens: 9798942720 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.718936E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.221 | TFLOPs: 11.64 | 7: iteration 18700/ 173500 | consumed samples: 4787200 | consumed tokens: 9804185600 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.710305E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.717 | TFLOPs: 11.58 | 7: iteration 18710/ 173500 | consumed samples: 4789760 | consumed tokens: 9809428480 | elapsed time per iteration (s): 0.09 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.713018E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.975 | TFLOPs: 11.16 | 7: iteration 18720/ 173500 | consumed samples: 4792320 | consumed tokens: 9814671360 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.720316E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.662 | TFLOPs: 11.83 | 7: iteration 18730/ 173500 | consumed samples: 4794880 | consumed tokens: 9819914240 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.720518E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.744 | TFLOPs: 11.49 | 7: iteration 18740/ 173500 | consumed samples: 4797440 | consumed 
tokens: 9825157120 | elapsed time per iteration (s): 0.09 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.718514E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.372 | TFLOPs: 10.84 | 7: iteration 18750/ 173500 | consumed samples: 4800000 | consumed tokens: 9830400000 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.707465E+00 | grad norm: 0.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.093 | TFLOPs: 11.81 | 7: iteration 18760/ 173500 | consumed samples: 4802560 | consumed tokens: 9835642880 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.716163E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.611 | TFLOPs: 11.55 | 7: iteration 18770/ 173500 | consumed samples: 4805120 | consumed tokens: 9840885760 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.716486E+00 | grad norm: 0.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.810 | TFLOPs: 11.31 | 7: iteration 18780/ 173500 | consumed samples: 4807680 | consumed tokens: 9846128640 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.726020E+00 | grad norm: 0.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.162 | TFLOPs: 11.74 | 7: iteration 18790/ 173500 | consumed samples: 4810240 | consumed tokens: 9851371520 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.711227E+00 | grad norm: 0.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3185.637 | TFLOPs: 11.85 | 7: iteration 18800/ 173500 | consumed samples: 4812800 | consumed tokens: 9856614400 | elapsed time per iteration (s): 0.08 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 4.709906E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.038 | TFLOPs: 11.58 | 7: iteration 18810/ 173500 | consumed samples: 4815360 | consumed tokens: 9861857280 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.712075E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.393 | TFLOPs: 11.48 | 7: iteration 18820/ 173500 | consumed samples: 4817920 | consumed tokens: 9867100160 | elapsed time per iteration (s): 0.09 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.710408E+00 | grad norm: 0.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2960.660 | TFLOPs: 11.01 | 7: iteration 18830/ 173500 | consumed samples: 4820480 | consumed tokens: 9872343040 | elapsed time per iteration (s): 0.09 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.717697E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2914.378 | TFLOPs: 10.84 | 7: iteration 18840/ 173500 | consumed samples: 4823040 | consumed tokens: 9877585920 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.718475E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.268 | TFLOPs: 11.55 | 7: iteration 18850/ 173500 | consumed samples: 4825600 | consumed tokens: 9882828800 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.704003E+00 | grad norm: 0.548 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.723 | TFLOPs: 11.80 | 7: iteration 18860/ 173500 | consumed samples: 4828160 | consumed tokens: 9888071680 | elapsed time per iteration (s): 0.09 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.716808E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2950.643 | TFLOPs: 10.98 | 7: iteration 18870/ 173500 | consumed samples: 4830720 | consumed tokens: 9893314560 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.718885E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.358 | TFLOPs: 11.52 | 7: iteration 18880/ 173500 | consumed samples: 4833280 | consumed tokens: 9898557440 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.705162E+00 | grad norm: 0.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.867 | TFLOPs: 11.56 | 7: iteration 18890/ 173500 | consumed samples: 4835840 | consumed tokens: 9903800320 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.708585E+00 | grad norm: 0.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.014 | TFLOPs: 11.80 | 7: iteration 18900/ 173500 | consumed samples: 4838400 | consumed tokens: 9909043200 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.723476E+00 | grad norm: 0.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.141 | TFLOPs: 11.54 | 7: iteration 18910/ 173500 | consumed samples: 4840960 | consumed tokens: 9914286080 | elapsed time per iteration (s): 0.08 
| learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.728286E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.965 | TFLOPs: 11.33 | 7: iteration 18920/ 173500 | consumed samples: 4843520 | consumed tokens: 9919528960 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.723131E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.840 | TFLOPs: 11.81 | 7: iteration 18930/ 173500 | consumed samples: 4846080 | consumed tokens: 9924771840 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.718209E+00 | grad norm: 0.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.196 | TFLOPs: 11.91 | 7: iteration 18940/ 173500 | consumed samples: 4848640 | consumed tokens: 9930014720 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.718947E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.722 | TFLOPs: 11.86 | 7: iteration 18950/ 173500 | consumed samples: 4851200 | consumed tokens: 9935257600 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.708046E+00 | grad norm: 0.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.717 | TFLOPs: 11.84 | 7: iteration 18960/ 173500 | consumed samples: 4853760 | consumed tokens: 9940500480 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.718107E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3061.369 | TFLOPs: 11.39 | 7: iteration 18970/ 173500 | 
consumed samples: 4856320 | consumed tokens: 9945743360 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.711709E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.446 | TFLOPs: 11.79 | 7: iteration 18980/ 173500 | consumed samples: 4858880 | consumed tokens: 9950986240 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.701679E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.639 | TFLOPs: 11.88 | 7: iteration 18990/ 173500 | consumed samples: 4861440 | consumed tokens: 9956229120 | elapsed time per iteration (s): 0.08 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 4.717978E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.140 | TFLOPs: 11.78 | 7: iteration 19000/ 173500 | consumed samples: 4864000 | consumed tokens: 9961472000 | elapsed time per iteration (s): 0.08 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.720079E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.602 | TFLOPs: 11.88 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 19000 | lm loss value: 4.528687E+00 | lm loss PPL: 9.263689E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 19000 to checkpoints_14m91b100m 0: [2023-03-17 00:44:54,656] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step19000 is begin to save! 
0: [2023-03-17 00:44:54,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:44:54,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:44:54,682] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:44:54,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:44:54,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:44:54,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:44:54,691] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:44:54,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:44:54,693] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:44:54,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:44:54,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:44:54,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 00:44:54,697] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step19000/mp_rank_00_model_states.pt 0: [2023-03-17 00:44:54,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:44:54,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:44:54,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:44:54,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:44:54,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:44:54,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:44:54,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 6: [2023-03-17 00:44:54,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:44:54,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 
0: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 3: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 5: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 5: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 1: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 6: [2023-03-17 00:44:54,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 3: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:44:54,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2023-03-17 00:44:54,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:44:54,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 1: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:44:54,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2023-03-17 00:44:54,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:44:54,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 6: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:44:54,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:44:54,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2023-03-17 00:44:54,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 5: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:44:54,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 3: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:44:54,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 6: [2023-03-17 00:44:54,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:44:54,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2023-03-17 00:44:54,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:44:54,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:44:54,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 1: [2023-03-17 00:44:54,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2023-03-17 00:44:54,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:44:54,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:44:54,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 3: [2023-03-17 00:44:54,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:44:54,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 00:44:54,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 5: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:44:54,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 6: [2023-03-17 00:44:54,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:44:54,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 1: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:44:54,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:44:54,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:44:54,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 5: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:44:54,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 3: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:44:54,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 1: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 6: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:44:54,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:44:54,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:44:54,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:44:54,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 3: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:44:54,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 5: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:44:54,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2023-03-17 00:44:54,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:44:54,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 6: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 
0: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 6: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 3: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 3: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 
2: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 1: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 5: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 
1: [2023-03-17 00:44:54,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step19000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
5: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2023-03-17 00:44:54,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
0: successfully saved checkpoint at iteration 19000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 78.27
7: iteration 19010/ 173500 | consumed samples: 4866560 | consumed tokens: 9966714880 | elapsed time per iteration (s): 0.09 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.710052E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2785.814 | TFLOPs: 10.36 |
7: iteration 19020/ 173500 | consumed samples: 4869120 | consumed tokens: 9971957760 | elapsed time per iteration (s): 0.08 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.715583E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.524 | TFLOPs: 11.53 |
7: iteration 19030/ 173500 | consumed samples: 4871680 | consumed tokens: 9977200640 | elapsed time per iteration (s): 0.09 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.709345E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.819 | TFLOPs: 11.06 |
7: iteration 19040/ 173500 | consumed samples: 4874240 | consumed tokens: 9982443520 | elapsed time per iteration (s): 0.08 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.732254E+00 | grad norm: 0.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.945 | TFLOPs: 11.92 |
7: iteration 19050/ 173500 | consumed samples: 4876800 | consumed tokens: 9987686400 | elapsed time per iteration (s): 0.08 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.732879E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.857 | TFLOPs: 11.73 |
7: iteration 19060/ 173500 | consumed samples: 4879360 | consumed tokens: 9992929280 | elapsed time per iteration (s): 0.08 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.728515E+00 | grad norm: 0.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.720 | TFLOPs: 12.00 |
7: iteration 19070/ 173500 | consumed samples: 4881920 | consumed tokens: 9998172160 | elapsed time per iteration (s): 0.08 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.716674E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.727 | TFLOPs: 11.59 |
7: iteration 19080/ 173500 | consumed samples: 4884480 | consumed tokens: 10003415040 | elapsed time per iteration (s): 0.09 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.710115E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.212 | TFLOPs: 11.19 |
7: iteration 19090/ 173500 | consumed samples: 4887040 | consumed tokens: 10008657920 | elapsed time per iteration (s): 0.11 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.717942E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2360.704 | TFLOPs: 8.78 |
7: iteration 19100/ 173500 | consumed samples: 4889600 | consumed tokens: 10013900800 | elapsed time per iteration (s): 0.09 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.724282E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.787 | TFLOPs: 10.88 |
7: iteration 19110/ 173500 | consumed samples: 4892160 | consumed tokens: 10019143680 | elapsed time per iteration (s): 0.08 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.725885E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.291 | TFLOPs: 11.51 |
7: iteration 19120/ 173500 | consumed samples: 4894720 | consumed tokens: 10024386560 | elapsed time per iteration (s): 0.09 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.700071E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2856.150 | TFLOPs: 10.62 |
7: iteration 19130/ 173500 | consumed samples: 4897280 | consumed tokens: 10029629440 | elapsed time per iteration (s): 0.10 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.712624E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2691.244 | TFLOPs: 10.01 |
7: iteration 19140/ 173500 | consumed samples: 4899840 | consumed tokens: 10034872320 | elapsed time per iteration (s): 0.09 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.728320E+00 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.859 | TFLOPs: 10.92 |
7: iteration 19150/ 173500 | consumed samples: 4902400 | consumed tokens: 10040115200 | elapsed time per iteration (s): 0.10 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.712061E+00 | grad norm: 0.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2681.035 | TFLOPs: 9.97 |
7: iteration 19160/ 173500 | consumed samples: 4904960 | consumed tokens: 10045358080 | elapsed time per iteration (s): 0.09 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.715075E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.870 | TFLOPs: 10.17 |
7: iteration 19170/ 173500 | consumed samples: 4907520 | consumed tokens: 10050600960 | elapsed time per iteration (s): 0.10 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.724223E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2643.687 | TFLOPs: 9.83 |
7: iteration 19180/ 173500 | consumed samples: 4910080 | consumed tokens: 10055843840 | elapsed time per iteration (s): 0.09 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.722696E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2949.352 | TFLOPs: 10.97 |
7: iteration 19190/ 173500 | consumed samples: 4912640 | consumed tokens: 10061086720 | elapsed time per iteration (s): 0.08 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 4.726549E+00 | grad norm: 0.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.605 | TFLOPs: 11.89 |
7: iteration 19200/ 173500 | consumed samples: 4915200 | consumed tokens: 10066329600 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.717797E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.482 | TFLOPs: 11.64 |
7: iteration 19210/ 173500 | consumed samples: 4917760 | consumed tokens: 10071572480 | elapsed time per iteration (s): 0.10 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.719597E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2590.751 | TFLOPs: 9.64 |
7: iteration 19220/ 173500 | consumed samples: 4920320 | consumed tokens: 10076815360 | elapsed time per iteration (s): 0.10 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.711814E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2462.215 | TFLOPs: 9.16 |
7: iteration 19230/ 173500 | consumed samples: 4922880 | consumed tokens: 10082058240 | elapsed time per iteration (s): 0.10 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.713240E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.683 | TFLOPs: 9.41 |
7: iteration 19240/ 173500 | consumed samples: 4925440 | consumed tokens: 10087301120 | elapsed time per iteration (s): 0.10 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.716776E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2489.786 | TFLOPs: 9.26 |
7: iteration 19250/ 173500 | consumed samples: 4928000 | consumed tokens: 10092544000 | elapsed time per iteration (s): 0.11 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.708496E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.534 | TFLOPs: 8.95 |
7: iteration 19260/ 173500 | consumed samples: 4930560 | consumed tokens: 10097786880 | elapsed time per iteration (s): 0.10 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.714232E+00 | grad norm: 0.674 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.003 | TFLOPs: 9.30 |
7: iteration 19270/ 173500 | consumed samples: 4933120 | consumed tokens: 10103029760 | elapsed time per iteration (s): 0.11 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.708624E+00 | grad norm: 0.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.489 | TFLOPs: 8.85 |
7: iteration 19280/ 173500 | consumed samples: 4935680 | consumed tokens: 10108272640 | elapsed time per iteration (s): 0.10 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.722540E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.766 | TFLOPs: 9.34 |
7: iteration 19290/ 173500 | consumed samples: 4938240 | consumed tokens: 10113515520 | elapsed time per iteration (s): 0.10 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.709222E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.867 | TFLOPs: 9.72 |
7: iteration 19300/ 173500 | consumed samples: 4940800 | consumed tokens: 10118758400 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.709054E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.772 | TFLOPs: 11.87 |
7: iteration 19310/ 173500 | consumed samples: 4943360 | consumed tokens: 10124001280 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.709698E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.215 | TFLOPs: 11.64 |
7: iteration 19320/ 173500 | consumed samples: 4945920 | consumed tokens: 10129244160 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.724627E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.853 | TFLOPs: 11.86 |
7: iteration 19330/ 173500 | consumed samples: 4948480 | consumed tokens: 10134487040 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.717508E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.057 | TFLOPs: 11.90 |
7: iteration 19340/ 173500 | consumed samples: 4951040 | consumed tokens: 10139729920 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.707547E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.118 | TFLOPs: 11.89 |
7: iteration 19350/ 173500 | consumed samples: 4953600 | consumed tokens: 10144972800 | elapsed time per iteration (s): 0.09 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.702439E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.965 | TFLOPs: 11.15 |
7: iteration 19360/ 173500 | consumed samples: 4956160 | consumed tokens: 10150215680 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.707491E+00 | grad norm: 0.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.742 | TFLOPs: 11.37 |
7: iteration 19370/ 173500 | consumed samples: 4958720 | consumed tokens: 10155458560 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.713192E+00 | grad norm: 0.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.883 | TFLOPs: 11.65 |
7: iteration 19380/ 173500 | consumed samples: 4961280 | consumed tokens: 10160701440 | elapsed time per iteration (s): 0.08 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 4.716695E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.374 | TFLOPs: 11.39 |
7: iteration 19390/ 173500 | consumed samples: 4963840 | consumed tokens: 10165944320 | elapsed time per iteration (s): 0.08 | learning rate:
1.953E-04 | global batch size: 256 | lm loss: 4.728648E+00 | grad norm: 0.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.666 | TFLOPs: 11.90 | 7: iteration 19400/ 173500 | consumed samples: 4966400 | consumed tokens: 10171187200 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.715023E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.675 | TFLOPs: 11.41 | 7: iteration 19410/ 173500 | consumed samples: 4968960 | consumed tokens: 10176430080 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.702754E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.785 | TFLOPs: 11.56 | 7: iteration 19420/ 173500 | consumed samples: 4971520 | consumed tokens: 10181672960 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.731737E+00 | grad norm: 0.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.428 | TFLOPs: 11.57 | 7: iteration 19430/ 173500 | consumed samples: 4974080 | consumed tokens: 10186915840 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.715335E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.204 | TFLOPs: 11.40 | 7: iteration 19440/ 173500 | consumed samples: 4976640 | consumed tokens: 10192158720 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.709822E+00 | grad norm: 0.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.656 | TFLOPs: 11.64 | 7: iteration 19450/ 173500 | consumed 
samples: 4979200 | consumed tokens: 10197401600 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.707660E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.384 | TFLOPs: 11.90 | 7: iteration 19460/ 173500 | consumed samples: 4981760 | consumed tokens: 10202644480 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.709030E+00 | grad norm: 0.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.023 | TFLOPs: 11.93 | 7: iteration 19470/ 173500 | consumed samples: 4984320 | consumed tokens: 10207887360 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.716052E+00 | grad norm: 0.650 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.341 | TFLOPs: 11.94 | 7: iteration 19480/ 173500 | consumed samples: 4986880 | consumed tokens: 10213130240 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.726643E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.375 | TFLOPs: 11.92 | 7: iteration 19490/ 173500 | consumed samples: 4989440 | consumed tokens: 10218373120 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.720114E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.589 | TFLOPs: 11.86 | 7: iteration 19500/ 173500 | consumed samples: 4992000 | consumed tokens: 10223616000 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.716345E+00 | grad norm: 0.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3206.635 | TFLOPs: 11.93 | 7: iteration 19510/ 173500 | consumed samples: 4994560 | consumed tokens: 10228858880 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.716901E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.777 | TFLOPs: 11.79 | 7: iteration 19520/ 173500 | consumed samples: 4997120 | consumed tokens: 10234101760 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.726613E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.605 | TFLOPs: 11.90 | 7: iteration 19530/ 173500 | consumed samples: 4999680 | consumed tokens: 10239344640 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.719251E+00 | grad norm: 0.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.691 | TFLOPs: 11.92 | 7: iteration 19540/ 173500 | consumed samples: 5002240 | consumed tokens: 10244587520 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.718410E+00 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.781 | TFLOPs: 11.88 | 7: iteration 19550/ 173500 | consumed samples: 5004800 | consumed tokens: 10249830400 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.718329E+00 | grad norm: 0.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.157 | TFLOPs: 11.95 | 7: iteration 19560/ 173500 | consumed samples: 5007360 | consumed tokens: 10255073280 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm 
loss: 4.713817E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.856 | TFLOPs: 11.86 | 7: iteration 19570/ 173500 | consumed samples: 5009920 | consumed tokens: 10260316160 | elapsed time per iteration (s): 0.08 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 4.712645E+00 | grad norm: 0.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.352 | TFLOPs: 11.90 | 7: iteration 19580/ 173500 | consumed samples: 5012480 | consumed tokens: 10265559040 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.708427E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.963 | TFLOPs: 11.75 | 7: iteration 19590/ 173500 | consumed samples: 5015040 | consumed tokens: 10270801920 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.706866E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.053 | TFLOPs: 11.93 | 7: iteration 19600/ 173500 | consumed samples: 5017600 | consumed tokens: 10276044800 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.707523E+00 | grad norm: 0.652 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.559 | TFLOPs: 11.88 | 7: iteration 19610/ 173500 | consumed samples: 5020160 | consumed tokens: 10281287680 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.729348E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.524 | TFLOPs: 11.94 | 7: iteration 19620/ 173500 | consumed samples: 5022720 | consumed tokens: 
10286530560 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.722360E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.631 | TFLOPs: 11.87 | 7: iteration 19630/ 173500 | consumed samples: 5025280 | consumed tokens: 10291773440 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.703420E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.989 | TFLOPs: 11.87 | 7: iteration 19640/ 173500 | consumed samples: 5027840 | consumed tokens: 10297016320 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.700741E+00 | grad norm: 0.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.632 | TFLOPs: 11.91 | 7: iteration 19650/ 173500 | consumed samples: 5030400 | consumed tokens: 10302259200 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.701525E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.534 | TFLOPs: 11.89 | 7: iteration 19660/ 173500 | consumed samples: 5032960 | consumed tokens: 10307502080 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.716368E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.417 | TFLOPs: 11.93 | 7: iteration 19670/ 173500 | consumed samples: 5035520 | consumed tokens: 10312744960 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.709471E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3210.482 | TFLOPs: 11.94 | 7: iteration 19680/ 173500 | consumed samples: 5038080 | consumed tokens: 10317987840 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.714260E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.199 | TFLOPs: 11.97 | 7: iteration 19690/ 173500 | consumed samples: 5040640 | consumed tokens: 10323230720 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.700121E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.200 | TFLOPs: 11.95 | 7: iteration 19700/ 173500 | consumed samples: 5043200 | consumed tokens: 10328473600 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.704808E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.695 | TFLOPs: 11.92 | 7: iteration 19710/ 173500 | consumed samples: 5045760 | consumed tokens: 10333716480 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.709435E+00 | grad norm: 0.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.867 | TFLOPs: 11.96 | 7: iteration 19720/ 173500 | consumed samples: 5048320 | consumed tokens: 10338959360 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.714974E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.713 | TFLOPs: 11.98 | 7: iteration 19730/ 173500 | consumed samples: 5050880 | consumed tokens: 10344202240 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.696780E+00 | grad norm: 0.573 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.404 | TFLOPs: 11.99 | 7: iteration 19740/ 173500 | consumed samples: 5053440 | consumed tokens: 10349445120 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.710796E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.336 | TFLOPs: 11.92 | 7: iteration 19750/ 173500 | consumed samples: 5056000 | consumed tokens: 10354688000 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.702190E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.347 | TFLOPs: 11.94 | 7: iteration 19760/ 173500 | consumed samples: 5058560 | consumed tokens: 10359930880 | elapsed time per iteration (s): 0.08 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 4.709080E+00 | grad norm: 0.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.781 | TFLOPs: 11.96 | 7: iteration 19770/ 173500 | consumed samples: 5061120 | consumed tokens: 10365173760 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.715511E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.716 | TFLOPs: 11.76 | 7: iteration 19780/ 173500 | consumed samples: 5063680 | consumed tokens: 10370416640 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.718355E+00 | grad norm: 0.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.561 | TFLOPs: 11.93 | 7: iteration 19790/ 173500 | consumed samples: 5066240 | consumed tokens: 10375659520 | elapsed time per iteration (s): 
0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.710765E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.128 | TFLOPs: 11.93 | 7: iteration 19800/ 173500 | consumed samples: 5068800 | consumed tokens: 10380902400 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.727016E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.504 | TFLOPs: 11.93 | 7: iteration 19810/ 173500 | consumed samples: 5071360 | consumed tokens: 10386145280 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.714845E+00 | grad norm: 0.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.575 | TFLOPs: 11.92 | 7: iteration 19820/ 173500 | consumed samples: 5073920 | consumed tokens: 10391388160 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.716595E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.640 | TFLOPs: 12.00 | 7: iteration 19830/ 173500 | consumed samples: 5076480 | consumed tokens: 10396631040 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.719432E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.558 | TFLOPs: 11.91 | 7: iteration 19840/ 173500 | consumed samples: 5079040 | consumed tokens: 10401873920 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.717065E+00 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.240 | TFLOPs: 11.98 | 7: iteration 19850/ 
173500 | consumed samples: 5081600 | consumed tokens: 10407116800 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.708574E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.481 | TFLOPs: 12.01 | 7: iteration 19860/ 173500 | consumed samples: 5084160 | consumed tokens: 10412359680 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.714798E+00 | grad norm: 0.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.712 | TFLOPs: 11.97 | 7: iteration 19870/ 173500 | consumed samples: 5086720 | consumed tokens: 10417602560 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.714072E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.211 | TFLOPs: 11.99 | 7: iteration 19880/ 173500 | consumed samples: 5089280 | consumed tokens: 10422845440 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.718572E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.740 | TFLOPs: 11.96 | 7: iteration 19890/ 173500 | consumed samples: 5091840 | consumed tokens: 10428088320 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.711283E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.011 | TFLOPs: 11.96 | 7: iteration 19900/ 173500 | consumed samples: 5094400 | consumed tokens: 10433331200 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.705238E+00 | grad norm: 0.576 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3212.030 | TFLOPs: 11.95 | 7: iteration 19910/ 173500 | consumed samples: 5096960 | consumed tokens: 10438574080 | elapsed time per iteration (s): 0.13 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.716031E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2022.995 | TFLOPs: 7.52 | 7: iteration 19920/ 173500 | consumed samples: 5099520 | consumed tokens: 10443816960 | elapsed time per iteration (s): 0.13 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.702888E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2000.444 | TFLOPs: 7.44 | 7: iteration 19930/ 173500 | consumed samples: 5102080 | consumed tokens: 10449059840 | elapsed time per iteration (s): 0.11 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.713182E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2235.157 | TFLOPs: 8.31 | 7: iteration 19940/ 173500 | consumed samples: 5104640 | consumed tokens: 10454302720 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.716747E+00 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.061 | TFLOPs: 11.93 | 7: iteration 19950/ 173500 | consumed samples: 5107200 | consumed tokens: 10459545600 | elapsed time per iteration (s): 0.08 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 4.704573E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.978 | TFLOPs: 11.91 | 7: iteration 19960/ 173500 | consumed samples: 5109760 | consumed tokens: 10464788480 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch 
size: 256 | lm loss: 4.715236E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.702 | TFLOPs: 11.95 | 7: iteration 19970/ 173500 | consumed samples: 5112320 | consumed tokens: 10470031360 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.713969E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.505 | TFLOPs: 11.97 | 7: iteration 19980/ 173500 | consumed samples: 5114880 | consumed tokens: 10475274240 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.716380E+00 | grad norm: 0.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.235 | TFLOPs: 11.95 | 7: iteration 19990/ 173500 | consumed samples: 5117440 | consumed tokens: 10480517120 | elapsed time per iteration (s): 0.09 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.714996E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2869.359 | TFLOPs: 10.67 | 0: [2023-03-17 00:46:19,818] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=0, lr=[0.00019502450208460265, 0.00019502450208460265, 0.00019502450208460265], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 20000/ 173500 | consumed samples: 5120000 | consumed tokens: 10485760000 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.717497E+00 | grad norm: 0.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.935 | TFLOPs: 11.66 | 0: steps: 20000 loss: 4.7493 iter time (s): 0.085 samples/sec: 3028.279 7: ------------------------------------------------------------------------------------------------ 7: validation loss at 
iteration 20000 | lm loss value: 4.567067E+00 | lm loss PPL: 9.626133E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 20000 to checkpoints_14m91b100m 0: [2023-03-17 00:46:19,876] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step20000 is begin to save! 0: [2023-03-17 00:46:19,880] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:46:19,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:46:19,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:46:19,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:46:19,908] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:46:19,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:46:19,911] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:46:19,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:46:19,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 00:46:19,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:46:19,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:46:19,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:46:19,918] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step20000/mp_rank_00_model_states.pt 0: [2023-03-17 00:46:19,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:46:19,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:46:19,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:46:19,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:46:19,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 00:46:19,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 00:46:19,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 00:46:19,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:46:19,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:46:19,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 00:46:19,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:46:19,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 00:46:19,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:46:19,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 00:46:19,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:46:19,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:46:19,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 00:46:19,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:46:19,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:46:19,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:46:19,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:46:19,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 00:46:19,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:46:19,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:46:19,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:46:19,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:46:19,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:46:19,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:46:19,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:46:19,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:46:19,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:46:19,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:46:19,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:46:19,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:46:19,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:46:19,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 00:46:19,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 00:46:19,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 00:46:19,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:46:19,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:46:19,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:46:19,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 00:46:19,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:46:19,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 00:46:19,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:46:19,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:46:19,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:46:19,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
6: [2023-03-17 00:46:19,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:46:19,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:46:19,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 00:46:19,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:46:19,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:46:19,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 00:46:19,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 0: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
0: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 6: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 3: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2023-03-17 00:46:19,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:46:19,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
0: successfully saved checkpoint at iteration 20000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.67 7: iteration 20010/ 173500 | consumed samples: 5122560 | consumed tokens: 10491002880 | elapsed time per iteration (s): 0.09 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.697317E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2739.411 | TFLOPs: 10.19 | 7: iteration 20020/ 173500 | consumed samples: 5125120 | consumed tokens: 10496245760 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.708089E+00 | grad norm: 0.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.685 | TFLOPs: 11.90 | 7: iteration 20030/ 173500 | consumed samples: 5127680 | consumed tokens: 10501488640 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.716563E+00 | grad norm: 0.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.251 | TFLOPs: 11.89 | 7: iteration 20040/ 173500 | consumed samples: 5130240 | consumed tokens: 10506731520 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.721983E+00 | grad norm: 0.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.073 | TFLOPs: 11.62 | 7: iteration 20050/ 173500 | consumed samples: 5132800 | consumed tokens: 10511974400 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.713124E+00 | grad norm: 0.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.448 | TFLOPs: 11.60 | 7: iteration 20060/ 173500 | consumed samples: 5135360 | consumed tokens: 10517217280 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.718575E+00 | grad norm: 0.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2971.134 | TFLOPs: 11.05 | 7: iteration 20070/ 173500 | consumed samples: 5137920 | consumed tokens: 10522460160 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.712479E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.216 | TFLOPs: 11.63 | 7: iteration 20080/ 173500 | consumed samples: 5140480 | consumed tokens: 10527703040 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.707016E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.015 | TFLOPs: 11.87 | 7: iteration 20090/ 173500 | consumed samples: 5143040 | consumed tokens: 10532945920 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.702451E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.953 | TFLOPs: 11.89 | 7: iteration 20100/ 173500 | consumed samples: 5145600 | consumed tokens: 10538188800 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.718839E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.531 | TFLOPs: 11.34 | 7: iteration 20110/ 173500 | consumed samples: 5148160 | consumed tokens: 10543431680 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.704742E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.082 | TFLOPs: 11.87 | 7: iteration 20120/ 173500 
| consumed samples: 5150720 | consumed tokens: 10548674560 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.697498E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.105 | TFLOPs: 11.58 | 7: iteration 20130/ 173500 | consumed samples: 5153280 | consumed tokens: 10553917440 | elapsed time per iteration (s): 0.08 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 4.714353E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.025 | TFLOPs: 11.91 | 7: iteration 20140/ 173500 | consumed samples: 5155840 | consumed tokens: 10559160320 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.707233E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.118 | TFLOPs: 11.87 | 7: iteration 20150/ 173500 | consumed samples: 5158400 | consumed tokens: 10564403200 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.715541E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.150 | TFLOPs: 11.88 | 7: iteration 20160/ 173500 | consumed samples: 5160960 | consumed tokens: 10569646080 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.708951E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.126 | TFLOPs: 11.91 | 7: iteration 20170/ 173500 | consumed samples: 5163520 | consumed tokens: 10574888960 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.704675E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3191.162 | TFLOPs: 11.87 | 7: iteration 20180/ 173500 | consumed samples: 5166080 | consumed tokens: 10580131840 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.711342E+00 | grad norm: 0.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.644 | TFLOPs: 11.89 | 7: iteration 20190/ 173500 | consumed samples: 5168640 | consumed tokens: 10585374720 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.699056E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.112 | TFLOPs: 11.84 | 7: iteration 20200/ 173500 | consumed samples: 5171200 | consumed tokens: 10590617600 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.707146E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.762 | TFLOPs: 11.90 | 7: iteration 20210/ 173500 | consumed samples: 5173760 | consumed tokens: 10595860480 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.707632E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.981 | TFLOPs: 11.89 | 7: iteration 20220/ 173500 | consumed samples: 5176320 | consumed tokens: 10601103360 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.709537E+00 | grad norm: 0.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.060 | TFLOPs: 11.90 | 7: iteration 20230/ 173500 | consumed samples: 5178880 | consumed tokens: 10606346240 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 
256 | lm loss: 4.715169E+00 | grad norm: 0.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.158 | TFLOPs: 11.91 | 7: iteration 20240/ 173500 | consumed samples: 5181440 | consumed tokens: 10611589120 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.707001E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.171 | TFLOPs: 11.83 | 7: iteration 20250/ 173500 | consumed samples: 5184000 | consumed tokens: 10616832000 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.707055E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.992 | TFLOPs: 11.90 | 7: iteration 20260/ 173500 | consumed samples: 5186560 | consumed tokens: 10622074880 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.713669E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.472 | TFLOPs: 11.91 | 7: iteration 20270/ 173500 | consumed samples: 5189120 | consumed tokens: 10627317760 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.715665E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.793 | TFLOPs: 11.62 | 7: iteration 20280/ 173500 | consumed samples: 5191680 | consumed tokens: 10632560640 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.717215E+00 | grad norm: 0.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.324 | TFLOPs: 11.87 | 7: iteration 20290/ 173500 | consumed samples: 5194240 | consumed 
tokens: 10637803520 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.711791E+00 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.373 | TFLOPs: 11.89 | 7: iteration 20300/ 173500 | consumed samples: 5196800 | consumed tokens: 10643046400 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.703566E+00 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.716 | TFLOPs: 11.60 | 7: iteration 20310/ 173500 | consumed samples: 5199360 | consumed tokens: 10648289280 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.711861E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.457 | TFLOPs: 11.86 | 7: iteration 20320/ 173500 | consumed samples: 5201920 | consumed tokens: 10653532160 | elapsed time per iteration (s): 0.08 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 4.703408E+00 | grad norm: 0.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.166 | TFLOPs: 11.85 | 7: iteration 20330/ 173500 | consumed samples: 5204480 | consumed tokens: 10658775040 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.703171E+00 | grad norm: 0.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.063 | TFLOPs: 11.65 | 7: iteration 20340/ 173500 | consumed samples: 5207040 | consumed tokens: 10664017920 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.704197E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3196.985 | TFLOPs: 11.89 | 7: iteration 20350/ 173500 | consumed samples: 5209600 | consumed tokens: 10669260800 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.705906E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.740 | TFLOPs: 11.86 | 7: iteration 20360/ 173500 | consumed samples: 5212160 | consumed tokens: 10674503680 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.721093E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.805 | TFLOPs: 11.90 | 7: iteration 20370/ 173500 | consumed samples: 5214720 | consumed tokens: 10679746560 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.705732E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.114 | TFLOPs: 11.84 | 7: iteration 20380/ 173500 | consumed samples: 5217280 | consumed tokens: 10684989440 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.710600E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.456 | TFLOPs: 11.63 | 7: iteration 20390/ 173500 | consumed samples: 5219840 | consumed tokens: 10690232320 | elapsed time per iteration (s): 0.09 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.715769E+00 | grad norm: 0.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2990.166 | TFLOPs: 11.12 | 7: iteration 20400/ 173500 | consumed samples: 5222400 | consumed tokens: 10695475200 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.714975E+00 | grad norm: 
0.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.121 | TFLOPs: 11.64 | 7: iteration 20410/ 173500 | consumed samples: 5224960 | consumed tokens: 10700718080 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.689908E+00 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.651 | TFLOPs: 11.91 | 7: iteration 20420/ 173500 | consumed samples: 5227520 | consumed tokens: 10705960960 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.712589E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.575 | TFLOPs: 11.91 | 7: iteration 20430/ 173500 | consumed samples: 5230080 | consumed tokens: 10711203840 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.704282E+00 | grad norm: 0.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.125 | TFLOPs: 11.93 | 7: iteration 20440/ 173500 | consumed samples: 5232640 | consumed tokens: 10716446720 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.701683E+00 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.312 | TFLOPs: 11.93 | 7: iteration 20450/ 173500 | consumed samples: 5235200 | consumed tokens: 10721689600 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.726881E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.746 | TFLOPs: 11.36 | 7: iteration 20460/ 173500 | consumed samples: 5237760 | consumed tokens: 10726932480 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.706483E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.792 | TFLOPs: 11.79 | 7: iteration 20470/ 173500 | consumed samples: 5240320 | consumed tokens: 10732175360 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.712140E+00 | grad norm: 0.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.875 | TFLOPs: 11.88 | 7: iteration 20480/ 173500 | consumed samples: 5242880 | consumed tokens: 10737418240 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.722890E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.568 | TFLOPs: 11.68 | 7: iteration 20490/ 173500 | consumed samples: 5245440 | consumed tokens: 10742661120 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.712792E+00 | grad norm: 0.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.307 | TFLOPs: 11.66 | 7: iteration 20500/ 173500 | consumed samples: 5248000 | consumed tokens: 10747904000 | elapsed time per iteration (s): 0.08 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 4.704953E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.708 | TFLOPs: 11.93 | 7: iteration 20510/ 173500 | consumed samples: 5250560 | consumed tokens: 10753146880 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.692349E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.947 | TFLOPs: 11.63 | 7: 
iteration 20520/ 173500 | consumed samples: 5253120 | consumed tokens: 10758389760 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.712751E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.170 | TFLOPs: 11.92 | 7: iteration 20530/ 173500 | consumed samples: 5255680 | consumed tokens: 10763632640 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.701971E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.158 | TFLOPs: 11.96 | 7: iteration 20540/ 173500 | consumed samples: 5258240 | consumed tokens: 10768875520 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.715267E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.321 | TFLOPs: 11.61 | 7: iteration 20550/ 173500 | consumed samples: 5260800 | consumed tokens: 10774118400 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.698649E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.562 | TFLOPs: 11.54 | 7: iteration 20560/ 173500 | consumed samples: 5263360 | consumed tokens: 10779361280 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.712379E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.941 | TFLOPs: 11.66 | 7: iteration 20570/ 173500 | consumed samples: 5265920 | consumed tokens: 10784604160 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.717620E+00 | grad norm: 0.549 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.786 | TFLOPs: 11.92 | 7: iteration 20580/ 173500 | consumed samples: 5268480 | consumed tokens: 10789847040 | elapsed time per iteration (s): 0.09 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.698139E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2778.725 | TFLOPs: 10.34 | 7: iteration 20590/ 173500 | consumed samples: 5271040 | consumed tokens: 10795089920 | elapsed time per iteration (s): 0.09 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.700947E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.314 | TFLOPs: 11.07 | 7: iteration 20600/ 173500 | consumed samples: 5273600 | consumed tokens: 10800332800 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.709719E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.978 | TFLOPs: 11.94 | 7: iteration 20610/ 173500 | consumed samples: 5276160 | consumed tokens: 10805575680 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.702108E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.343 | TFLOPs: 11.93 | 7: iteration 20620/ 173500 | consumed samples: 5278720 | consumed tokens: 10810818560 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.706261E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.469 | TFLOPs: 11.41 | 7: iteration 20630/ 173500 | consumed samples: 5281280 | consumed tokens: 10816061440 | elapsed time per iteration (s): 0.08 | learning rate: 
1.947E-04 | global batch size: 256 | lm loss: 4.717085E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.597 | TFLOPs: 11.95 | 7: iteration 20640/ 173500 | consumed samples: 5283840 | consumed tokens: 10821304320 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.712524E+00 | grad norm: 0.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.740 | TFLOPs: 11.70 | 7: iteration 20650/ 173500 | consumed samples: 5286400 | consumed tokens: 10826547200 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.705923E+00 | grad norm: 0.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.161 | TFLOPs: 11.94 | 7: iteration 20660/ 173500 | consumed samples: 5288960 | consumed tokens: 10831790080 | elapsed time per iteration (s): 0.08 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.704530E+00 | grad norm: 0.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.478 | TFLOPs: 11.45 | 7: iteration 20670/ 173500 | consumed samples: 5291520 | consumed tokens: 10837032960 | elapsed time per iteration (s): 0.09 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.701311E+00 | grad norm: 0.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2763.312 | TFLOPs: 10.28 | 7: iteration 20680/ 173500 | consumed samples: 5294080 | consumed tokens: 10842275840 | elapsed time per iteration (s): 0.11 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 4.701417E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.047 | TFLOPs: 8.66 | 7: iteration 20690/ 173500 | consumed 
samples: 5296640 | consumed tokens: 10847518720 | elapsed time per iteration (s): 0.11 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.708809E+00 | grad norm: 0.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2294.211 | TFLOPs: 8.53 |
7: iteration 20700/ 173500 | consumed samples: 5299200 | consumed tokens: 10852761600 | elapsed time per iteration (s): 0.11 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.695345E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.633 | TFLOPs: 8.59 |
7: iteration 20710/ 173500 | consumed samples: 5301760 | consumed tokens: 10858004480 | elapsed time per iteration (s): 0.11 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.702000E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2434.443 | TFLOPs: 9.06 |
7: iteration 20720/ 173500 | consumed samples: 5304320 | consumed tokens: 10863247360 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.711246E+00 | grad norm: 0.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.446 | TFLOPs: 11.88 |
7: iteration 20730/ 173500 | consumed samples: 5306880 | consumed tokens: 10868490240 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.709716E+00 | grad norm: 0.627 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.936 | TFLOPs: 11.67 |
7: iteration 20740/ 173500 | consumed samples: 5309440 | consumed tokens: 10873733120 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.717031E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.063 | TFLOPs: 11.76 |
7: iteration 20750/ 173500 | consumed samples: 5312000 | consumed tokens: 10878976000 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.700680E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.815 | TFLOPs: 11.91 |
7: iteration 20760/ 173500 | consumed samples: 5314560 | consumed tokens: 10884218880 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.711193E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.334 | TFLOPs: 11.94 |
7: iteration 20770/ 173500 | consumed samples: 5317120 | consumed tokens: 10889461760 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.707093E+00 | grad norm: 0.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.541 | TFLOPs: 11.93 |
7: iteration 20780/ 173500 | consumed samples: 5319680 | consumed tokens: 10894704640 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.697118E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.926 | TFLOPs: 11.91 |
7: iteration 20790/ 173500 | consumed samples: 5322240 | consumed tokens: 10899947520 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.702972E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.810 | TFLOPs: 11.86 |
7: iteration 20800/ 173500 | consumed samples: 5324800 | consumed tokens: 10905190400 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.700602E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.201 | TFLOPs: 11.94 |
7: iteration 20810/ 173500 | consumed samples: 5327360 | consumed tokens: 10910433280 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.700066E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.344 | TFLOPs: 11.62 |
7: iteration 20820/ 173500 | consumed samples: 5329920 | consumed tokens: 10915676160 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.724463E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.735 | TFLOPs: 11.95 |
7: iteration 20830/ 173500 | consumed samples: 5332480 | consumed tokens: 10920919040 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.715103E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.076 | TFLOPs: 11.95 |
7: iteration 20840/ 173500 | consumed samples: 5335040 | consumed tokens: 10926161920 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.696115E+00 | grad norm: 0.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.541 | TFLOPs: 11.86 |
7: iteration 20850/ 173500 | consumed samples: 5337600 | consumed tokens: 10931404800 | elapsed time per iteration (s): 0.08 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 4.706945E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.198 | TFLOPs: 11.91 |
7: iteration 20860/ 173500 | consumed samples: 5340160 | consumed tokens: 10936647680 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.716870E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.673 | TFLOPs: 11.88 |
7: iteration 20870/ 173500 | consumed samples: 5342720 | consumed tokens: 10941890560 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.703722E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.152 | TFLOPs: 11.86 |
7: iteration 20880/ 173500 | consumed samples: 5345280 | consumed tokens: 10947133440 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.712536E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.364 | TFLOPs: 11.89 |
7: iteration 20890/ 173500 | consumed samples: 5347840 | consumed tokens: 10952376320 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.701247E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.199 | TFLOPs: 11.90 |
7: iteration 20900/ 173500 | consumed samples: 5350400 | consumed tokens: 10957619200 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.712400E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.121 | TFLOPs: 11.90 |
7: iteration 20910/ 173500 | consumed samples: 5352960 | consumed tokens: 10962862080 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.704432E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.516 | TFLOPs: 11.64 |
7: iteration 20920/ 173500 | consumed samples: 5355520 | consumed tokens: 10968104960 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.700661E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.903 | TFLOPs: 11.96 |
7: iteration 20930/ 173500 | consumed samples: 5358080 | consumed tokens: 10973347840 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.701611E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.802 | TFLOPs: 11.68 |
7: iteration 20940/ 173500 | consumed samples: 5360640 | consumed tokens: 10978590720 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.702517E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.352 | TFLOPs: 11.74 |
7: iteration 20950/ 173500 | consumed samples: 5363200 | consumed tokens: 10983833600 | elapsed time per iteration (s): 0.08 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.703496E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.471 | TFLOPs: 11.44 |
7: iteration 20960/ 173500 | consumed samples: 5365760 | consumed tokens: 10989076480 | elapsed time per iteration (s): 0.09 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.708972E+00 | grad norm: 0.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.230 | TFLOPs: 10.82 |
7: iteration 20970/ 173500 | consumed samples: 5368320 | consumed tokens: 10994319360 | elapsed time per iteration (s): 0.10 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.710882E+00 | grad norm: 0.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.840 | TFLOPs: 9.09 |
7: iteration 20980/ 173500 | consumed samples: 5370880 | consumed tokens: 10999562240 | elapsed time per iteration (s): 0.11 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.686873E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2295.529 | TFLOPs: 8.54 |
7: iteration 20990/ 173500 | consumed samples: 5373440 | consumed tokens: 11004805120 | elapsed time per iteration (s): 0.12 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.696936E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2153.285 | TFLOPs: 8.01 |
7: iteration 21000/ 173500 | consumed samples: 5376000 | consumed tokens: 11010048000 | elapsed time per iteration (s): 0.13 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.705457E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.175 | TFLOPs: 7.30 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 21000 | lm loss value: 4.545269E+00 | lm loss PPL: 9.418581E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 21000 to checkpoints_14m91b100m
0: [2023-03-17 00:47:43,719] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step21000 is begin to save!
0: [2023-03-17 00:47:43,723] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:47:43,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:47:43,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:47:43,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:47:43,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:47:43,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:47:43,754] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:47:43,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:47:43,757] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:47:43,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:47:43,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:47:43,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:47:43,761] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step21000/mp_rank_00_model_states.pt 0: [2023-03-17 00:47:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/mp_rank_00_model_states.pt... 
0: [2023-03-17 00:47:43,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:47:43,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:47:43,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:47:43,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:47:43,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:47:43,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2023-03-17 00:47:43,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:47:43,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2023-03-17 00:47:43,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:47:43,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
3: [2023-03-17 00:47:43,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:47:43,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:47:43,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2023-03-17 00:47:43,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2023-03-17 00:47:43,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:47:43,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:47:43,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:47:43,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:47:43,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2023-03-17 00:47:43,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:47:43,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:47:43,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:47:43,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:47:43,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:47:43,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 00:47:43,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2023-03-17 00:47:43,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:47:43,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:47:43,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
4: [2023-03-17 00:47:43,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:47:43,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:47:43,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2023-03-17 00:47:43,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:47:43,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:47:43,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:47:43,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2023-03-17 00:47:43,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:47:43,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:47:43,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:47:43,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:47:43,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:47:43,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 00:47:43,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:47:43,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:47:43,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
5: [2023-03-17 00:47:43,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:47:43,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:47:43,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 3: [2023-03-17 00:47:43,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2023-03-17 00:47:43,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2023-03-17 00:47:43,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:47:43,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:47:43,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2023-03-17 00:47:43,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:47:43,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:47:43,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 00:47:43,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:47:43,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:47:43,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:47:43,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:47:43,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:47:43,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:47:43,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2023-03-17 00:47:43,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2023-03-17 00:47:43,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:47:43,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 00:47:43,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2023-03-17 00:47:43,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
4: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:47:43,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step21000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:47:43,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: successfully saved checkpoint at iteration 21000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.44 7: iteration 21010/ 173500 | consumed samples: 5378560 | consumed tokens: 11015290880 | elapsed time per iteration (s): 0.13 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.707748E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1923.434 | TFLOPs: 7.15 | 7: iteration 21020/ 173500 | consumed samples: 5381120 | consumed tokens: 11020533760 | elapsed time per iteration (s): 0.09 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.694396E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2724.090 | TFLOPs: 10.13 | 7: iteration 21030/ 173500 | consumed samples: 5383680 | consumed tokens: 11025776640 | elapsed time per iteration (s): 0.09 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 4.707949E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.718 | TFLOPs: 10.56 | 7: iteration 21040/ 173500 | consumed samples: 5386240 | consumed tokens: 11031019520 | elapsed time per iteration (s): 0.10 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.693789E+00 | grad norm: 0.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2451.372 | TFLOPs: 
9.12 | 7: iteration 21050/ 173500 | consumed samples: 5388800 | consumed tokens: 11036262400 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.709859E+00 | grad norm: 0.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.043 | TFLOPs: 11.83 | 7: iteration 21060/ 173500 | consumed samples: 5391360 | consumed tokens: 11041505280 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.713953E+00 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.147 | TFLOPs: 11.91 | 7: iteration 21070/ 173500 | consumed samples: 5393920 | consumed tokens: 11046748160 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.685855E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.539 | TFLOPs: 11.92 | 7: iteration 21080/ 173500 | consumed samples: 5396480 | consumed tokens: 11051991040 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.703350E+00 | grad norm: 0.590 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.710 | TFLOPs: 11.66 | 7: iteration 21090/ 173500 | consumed samples: 5399040 | consumed tokens: 11057233920 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.705831E+00 | grad norm: 0.609 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.597 | TFLOPs: 11.93 | 7: iteration 21100/ 173500 | consumed samples: 5401600 | consumed tokens: 11062476800 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.713229E+00 | grad norm: 0.573 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.648 | TFLOPs: 11.66 | 7: iteration 21110/ 173500 | consumed samples: 5404160 | consumed tokens: 11067719680 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.701587E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.306 | TFLOPs: 11.88 | 7: iteration 21120/ 173500 | consumed samples: 5406720 | consumed tokens: 11072962560 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.700321E+00 | grad norm: 0.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.967 | TFLOPs: 11.87 | 7: iteration 21130/ 173500 | consumed samples: 5409280 | consumed tokens: 11078205440 | elapsed time per iteration (s): 0.09 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.695766E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.699 | TFLOPs: 10.56 | 7: iteration 21140/ 173500 | consumed samples: 5411840 | consumed tokens: 11083448320 | elapsed time per iteration (s): 0.10 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.714329E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.414 | TFLOPs: 10.00 | 7: iteration 21150/ 173500 | consumed samples: 5414400 | consumed tokens: 11088691200 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.711323E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.993 | TFLOPs: 11.92 | 7: iteration 21160/ 173500 | consumed samples: 5416960 | consumed tokens: 11093934080 | elapsed time per iteration (s): 0.08 | learning 
rate: 1.944E-04 | global batch size: 256 | lm loss: 4.698179E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.117 | TFLOPs: 11.90 | 7: iteration 21170/ 173500 | consumed samples: 5419520 | consumed tokens: 11099176960 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.694116E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.941 | TFLOPs: 11.89 | 7: iteration 21180/ 173500 | consumed samples: 5422080 | consumed tokens: 11104419840 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.697812E+00 | grad norm: 0.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.666 | TFLOPs: 11.91 | 7: iteration 21190/ 173500 | consumed samples: 5424640 | consumed tokens: 11109662720 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.706325E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.382 | TFLOPs: 11.90 | 7: iteration 21200/ 173500 | consumed samples: 5427200 | consumed tokens: 11114905600 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.713233E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.844 | TFLOPs: 11.91 | 7: iteration 21210/ 173500 | consumed samples: 5429760 | consumed tokens: 11120148480 | elapsed time per iteration (s): 0.08 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 4.711041E+00 | grad norm: 0.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.254 | TFLOPs: 11.93 | 7: iteration 21220/ 173500 | 
consumed samples: 5432320 | consumed tokens: 11125391360 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.705686E+00 | grad norm: 0.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.421 | TFLOPs: 11.85 | 7: iteration 21230/ 173500 | consumed samples: 5434880 | consumed tokens: 11130634240 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.698251E+00 | grad norm: 0.629 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.060 | TFLOPs: 11.93 | 7: iteration 21240/ 173500 | consumed samples: 5437440 | consumed tokens: 11135877120 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.705707E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.525 | TFLOPs: 11.92 | 7: iteration 21250/ 173500 | consumed samples: 5440000 | consumed tokens: 11141120000 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.697418E+00 | grad norm: 0.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.467 | TFLOPs: 11.96 | 7: iteration 21260/ 173500 | consumed samples: 5442560 | consumed tokens: 11146362880 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.719920E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.387 | TFLOPs: 11.94 | 7: iteration 21270/ 173500 | consumed samples: 5445120 | consumed tokens: 11151605760 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.693552E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3216.107 | TFLOPs: 11.96 | 7: iteration 21280/ 173500 | consumed samples: 5447680 | consumed tokens: 11156848640 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.689389E+00 | grad norm: 0.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.470 | TFLOPs: 11.91 | 7: iteration 21290/ 173500 | consumed samples: 5450240 | consumed tokens: 11162091520 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.700935E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.891 | TFLOPs: 11.40 | 7: iteration 21300/ 173500 | consumed samples: 5452800 | consumed tokens: 11167334400 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.697652E+00 | grad norm: 0.591 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.977 | TFLOPs: 11.56 | 7: iteration 21310/ 173500 | consumed samples: 5455360 | consumed tokens: 11172577280 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.697171E+00 | grad norm: 0.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.398 | TFLOPs: 11.36 | 7: iteration 21320/ 173500 | consumed samples: 5457920 | consumed tokens: 11177820160 | elapsed time per iteration (s): 0.09 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.706628E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2923.289 | TFLOPs: 10.87 | 7: iteration 21330/ 173500 | consumed samples: 5460480 | consumed tokens: 11183063040 | elapsed time per iteration (s): 0.09 | learning rate: 1.943E-04 | global batch size: 
256 | lm loss: 4.692694E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.659 | TFLOPs: 10.48 | 7: iteration 21340/ 173500 | consumed samples: 5463040 | consumed tokens: 11188305920 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.684430E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.458 | TFLOPs: 11.35 | 7: iteration 21350/ 173500 | consumed samples: 5465600 | consumed tokens: 11193548800 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.700734E+00 | grad norm: 0.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.383 | TFLOPs: 11.79 | 7: iteration 21360/ 173500 | consumed samples: 5468160 | consumed tokens: 11198791680 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.699038E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.750 | TFLOPs: 11.68 | 7: iteration 21370/ 173500 | consumed samples: 5470720 | consumed tokens: 11204034560 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.693080E+00 | grad norm: 0.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.624 | TFLOPs: 11.87 | 7: iteration 21380/ 173500 | consumed samples: 5473280 | consumed tokens: 11209277440 | elapsed time per iteration (s): 0.08 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 4.701479E+00 | grad norm: 0.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.985 | TFLOPs: 11.75 | 7: iteration 21390/ 173500 | consumed samples: 5475840 | consumed 
tokens: 11214520320 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.690703E+00 | grad norm: 0.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.671 | TFLOPs: 11.58 | 7: iteration 21400/ 173500 | consumed samples: 5478400 | consumed tokens: 11219763200 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.686004E+00 | grad norm: 0.560 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.306 | TFLOPs: 11.32 | 7: iteration 21410/ 173500 | consumed samples: 5480960 | consumed tokens: 11225006080 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.701921E+00 | grad norm: 0.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.886 | TFLOPs: 11.75 | 7: iteration 21420/ 173500 | consumed samples: 5483520 | consumed tokens: 11230248960 | elapsed time per iteration (s): 0.09 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.692120E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2987.360 | TFLOPs: 11.11 | 7: iteration 21430/ 173500 | consumed samples: 5486080 | consumed tokens: 11235491840 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.721097E+00 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.446 | TFLOPs: 11.60 | 7: iteration 21440/ 173500 | consumed samples: 5488640 | consumed tokens: 11240734720 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.685907E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3172.118 | TFLOPs: 11.80 | 7: iteration 21450/ 173500 | consumed samples: 5491200 | consumed tokens: 11245977600 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.694576E+00 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.846 | TFLOPs: 11.85 | 7: iteration 21460/ 173500 | consumed samples: 5493760 | consumed tokens: 11251220480 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.710597E+00 | grad norm: 0.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.734 | TFLOPs: 11.85 | 7: iteration 21470/ 173500 | consumed samples: 5496320 | consumed tokens: 11256463360 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.693431E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.076 | TFLOPs: 11.86 | 7: iteration 21480/ 173500 | consumed samples: 5498880 | consumed tokens: 11261706240 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.697832E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.571 | TFLOPs: 11.85 | 7: iteration 21490/ 173500 | consumed samples: 5501440 | consumed tokens: 11266949120 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.703061E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.922 | TFLOPs: 11.80 | 7: iteration 21500/ 173500 | consumed samples: 5504000 | consumed tokens: 11272192000 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.713369E+00 | grad norm: 
0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.746 | TFLOPs: 11.83 | 7: iteration 21510/ 173500 | consumed samples: 5506560 | consumed tokens: 11277434880 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.688903E+00 | grad norm: 0.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.721 | TFLOPs: 11.85 | 7: iteration 21520/ 173500 | consumed samples: 5509120 | consumed tokens: 11282677760 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.698019E+00 | grad norm: 0.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.783 | TFLOPs: 11.58 | 7: iteration 21530/ 173500 | consumed samples: 5511680 | consumed tokens: 11287920640 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.706891E+00 | grad norm: 0.624 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.917 | TFLOPs: 11.80 | 7: iteration 21540/ 173500 | consumed samples: 5514240 | consumed tokens: 11293163520 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.697640E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.206 | TFLOPs: 11.80 | 7: iteration 21550/ 173500 | consumed samples: 5516800 | consumed tokens: 11298406400 | elapsed time per iteration (s): 0.08 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 4.699100E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.514 | TFLOPs: 11.89 | 7: iteration 21560/ 173500 | consumed samples: 5519360 | consumed tokens: 11303649280 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.702204E+00 | grad norm: 0.589 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.979 | TFLOPs: 11.62 | 7: iteration 21570/ 173500 | consumed samples: 5521920 | consumed tokens: 11308892160 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.696321E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.285 | TFLOPs: 11.81 | 7: iteration 21580/ 173500 | consumed samples: 5524480 | consumed tokens: 11314135040 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.695878E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.252 | TFLOPs: 11.64 | 7: iteration 21590/ 173500 | consumed samples: 5527040 | consumed tokens: 11319377920 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.709753E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.897 | TFLOPs: 11.81 | 7: iteration 21600/ 173500 | consumed samples: 5529600 | consumed tokens: 11324620800 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.712056E+00 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.696 | TFLOPs: 11.57 | 7: iteration 21610/ 173500 | consumed samples: 5532160 | consumed tokens: 11329863680 | elapsed time per iteration (s): 0.09 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.704057E+00 | grad norm: 0.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.291 | TFLOPs: 10.86 | 7: 
iteration 21620/ 173500 | consumed samples: 5534720 | consumed tokens: 11335106560 | elapsed time per iteration (s): 0.09 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.703399E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2997.370 | TFLOPs: 11.15 | 7: iteration 21630/ 173500 | consumed samples: 5537280 | consumed tokens: 11340349440 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.692267E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.182 | TFLOPs: 11.64 | 7: iteration 21640/ 173500 | consumed samples: 5539840 | consumed tokens: 11345592320 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.694082E+00 | grad norm: 0.614 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.949 | TFLOPs: 11.92 | 7: iteration 21650/ 173500 | consumed samples: 5542400 | consumed tokens: 11350835200 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.689325E+00 | grad norm: 0.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.551 | TFLOPs: 11.64 | 7: iteration 21660/ 173500 | consumed samples: 5544960 | consumed tokens: 11356078080 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.690406E+00 | grad norm: 0.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.175 | TFLOPs: 11.82 | 7: iteration 21670/ 173500 | consumed samples: 5547520 | consumed tokens: 11361320960 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.695192E+00 | grad norm: 0.521 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.863 | TFLOPs: 11.88 | 7: iteration 21680/ 173500 | consumed samples: 5550080 | consumed tokens: 11366563840 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.697787E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.330 | TFLOPs: 11.89 | 7: iteration 21690/ 173500 | consumed samples: 5552640 | consumed tokens: 11371806720 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.696881E+00 | grad norm: 0.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.315 | TFLOPs: 11.85 | 7: iteration 21700/ 173500 | consumed samples: 5555200 | consumed tokens: 11377049600 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.708855E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.995 | TFLOPs: 11.62 | 7: iteration 21710/ 173500 | consumed samples: 5557760 | consumed tokens: 11382292480 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.701500E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.192 | TFLOPs: 11.63 | 7: iteration 21720/ 173500 | consumed samples: 5560320 | consumed tokens: 11387535360 | elapsed time per iteration (s): 0.08 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 4.688245E+00 | grad norm: 0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.880 | TFLOPs: 11.77 | 7: iteration 21730/ 173500 | consumed samples: 5562880 | consumed tokens: 11392778240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.940E-04 | global batch size: 256 | lm loss: 4.701943E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.746 | TFLOPs: 11.82 | 7: iteration 21740/ 173500 | consumed samples: 5565440 | consumed tokens: 11398021120 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.693524E+00 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.547 | TFLOPs: 11.88 | 7: iteration 21750/ 173500 | consumed samples: 5568000 | consumed tokens: 11403264000 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.705116E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.323 | TFLOPs: 11.61 | 7: iteration 21760/ 173500 | consumed samples: 5570560 | consumed tokens: 11408506880 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.702061E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.277 | TFLOPs: 11.88 | 7: iteration 21770/ 173500 | consumed samples: 5573120 | consumed tokens: 11413749760 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.699155E+00 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.593 | TFLOPs: 11.88 | 7: iteration 21780/ 173500 | consumed samples: 5575680 | consumed tokens: 11418992640 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.691741E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.687 | TFLOPs: 11.87 | 7: iteration 21790/ 173500 | consumed 
samples: 5578240 | consumed tokens: 11424235520 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.707852E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.816 | TFLOPs: 11.62 | 7: iteration 21800/ 173500 | consumed samples: 5580800 | consumed tokens: 11429478400 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.701977E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.590 | TFLOPs: 11.62 | 7: iteration 21810/ 173500 | consumed samples: 5583360 | consumed tokens: 11434721280 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.699748E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.781 | TFLOPs: 11.80 | 7: iteration 21820/ 173500 | consumed samples: 5585920 | consumed tokens: 11439964160 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.705474E+00 | grad norm: 0.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.449 | TFLOPs: 11.92 | 7: iteration 21830/ 173500 | consumed samples: 5588480 | consumed tokens: 11445207040 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.694131E+00 | grad norm: 0.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.115 | TFLOPs: 11.65 | 7: iteration 21840/ 173500 | consumed samples: 5591040 | consumed tokens: 11450449920 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.697192E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3143.295 | TFLOPs: 11.69 | 7: iteration 21850/ 173500 | consumed samples: 5593600 | consumed tokens: 11455692800 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.699420E+00 | grad norm: 0.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.220 | TFLOPs: 11.98 | 7: iteration 21860/ 173500 | consumed samples: 5596160 | consumed tokens: 11460935680 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.701533E+00 | grad norm: 0.545 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.013 | TFLOPs: 11.91 | 7: iteration 21870/ 173500 | consumed samples: 5598720 | consumed tokens: 11466178560 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.695054E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.221 | TFLOPs: 11.95 | 7: iteration 21880/ 173500 | consumed samples: 5601280 | consumed tokens: 11471421440 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.713715E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.986 | TFLOPs: 12.04 | 7: iteration 21890/ 173500 | consumed samples: 5603840 | consumed tokens: 11476664320 | elapsed time per iteration (s): 0.08 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 4.688857E+00 | grad norm: 0.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.611 | TFLOPs: 11.76 | 7: iteration 21900/ 173500 | consumed samples: 5606400 | consumed tokens: 11481907200 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm 
loss: 4.710825E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.532 | TFLOPs: 12.06 | 7: iteration 21910/ 173500 | consumed samples: 5608960 | consumed tokens: 11487150080 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.687661E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.368 | TFLOPs: 11.32 | 7: iteration 21920/ 173500 | consumed samples: 5611520 | consumed tokens: 11492392960 | elapsed time per iteration (s): 0.12 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.710506E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2105.470 | TFLOPs: 7.83 | 7: iteration 21930/ 173500 | consumed samples: 5614080 | consumed tokens: 11497635840 | elapsed time per iteration (s): 0.10 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.687611E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2690.473 | TFLOPs: 10.01 | 7: iteration 21940/ 173500 | consumed samples: 5616640 | consumed tokens: 11502878720 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.688797E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.055 | TFLOPs: 11.90 | 7: iteration 21950/ 173500 | consumed samples: 5619200 | consumed tokens: 11508121600 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.708250E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.646 | TFLOPs: 11.67 | 7: iteration 21960/ 173500 | consumed samples: 5621760 | consumed tokens: 
11513364480 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.699654E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.054 | TFLOPs: 11.98 | 7: iteration 21970/ 173500 | consumed samples: 5624320 | consumed tokens: 11518607360 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.699137E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.436 | TFLOPs: 11.99 | 7: iteration 21980/ 173500 | consumed samples: 5626880 | consumed tokens: 11523850240 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.685362E+00 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.743 | TFLOPs: 11.68 | 7: iteration 21990/ 173500 | consumed samples: 5629440 | consumed tokens: 11529093120 | elapsed time per iteration (s): 0.09 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.700338E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2702.177 | TFLOPs: 10.05 | 0: [2023-03-17 00:49:06,743] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=0, lr=[0.00019388839136370641, 0.00019388839136370641, 0.00019388839136370641], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 22000/ 173500 | consumed samples: 5632000 | consumed tokens: 11534336000 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.703012E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.254 | TFLOPs: 11.61 | 0: steps: 22000 loss: 4.7019 iter time (s): 0.082 samples/sec: 3116.898 7: 
------------------------------------------------------------------------------------------------
7: validation loss at iteration 22000 | lm loss value: 4.539511E+00 | lm loss PPL: 9.364497E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 22000 to checkpoints_14m91b100m
0: [2023-03-17 00:49:06,800] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step22000 is begin to save!
0: [2023-03-17 00:49:06,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:49:06,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:49:06,829] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/layer_03-model_00-model_states.pt...
0: [2023-03-17 00:49:06,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/layer_03-model_00-model_states.pt.
0: [2023-03-17 00:49:06,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/layer_04-model_00-model_states.pt...
0: [2023-03-17 00:49:06,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/layer_04-model_00-model_states.pt.
0: [2023-03-17 00:49:06,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/layer_05-model_00-model_states.pt...
0: [2023-03-17 00:49:06,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/layer_05-model_00-model_states.pt.
0: [2023-03-17 00:49:06,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/layer_06-model_00-model_states.pt...
0: [2023-03-17 00:49:06,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/layer_06-model_00-model_states.pt.
0: [2023-03-17 00:49:06,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/layer_08-model_00-model_states.pt...
0: [2023-03-17 00:49:06,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/layer_08-model_00-model_states.pt.
0: [2023-03-17 00:49:06,842] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step22000/mp_rank_00_model_states.pt
0: [2023-03-17 00:49:06,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/mp_rank_00_model_states.pt...
0: [2023-03-17 00:49:06,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/mp_rank_00_model_states.pt.
0: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:49:06,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:49:06,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:49:06,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2023-03-17 00:49:06,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:49:06,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2023-03-17 00:49:06,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:49:06,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:49:06,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2023-03-17 00:49:06,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:49:06,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:49:06,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 00:49:06,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:49:06,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:49:06,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
4: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:49:06,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:49:06,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:49:06,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:49:06,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
0: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:49:06,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:49:06,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2023-03-17 00:49:06,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
4: [2023-03-17 00:49:06,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:49:06,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 6: [2023-03-17 00:49:06,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:49:06,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
0: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:49:06,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:49:06,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:49:06,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:49:06,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
6: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:49:06,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:49:06,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:49:06,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 0: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2023-03-17 00:49:06,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2023-03-17 00:49:06,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:49:06,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:49:06,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2023-03-17 00:49:06,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 00:49:06,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
4: [2023-03-17 00:49:06,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:49:06,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2023-03-17 00:49:06,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:49:06,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:49:06,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:49:06,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:49:06,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 0: [2023-03-17 00:49:06,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:49:06,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:49:06,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:49:06,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:49:06,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2023-03-17 00:49:06,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:49:06,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:49:06,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:49:06,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2023-03-17 00:49:06,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 6: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 2: [2023-03-17 00:49:06,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:49:06,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:49:06,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 00:49:06,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:49:06,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2023-03-17 00:49:06,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:49:06,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:49:06,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 4: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2023-03-17 00:49:06,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2023-03-17 00:49:06,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2023-03-17 00:49:06,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:49:06,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step22000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:49:06,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: successfully saved checkpoint at iteration 22000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.11 7: iteration 22010/ 173500 | consumed samples: 5634560 | consumed tokens: 11539578880 | elapsed time per iteration (s): 0.10 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.701388E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2646.222 | TFLOPs: 9.84 | 7: iteration 22020/ 173500 | consumed samples: 5637120 | consumed tokens: 11544821760 | elapsed time per iteration (s): 0.09 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.696783E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.965 | TFLOPs: 10.65 | 7: iteration 22030/ 173500 | consumed samples: 5639680 | consumed tokens: 11550064640 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.689906E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.625 | TFLOPs: 11.65 | 7: iteration 22040/ 173500 | consumed samples: 5642240 | consumed tokens: 11555307520 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.689086E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.388 | TFLOPs: 11.38 | 7: iteration 22050/ 173500 | consumed samples: 5644800 | consumed tokens: 11560550400 | elapsed time per iteration (s): 0.09 | learning rate: 1.939E-04 | global batch 
size: 256 | lm loss: 4.692966E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.286 | TFLOPs: 11.14 | 7: iteration 22060/ 173500 | consumed samples: 5647360 | consumed tokens: 11565793280 | elapsed time per iteration (s): 0.08 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 4.696482E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.808 | TFLOPs: 11.40 | 7: iteration 22070/ 173500 | consumed samples: 5649920 | consumed tokens: 11571036160 | elapsed time per iteration (s): 0.09 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.695731E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2881.369 | TFLOPs: 10.72 | 7: iteration 22080/ 173500 | consumed samples: 5652480 | consumed tokens: 11576279040 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.698385E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.061 | TFLOPs: 11.36 | 7: iteration 22090/ 173500 | consumed samples: 5655040 | consumed tokens: 11581521920 | elapsed time per iteration (s): 0.09 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.689949E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2863.391 | TFLOPs: 10.65 | 7: iteration 22100/ 173500 | consumed samples: 5657600 | consumed tokens: 11586764800 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.697739E+00 | grad norm: 0.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3027.691 | TFLOPs: 11.26 | 7: iteration 22110/ 173500 | consumed samples: 5660160 | consumed 
tokens: 11592007680 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.702866E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.484 | TFLOPs: 11.92 | 7: iteration 22120/ 173500 | consumed samples: 5662720 | consumed tokens: 11597250560 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.697538E+00 | grad norm: 0.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.509 | TFLOPs: 11.40 | 7: iteration 22130/ 173500 | consumed samples: 5665280 | consumed tokens: 11602493440 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.683256E+00 | grad norm: 0.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.194 | TFLOPs: 11.86 | 7: iteration 22140/ 173500 | consumed samples: 5667840 | consumed tokens: 11607736320 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.681606E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.142 | TFLOPs: 11.81 | 7: iteration 22150/ 173500 | consumed samples: 5670400 | consumed tokens: 11612979200 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.692355E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.770 | TFLOPs: 11.61 | 7: iteration 22160/ 173500 | consumed samples: 5672960 | consumed tokens: 11618222080 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.700533E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3044.455 | TFLOPs: 11.32 | 7: iteration 22170/ 173500 | consumed samples: 5675520 | consumed tokens: 11623464960 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.699520E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.376 | TFLOPs: 11.26 | 7: iteration 22180/ 173500 | consumed samples: 5678080 | consumed tokens: 11628707840 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.688423E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.569 | TFLOPs: 11.79 | 7: iteration 22190/ 173500 | consumed samples: 5680640 | consumed tokens: 11633950720 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.691318E+00 | grad norm: 0.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.635 | TFLOPs: 11.80 | 7: iteration 22200/ 173500 | consumed samples: 5683200 | consumed tokens: 11639193600 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.689237E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.108 | TFLOPs: 11.65 | 7: iteration 22210/ 173500 | consumed samples: 5685760 | consumed tokens: 11644436480 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.694617E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.280 | TFLOPs: 11.54 | 7: iteration 22220/ 173500 | consumed samples: 5688320 | consumed tokens: 11649679360 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.691834E+00 | grad norm: 
0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.305 | TFLOPs: 11.83 | 7: iteration 22230/ 173500 | consumed samples: 5690880 | consumed tokens: 11654922240 | elapsed time per iteration (s): 0.08 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 4.699838E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.537 | TFLOPs: 11.82 | 7: iteration 22240/ 173500 | consumed samples: 5693440 | consumed tokens: 11660165120 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.690260E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.348 | TFLOPs: 11.79 | 7: iteration 22250/ 173500 | consumed samples: 5696000 | consumed tokens: 11665408000 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.695849E+00 | grad norm: 0.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.246 | TFLOPs: 11.53 | 7: iteration 22260/ 173500 | consumed samples: 5698560 | consumed tokens: 11670650880 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.694176E+00 | grad norm: 0.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.452 | TFLOPs: 11.51 | 7: iteration 22270/ 173500 | consumed samples: 5701120 | consumed tokens: 11675893760 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.690789E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.983 | TFLOPs: 11.80 | 7: iteration 22280/ 173500 | consumed samples: 5703680 | consumed tokens: 11681136640 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.704097E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.811 | TFLOPs: 11.50 | 7: iteration 22290/ 173500 | consumed samples: 5706240 | consumed tokens: 11686379520 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.693839E+00 | grad norm: 0.534 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.294 | TFLOPs: 11.76 | 7: iteration 22300/ 173500 | consumed samples: 5708800 | consumed tokens: 11691622400 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.703645E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.333 | TFLOPs: 11.70 | 7: iteration 22310/ 173500 | consumed samples: 5711360 | consumed tokens: 11696865280 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.687376E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.031 | TFLOPs: 11.78 | 7: iteration 22320/ 173500 | consumed samples: 5713920 | consumed tokens: 11702108160 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.689048E+00 | grad norm: 0.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.024 | TFLOPs: 11.76 | 7: iteration 22330/ 173500 | consumed samples: 5716480 | consumed tokens: 11707351040 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.695298E+00 | grad norm: 0.574 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.590 | TFLOPs: 11.76 | 7: 
iteration 22340/ 173500 | consumed samples: 5719040 | consumed tokens: 11712593920 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.692145E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.876 | TFLOPs: 11.80 | 7: iteration 22350/ 173500 | consumed samples: 5721600 | consumed tokens: 11717836800 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.689337E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.417 | TFLOPs: 11.82 | 7: iteration 22360/ 173500 | consumed samples: 5724160 | consumed tokens: 11723079680 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.692851E+00 | grad norm: 0.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.972 | TFLOPs: 11.55 | 7: iteration 22370/ 173500 | consumed samples: 5726720 | consumed tokens: 11728322560 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.694575E+00 | grad norm: 0.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.123 | TFLOPs: 11.82 | 7: iteration 22380/ 173500 | consumed samples: 5729280 | consumed tokens: 11733565440 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.691141E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.298 | TFLOPs: 11.53 | 7: iteration 22390/ 173500 | consumed samples: 5731840 | consumed tokens: 11738808320 | elapsed time per iteration (s): 0.08 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 4.691392E+00 | grad norm: 0.569 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.842 | TFLOPs: 11.80 | 7: iteration 22400/ 173500 | consumed samples: 5734400 | consumed tokens: 11744051200 | elapsed time per iteration (s): 0.12 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.688044E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2183.813 | TFLOPs: 8.12 | 7: iteration 22410/ 173500 | consumed samples: 5736960 | consumed tokens: 11749294080 | elapsed time per iteration (s): 0.13 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.694530E+00 | grad norm: 0.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.580 | TFLOPs: 7.39 | 7: iteration 22420/ 173500 | consumed samples: 5739520 | consumed tokens: 11754536960 | elapsed time per iteration (s): 0.13 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.695910E+00 | grad norm: 0.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.841 | TFLOPs: 7.42 | 7: iteration 22430/ 173500 | consumed samples: 5742080 | consumed tokens: 11759779840 | elapsed time per iteration (s): 0.13 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.681590E+00 | grad norm: 0.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1982.647 | TFLOPs: 7.37 | 7: iteration 22440/ 173500 | consumed samples: 5744640 | consumed tokens: 11765022720 | elapsed time per iteration (s): 0.10 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.694645E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.106 | TFLOPs: 9.24 | 7: iteration 22450/ 173500 | consumed samples: 5747200 | consumed tokens: 11770265600 | elapsed time per iteration (s): 0.10 | learning rate: 1.936E-04 | 
global batch size: 256 | lm loss: 4.687989E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2482.801 | TFLOPs: 9.23 | 7: iteration 22460/ 173500 | consumed samples: 5749760 | consumed tokens: 11775508480 | elapsed time per iteration (s): 0.08 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.678838E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.629 | TFLOPs: 11.67 | 7: iteration 22470/ 173500 | consumed samples: 5752320 | consumed tokens: 11780751360 | elapsed time per iteration (s): 0.09 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.692856E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2891.053 | TFLOPs: 10.75 | 7: iteration 22480/ 173500 | consumed samples: 5754880 | consumed tokens: 11785994240 | elapsed time per iteration (s): 0.09 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.690940E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.650 | TFLOPs: 10.78 | 7: iteration 22490/ 173500 | consumed samples: 5757440 | consumed tokens: 11791237120 | elapsed time per iteration (s): 0.10 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.684801E+00 | grad norm: 0.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2593.384 | TFLOPs: 9.65 | 7: iteration 22500/ 173500 | consumed samples: 5760000 | consumed tokens: 11796480000 | elapsed time per iteration (s): 0.10 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.686497E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2672.068 | TFLOPs: 9.94 | 7: iteration 22510/ 173500 | consumed samples: 5762560 | 
consumed tokens: 11801722880 | elapsed time per iteration (s): 0.08 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.683488E+00 | grad norm: 0.554 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.538 | TFLOPs: 12.05 | 7: iteration 22520/ 173500 | consumed samples: 5765120 | consumed tokens: 11806965760 | elapsed time per iteration (s): 0.08 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.688203E+00 | grad norm: 0.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.453 | TFLOPs: 12.02 | 7: iteration 22530/ 173500 | consumed samples: 5767680 | consumed tokens: 11812208640 | elapsed time per iteration (s): 0.08 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.684781E+00 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.420 | TFLOPs: 12.02 | 7: iteration 22540/ 173500 | consumed samples: 5770240 | consumed tokens: 11817451520 | elapsed time per iteration (s): 0.08 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.685184E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.723 | TFLOPs: 12.04 | 7: iteration 22550/ 173500 | consumed samples: 5772800 | consumed tokens: 11822694400 | elapsed time per iteration (s): 0.08 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.692770E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.120 | TFLOPs: 12.03 | 7: iteration 22560/ 173500 | consumed samples: 5775360 | consumed tokens: 11827937280 | elapsed time per iteration (s): 0.08 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 4.698379E+00 | grad norm: 0.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3224.959 | TFLOPs: 12.00 | 7: iteration 22570/ 173500 | consumed samples: 5777920 | consumed tokens: 11833180160 | elapsed time per iteration (s): 0.10 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.679000E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2481.157 | TFLOPs: 9.23 | 7: iteration 22580/ 173500 | consumed samples: 5780480 | consumed tokens: 11838423040 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.696267E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.420 | TFLOPs: 11.72 | 7: iteration 22590/ 173500 | consumed samples: 5783040 | consumed tokens: 11843665920 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.694371E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.636 | TFLOPs: 11.99 | 7: iteration 22600/ 173500 | consumed samples: 5785600 | consumed tokens: 11848908800 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.706781E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.329 | TFLOPs: 12.02 | 7: iteration 22610/ 173500 | consumed samples: 5788160 | consumed tokens: 11854151680 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.689380E+00 | grad norm: 0.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.064 | TFLOPs: 12.03 | 7: iteration 22620/ 173500 | consumed samples: 5790720 | consumed tokens: 11859394560 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.678162E+00 | 
grad norm: 0.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.943 | TFLOPs: 12.01 | 7: iteration 22630/ 173500 | consumed samples: 5793280 | consumed tokens: 11864637440 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.693370E+00 | grad norm: 0.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.019 | TFLOPs: 12.00 | 7: iteration 22640/ 173500 | consumed samples: 5795840 | consumed tokens: 11869880320 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.691401E+00 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.914 | TFLOPs: 12.05 | 7: iteration 22650/ 173500 | consumed samples: 5798400 | consumed tokens: 11875123200 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.689948E+00 | grad norm: 0.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.437 | TFLOPs: 11.99 | 7: iteration 22660/ 173500 | consumed samples: 5800960 | consumed tokens: 11880366080 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.693922E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.380 | TFLOPs: 11.99 | 7: iteration 22670/ 173500 | consumed samples: 5803520 | consumed tokens: 11885608960 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.691718E+00 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.743 | TFLOPs: 11.99 | 7: iteration 22680/ 173500 | consumed samples: 5806080 | consumed tokens: 11890851840 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.691792E+00 | grad norm: 0.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.706 | TFLOPs: 11.98 | 7: iteration 22690/ 173500 | consumed samples: 5808640 | consumed tokens: 11896094720 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.689512E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.686 | TFLOPs: 12.03 | 7: iteration 22700/ 173500 | consumed samples: 5811200 | consumed tokens: 11901337600 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.694614E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.801 | TFLOPs: 11.97 | 7: iteration 22710/ 173500 | consumed samples: 5813760 | consumed tokens: 11906580480 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.681933E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.545 | TFLOPs: 12.01 | 7: iteration 22720/ 173500 | consumed samples: 5816320 | consumed tokens: 11911823360 | elapsed time per iteration (s): 0.08 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 4.688341E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.450 | TFLOPs: 12.04 | 7: iteration 22730/ 173500 | consumed samples: 5818880 | consumed tokens: 11917066240 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.700804E+00 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.617 | TFLOPs: 12.00 | 
7: iteration 22740/ 173500 | consumed samples: 5821440 | consumed tokens: 11922309120 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.683407E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.573 | TFLOPs: 12.01 | 7: iteration 22750/ 173500 | consumed samples: 5824000 | consumed tokens: 11927552000 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.678583E+00 | grad norm: 0.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.754 | TFLOPs: 12.03 | 7: iteration 22760/ 173500 | consumed samples: 5826560 | consumed tokens: 11932794880 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.696880E+00 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.321 | TFLOPs: 12.01 | 7: iteration 22770/ 173500 | consumed samples: 5829120 | consumed tokens: 11938037760 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.686570E+00 | grad norm: 0.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.635 | TFLOPs: 11.98 | 7: iteration 22780/ 173500 | consumed samples: 5831680 | consumed tokens: 11943280640 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.698756E+00 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.203 | TFLOPs: 11.98 | 7: iteration 22790/ 173500 | consumed samples: 5834240 | consumed tokens: 11948523520 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.684553E+00 | grad norm: 0.501 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.574 | TFLOPs: 11.90 | 7: iteration 22800/ 173500 | consumed samples: 5836800 | consumed tokens: 11953766400 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.699065E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.870 | TFLOPs: 11.91 | 7: iteration 22810/ 173500 | consumed samples: 5839360 | consumed tokens: 11959009280 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.689399E+00 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.045 | TFLOPs: 11.92 | 7: iteration 22820/ 173500 | consumed samples: 5841920 | consumed tokens: 11964252160 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.669765E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.482 | TFLOPs: 11.89 | 7: iteration 22830/ 173500 | consumed samples: 5844480 | consumed tokens: 11969495040 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.678721E+00 | grad norm: 0.596 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.818 | TFLOPs: 11.91 | 7: iteration 22840/ 173500 | consumed samples: 5847040 | consumed tokens: 11974737920 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.681405E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.670 | TFLOPs: 11.73 | 7: iteration 22850/ 173500 | consumed samples: 5849600 | consumed tokens: 11979980800 | elapsed time per iteration (s): 0.08 | learning rate: 
1.934E-04 | global batch size: 256 | lm loss: 4.688855E+00 | grad norm: 0.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.288 | TFLOPs: 11.92 | 7: iteration 22860/ 173500 | consumed samples: 5852160 | consumed tokens: 11985223680 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.680146E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.470 | TFLOPs: 11.85 | 7: iteration 22870/ 173500 | consumed samples: 5854720 | consumed tokens: 11990466560 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.683443E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.362 | TFLOPs: 11.97 | 7: iteration 22880/ 173500 | consumed samples: 5857280 | consumed tokens: 11995709440 | elapsed time per iteration (s): 0.08 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 4.683551E+00 | grad norm: 0.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.339 | TFLOPs: 12.02 | 7: iteration 22890/ 173500 | consumed samples: 5859840 | consumed tokens: 12000952320 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.694546E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.473 | TFLOPs: 12.06 | 7: iteration 22900/ 173500 | consumed samples: 5862400 | consumed tokens: 12006195200 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.681214E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.008 | TFLOPs: 12.00 | 7: iteration 22910/ 173500 | consumed 
samples: 5864960 | consumed tokens: 12011438080 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.685069E+00 | grad norm: 0.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.420 | TFLOPs: 12.00 |
7: iteration 22920/ 173500 | consumed samples: 5867520 | consumed tokens: 12016680960 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.692673E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.303 | TFLOPs: 11.99 |
7: iteration 22930/ 173500 | consumed samples: 5870080 | consumed tokens: 12021923840 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.695554E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.787 | TFLOPs: 12.01 |
7: iteration 22940/ 173500 | consumed samples: 5872640 | consumed tokens: 12027166720 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.682399E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.481 | TFLOPs: 12.01 |
7: iteration 22950/ 173500 | consumed samples: 5875200 | consumed tokens: 12032409600 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.676488E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.337 | TFLOPs: 11.99 |
7: iteration 22960/ 173500 | consumed samples: 5877760 | consumed tokens: 12037652480 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.684164E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.799 | TFLOPs: 11.97 |
7: iteration 22970/ 173500 | consumed samples: 5880320 | consumed tokens: 12042895360 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.685088E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.856 | TFLOPs: 12.01 |
7: iteration 22980/ 173500 | consumed samples: 5882880 | consumed tokens: 12048138240 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.685190E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.557 | TFLOPs: 11.96 |
7: iteration 22990/ 173500 | consumed samples: 5885440 | consumed tokens: 12053381120 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.684970E+00 | grad norm: 0.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.063 | TFLOPs: 11.96 |
7: iteration 23000/ 173500 | consumed samples: 5888000 | consumed tokens: 12058624000 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.693687E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.358 | TFLOPs: 11.95 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 23000 | lm loss value: 4.532762E+00 | lm loss PPL: 9.301507E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 23000 to checkpoints_14m91b100m
0: [2023-03-17 00:50:30,712] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step23000 is begin to save!
0: [2023-03-17 00:50:30,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:50:30,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:50:30,740] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:50:30,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:50:30,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:50:30,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:50:30,747] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:50:30,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:50:30,750] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:50:30,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:50:30,752] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:50:30,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 00:50:30,754] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step23000/mp_rank_00_model_states.pt 0: [2023-03-17 00:50:30,754] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:50:30,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:50:30,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:50:30,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:50:30,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:50:30,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:50:30,776] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:50:30,776] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2023-03-17 00:50:30,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:50:30,777] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:50:30,777] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2023-03-17 00:50:30,777] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:50:30,777] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 
3: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:50:30,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:50:30,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:50:30,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:50:30,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:50:30,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:50:30,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:50:30,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:50:30,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:50:30,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:50:30,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:50:30,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:50:30,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:50:30,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:50:30,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 6: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2023-03-17 00:50:30,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:50:30,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt
0: [2023-03-17 00:50:30,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt
2: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
0: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
5: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt.
5: [2023-03-17 00:50:30,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt
5: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
1: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt.
1: [2023-03-17 00:50:30,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
3: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:50:30,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt
3: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
4: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt.
4: [2023-03-17 00:50:30,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt
4: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
0: [2023-03-17 00:50:30,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt.
0: [2023-03-17 00:50:30,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
0: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
2: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:50:30,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt
2: [2023-03-17 00:50:30,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt
6: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
2: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
1: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
1: [2023-03-17 00:50:30,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
5: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt.
5: [2023-03-17 00:50:30,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt
5: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
4: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
4: [2023-03-17 00:50:30,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
4: [2023-03-17 00:50:30,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
3: [2023-03-17 00:50:30,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:50:30,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
3: [2023-03-17 00:50:30,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
0: [2023-03-17 00:50:30,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt.
0: [2023-03-17 00:50:30,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
0: [2023-03-17 00:50:30,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
6: [2023-03-17 00:50:30,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:50:30,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt
6: [2023-03-17 00:50:30,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
2: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt.
2: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt
2: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
3: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt.
1: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt
1: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
3: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
1: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
5: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
5: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
5: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
2: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt.
2: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt
1: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
0: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
3: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
2: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
3: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt
1: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
0: [2023-03-17 00:50:30,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
3: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
3: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
1: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
0: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
3: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
6: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
2: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt.
6: [2023-03-17 00:50:30,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
2: [2023-03-17 00:50:30,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
6: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
2: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
6: [2023-03-17 00:50:30,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
6: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
5: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt.
5: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
5: [2023-03-17 00:50:30,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt
5: [2023-03-17 00:50:30,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
5: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
5: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
4: [2023-03-17 00:50:30,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt.
4: [2023-03-17 00:50:30,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt
4: [2023-03-17 00:50:30,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
0: successfully saved checkpoint at iteration 23000 to checkpoints_14m91b100m
7: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt
7: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
7: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:50:30,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt
7: [2023-03-17 00:50:30,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
7: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:50:30,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt
7: [2023-03-17 00:50:30,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
7: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:50:30,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt
7: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
7: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:50:30,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt
7: [2023-03-17 00:50:30,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
7: [2023-03-17 00:50:30,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:50:30,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt
7: [2023-03-17 00:50:30,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
7: [2023-03-17 00:50:30,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:50:30,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt
7: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
7: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
7: [2023-03-17 00:50:30,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step23000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
7: [2023-03-17 00:50:30,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now!
7: time (ms) | save-checkpoint: 78.12
7: iteration 23010/ 173500 | consumed samples: 5890560 | consumed tokens: 12063866880 | elapsed time per iteration (s): 0.09 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.677952E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2806.217 | TFLOPs: 10.44 |
7: iteration 23020/ 173500 | consumed samples: 5893120 | consumed tokens: 12069109760 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.678005E+00 | grad norm: 0.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.777 | TFLOPs: 11.73 |
7: iteration 23030/ 173500 | consumed samples: 5895680 | consumed tokens: 12074352640 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.692624E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.487 | TFLOPs: 11.78 |
7: iteration 23040/ 173500 | consumed samples: 5898240 | consumed tokens: 12079595520 | elapsed time per iteration (s): 0.08 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 4.680340E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.536 | TFLOPs: 11.85 |
7: iteration 23050/ 173500 | consumed samples: 5900800 | consumed tokens: 12084838400 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.683969E+00 | grad norm: 0.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.546 | TFLOPs: 11.88 |
7: iteration 23060/ 173500 | consumed samples: 5903360 | consumed tokens: 12090081280 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.669327E+00 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.328 | TFLOPs: 11.87 |
7: iteration 23070/ 173500 | consumed samples: 5905920 | consumed tokens: 12095324160 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.689570E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.934 | TFLOPs: 11.91 |
7: iteration 23080/ 173500 | consumed samples: 5908480 | consumed tokens: 12100567040 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.685689E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.276 | TFLOPs: 11.65 |
7: iteration 23090/ 173500 | consumed samples: 5911040 | consumed tokens: 12105809920 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.684170E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.056 | TFLOPs: 11.94 |
7: iteration 23100/ 173500 | consumed samples: 5913600 | consumed tokens: 12111052800 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.659352E+00 | grad norm: 0.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.849 | TFLOPs: 11.64 |
7: iteration 23110/ 173500 | consumed samples: 5916160 | consumed tokens: 12116295680 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.677740E+00 | grad norm: 0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.392 | TFLOPs: 11.57 |
7: iteration 23120/ 173500 | consumed samples: 5918720 | consumed tokens: 12121538560 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.688845E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.920 | TFLOPs: 11.84 |
7: iteration 23130/ 173500 | consumed samples: 5921280 | consumed tokens: 12126781440 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.679768E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.847 | TFLOPs: 11.86 |
7: iteration 23140/ 173500 | consumed samples: 5923840 | consumed tokens: 12132024320 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.687483E+00 | grad norm: 0.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.783 | TFLOPs: 11.40 |
7: iteration 23150/ 173500 | consumed samples: 5926400 | consumed tokens: 12137267200 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.670053E+00 | grad norm: 0.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.702 | TFLOPs: 11.90 |
7: iteration 23160/ 173500 | consumed samples: 5928960 | consumed tokens: 12142510080 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.684946E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.482 | TFLOPs: 11.88 |
7: iteration 23170/ 173500 | consumed samples: 5931520 | consumed tokens: 12147752960 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.678339E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.793 | TFLOPs: 11.91 |
7: iteration 23180/ 173500 | consumed samples: 5934080 | consumed tokens: 12152995840 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.686016E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.450 | TFLOPs: 12.00 |
7: iteration 23190/ 173500 | consumed samples: 5936640 | consumed tokens: 12158238720 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.696595E+00 | grad norm: 0.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.337 | TFLOPs: 11.99 |
7: iteration 23200/ 173500 | consumed samples: 5939200 | consumed tokens: 12163481600 | elapsed time per iteration (s): 0.08 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 4.657466E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.457 | TFLOPs: 11.97 |
7: iteration 23210/ 173500 | consumed samples: 5941760 | consumed tokens: 12168724480 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.678245E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.736 | TFLOPs: 11.92 |
7: iteration 23220/ 173500 | consumed samples: 5944320 | consumed tokens: 12173967360 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.669000E+00 | grad norm: 0.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.669 | TFLOPs: 11.99 |
7: iteration 23230/ 173500 | consumed samples: 5946880 | consumed tokens: 12179210240 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.681061E+00 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.969 | TFLOPs: 11.94 |
7: iteration 23240/ 173500 | consumed samples: 5949440 | consumed tokens: 12184453120 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.679345E+00 | grad norm: 0.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.185 | TFLOPs: 11.98 |
7: iteration 23250/ 173500 | consumed samples: 5952000 | consumed tokens: 12189696000 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.675370E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.182 | TFLOPs: 12.01 |
7: iteration 23260/ 173500 | consumed samples: 5954560 | consumed tokens: 12194938880 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.676075E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.634 | TFLOPs: 11.98 |
7: iteration 23270/ 173500 | consumed samples: 5957120 | consumed tokens: 12200181760 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.678855E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.809 | TFLOPs: 12.00 |
7: iteration 23280/ 173500 | consumed samples: 5959680 | consumed tokens: 12205424640 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.676141E+00 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.472 | TFLOPs: 12.05 |
7: iteration 23290/ 173500 | consumed samples: 5962240 | consumed tokens: 12210667520 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.688146E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.591 | TFLOPs: 12.05 |
7: iteration 23300/ 173500 | consumed samples: 5964800 | consumed tokens: 12215910400 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.678598E+00 | grad norm: 0.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.913 | TFLOPs: 12.00 |
7: iteration 23310/ 173500 | consumed samples: 5967360 | consumed tokens: 12221153280 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.662416E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.614 | TFLOPs: 11.97 |
7: iteration 23320/ 173500 | consumed samples: 5969920 | consumed tokens: 12226396160 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.668416E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.677 | TFLOPs: 11.98 |
7: iteration 23330/ 173500 | consumed samples: 5972480 | consumed tokens: 12231639040 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.675663E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.066 | TFLOPs: 11.91 |
7: iteration 23340/ 173500 | consumed samples: 5975040 | consumed tokens: 12236881920 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.679122E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.202 | TFLOPs: 11.84 |
7: iteration 23350/ 173500 | consumed samples: 5977600 | consumed tokens: 12242124800 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.675947E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.607 | TFLOPs: 11.93 |
7: iteration 23360/ 173500 | consumed samples: 5980160 | consumed tokens: 12247367680 | elapsed time per iteration (s): 0.08 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 4.670594E+00 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.023 | TFLOPs: 11.97 |
7: iteration 23370/ 173500 | consumed samples: 5982720 | consumed tokens: 12252610560 | elapsed time per iteration (s): 0.11 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.688340E+00 | grad norm: 0.557 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2241.515 | TFLOPs: 8.34 |
7: iteration 23380/ 173500 | consumed samples: 5985280 | consumed tokens: 12257853440 | elapsed time per iteration (s): 0.10 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.666789E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2560.008 | TFLOPs: 9.52 |
7: iteration 23390/ 173500 | consumed samples: 5987840 | consumed tokens: 12263096320 | elapsed time per iteration (s): 0.10 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.685709E+00 | grad norm: 0.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.777 | TFLOPs: 9.84 |
7: iteration 23400/ 173500 | consumed samples: 5990400 | consumed tokens: 12268339200 | elapsed time per iteration (s): 0.10 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.677047E+00 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.265 | TFLOPs: 9.56 |
7: iteration 23410/ 173500 | consumed samples: 5992960 | consumed tokens: 12273582080 | elapsed time per iteration (s): 0.10 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.672639E+00 | grad norm: 0.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.781 | TFLOPs: 9.60 |
7: iteration 23420/ 173500 | consumed samples: 5995520 | consumed tokens: 12278824960 | elapsed time per iteration (s): 0.10 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.674665E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.579 | TFLOPs: 9.64 |
7: iteration 23430/ 173500 | consumed samples: 5998080 | consumed tokens: 12284067840 | elapsed time per iteration (s): 0.10 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.675644E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.364 | TFLOPs: 9.84 |
7: iteration 23440/ 173500 | consumed samples: 6000640 | consumed tokens: 12289310720 | elapsed time per iteration (s): 0.08 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.674321E+00 | grad norm: 0.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.879 | TFLOPs: 11.53 |
7: iteration 23450/ 173500 | consumed samples: 6003200 | consumed tokens: 12294553600 | elapsed time per iteration (s): 0.08 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.678067E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.417 | TFLOPs: 12.05 |
7: iteration 23460/ 173500 | consumed samples: 6005760 | consumed tokens: 12299796480 | elapsed time per iteration (s): 0.08 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.673510E+00 | grad norm: 0.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.318 | TFLOPs: 11.98 |
7: iteration 23470/ 173500 | consumed samples: 6008320 | consumed tokens: 12305039360 | elapsed time per iteration (s): 0.08 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.676821E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.025 | TFLOPs: 12.03 |
7: iteration 23480/ 173500 | consumed samples: 6010880 | consumed tokens: 12310282240 | elapsed time per iteration (s): 0.08 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.663594E+00 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.718 | TFLOPs: 11.82 |
7: iteration 23490/ 173500 | consumed samples: 6013440 | consumed tokens: 12315525120 | elapsed time per iteration (s): 0.08 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.669188E+00 | grad norm: 0.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.517 | TFLOPs: 12.02 |
7: iteration 23500/ 173500 | consumed samples: 6016000 | consumed tokens: 12320768000 | elapsed time per iteration (s): 0.08 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.665524E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.928 | TFLOPs: 11.98 |
7: iteration 23510/ 173500 | consumed samples: 6018560 | consumed tokens: 12326010880 | elapsed time per iteration (s): 0.08 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 4.666412E+00 | grad norm: 0.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.611 | TFLOPs: 11.93 |
7: iteration 23520/ 173500 | consumed samples: 6021120 | consumed tokens: 12331253760 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.683721E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.470 | TFLOPs: 11.96 |
7: iteration 23530/ 173500 | consumed samples: 6023680 | consumed tokens: 12336496640 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.669969E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.481 | TFLOPs: 11.88 |
7: iteration 23540/ 173500 | consumed samples: 6026240 | consumed tokens: 12341739520 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.654349E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.354 | TFLOPs: 11.98 |
7: iteration 23550/ 173500 | consumed samples: 6028800 | consumed tokens: 12346982400 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.681815E+00 | grad norm: 0.567 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.520 | TFLOPs: 11.98 |
7: iteration 23560/ 173500 | consumed samples: 6031360 | consumed tokens: 12352225280 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.672630E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.581 | TFLOPs: 12.04 |
7: iteration 23570/ 173500 | consumed samples: 6033920 | consumed tokens: 12357468160 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.677916E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.205 | TFLOPs: 11.97 |
7: iteration 23580/ 173500 | consumed samples: 6036480 | consumed tokens: 12362711040 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.676571E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.618 | TFLOPs: 12.05 |
7: iteration 23590/ 173500 | consumed samples: 6039040 | consumed tokens: 12367953920 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.678176E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.589 | TFLOPs: 11.99 |
7: iteration 23600/ 173500 | consumed samples: 6041600 | consumed tokens: 12373196800 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.670031E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.512 | TFLOPs: 12.03 |
7: iteration 23610/ 173500 | consumed samples: 6044160 | consumed tokens: 12378439680 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.676109E+00 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.771 | TFLOPs: 12.01 |
7: iteration 23620/ 173500 | consumed samples: 6046720 | consumed tokens: 12383682560 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.680298E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.892 | TFLOPs: 11.99 |
7: iteration 23630/ 173500 | consumed samples: 6049280 | consumed tokens: 12388925440 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.665408E+00 | grad norm: 0.534 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.532 | TFLOPs: 11.96 | 7: iteration 23640/ 173500 | consumed samples: 6051840 | consumed tokens: 12394168320 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.675908E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.057 | TFLOPs: 11.99 | 7: iteration 23650/ 173500 | consumed samples: 6054400 | consumed tokens: 12399411200 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.675504E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.312 | TFLOPs: 12.05 | 7: iteration 23660/ 173500 | consumed samples: 6056960 | consumed tokens: 12404654080 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.667722E+00 | grad norm: 0.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.505 | TFLOPs: 11.93 | 7: iteration 23670/ 173500 | consumed samples: 6059520 | consumed tokens: 12409896960 | elapsed time per iteration (s): 0.08 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 4.670311E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.615 | TFLOPs: 11.93 | 7: iteration 23680/ 173500 | consumed samples: 6062080 | consumed tokens: 12415139840 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.665633E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.639 | TFLOPs: 11.94 | 7: iteration 23690/ 173500 | consumed samples: 6064640 | consumed tokens: 12420382720 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.668830E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.898 | TFLOPs: 12.03 | 7: iteration 23700/ 173500 | consumed samples: 6067200 | consumed tokens: 12425625600 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.670498E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.954 | TFLOPs: 12.06 | 7: iteration 23710/ 173500 | consumed samples: 6069760 | consumed tokens: 12430868480 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.657701E+00 | grad norm: 0.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.541 | TFLOPs: 12.00 | 7: iteration 23720/ 173500 | consumed samples: 6072320 | consumed tokens: 12436111360 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.663009E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.172 | TFLOPs: 12.02 | 7: iteration 23730/ 173500 | consumed samples: 6074880 | consumed tokens: 12441354240 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.672396E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.243 | TFLOPs: 11.97 | 7: iteration 23740/ 173500 | consumed samples: 6077440 | consumed tokens: 12446597120 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.661022E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.818 | TFLOPs: 11.96 | 7: iteration 23750/ 173500 
| consumed samples: 6080000 | consumed tokens: 12451840000 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.665172E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.221 | TFLOPs: 11.92 | 7: iteration 23760/ 173500 | consumed samples: 6082560 | consumed tokens: 12457082880 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.668256E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.710 | TFLOPs: 11.99 | 7: iteration 23770/ 173500 | consumed samples: 6085120 | consumed tokens: 12462325760 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.669943E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.242 | TFLOPs: 12.01 | 7: iteration 23780/ 173500 | consumed samples: 6087680 | consumed tokens: 12467568640 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.657650E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.145 | TFLOPs: 11.97 | 7: iteration 23790/ 173500 | consumed samples: 6090240 | consumed tokens: 12472811520 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.669356E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.339 | TFLOPs: 11.96 | 7: iteration 23800/ 173500 | consumed samples: 6092800 | consumed tokens: 12478054400 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.668350E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3216.419 | TFLOPs: 11.96 | 7: iteration 23810/ 173500 | consumed samples: 6095360 | consumed tokens: 12483297280 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.668426E+00 | grad norm: 0.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.382 | TFLOPs: 11.94 | 7: iteration 23820/ 173500 | consumed samples: 6097920 | consumed tokens: 12488540160 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.652356E+00 | grad norm: 0.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.523 | TFLOPs: 11.98 | 7: iteration 23830/ 173500 | consumed samples: 6100480 | consumed tokens: 12493783040 | elapsed time per iteration (s): 0.08 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 4.661610E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.097 | TFLOPs: 11.96 | 7: iteration 23840/ 173500 | consumed samples: 6103040 | consumed tokens: 12499025920 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.661206E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.503 | TFLOPs: 11.98 | 7: iteration 23850/ 173500 | consumed samples: 6105600 | consumed tokens: 12504268800 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.678940E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.341 | TFLOPs: 12.00 | 7: iteration 23860/ 173500 | consumed samples: 6108160 | consumed tokens: 12509511680 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 
256 | lm loss: 4.667754E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.404 | TFLOPs: 11.97 | 7: iteration 23870/ 173500 | consumed samples: 6110720 | consumed tokens: 12514754560 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.658210E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.053 | TFLOPs: 11.96 | 7: iteration 23880/ 173500 | consumed samples: 6113280 | consumed tokens: 12519997440 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.676330E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.484 | TFLOPs: 11.99 | 7: iteration 23890/ 173500 | consumed samples: 6115840 | consumed tokens: 12525240320 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.662585E+00 | grad norm: 0.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.423 | TFLOPs: 11.96 | 7: iteration 23900/ 173500 | consumed samples: 6118400 | consumed tokens: 12530483200 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.675586E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.933 | TFLOPs: 11.74 | 7: iteration 23910/ 173500 | consumed samples: 6120960 | consumed tokens: 12535726080 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.665606E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.812 | TFLOPs: 11.94 | 7: iteration 23920/ 173500 | consumed samples: 6123520 | consumed 
tokens: 12540968960 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.675390E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.514 | TFLOPs: 11.99 | 7: iteration 23930/ 173500 | consumed samples: 6126080 | consumed tokens: 12546211840 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.680747E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.006 | TFLOPs: 11.80 | 7: iteration 23940/ 173500 | consumed samples: 6128640 | consumed tokens: 12551454720 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.663476E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.161 | TFLOPs: 11.97 | 7: iteration 23950/ 173500 | consumed samples: 6131200 | consumed tokens: 12556697600 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.661217E+00 | grad norm: 0.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.517 | TFLOPs: 11.99 | 7: iteration 23960/ 173500 | consumed samples: 6133760 | consumed tokens: 12561940480 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.663491E+00 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.105 | TFLOPs: 11.95 | 7: iteration 23970/ 173500 | consumed samples: 6136320 | consumed tokens: 12567183360 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.659126E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3221.905 | TFLOPs: 11.98 | 7: iteration 23980/ 173500 | consumed samples: 6138880 | consumed tokens: 12572426240 | elapsed time per iteration (s): 0.08 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 4.676067E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.373 | TFLOPs: 11.95 | 7: iteration 23990/ 173500 | consumed samples: 6141440 | consumed tokens: 12577669120 | elapsed time per iteration (s): 0.08 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.670858E+00 | grad norm: 0.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.322 | TFLOPs: 12.00 | 0: [2023-03-17 00:51:52,055] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=0, lr=[0.00019264004235759096, 0.00019264004235759096, 0.00019264004235759096], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 24000/ 173500 | consumed samples: 6144000 | consumed tokens: 12582912000 | elapsed time per iteration (s): 0.08 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.658961E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.277 | TFLOPs: 11.91 | 0: steps: 24000 loss: 4.6661 iter time (s): 0.081 samples/sec: 3150.407 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 24000 | lm loss value: 4.544197E+00 | lm loss PPL: 9.408481E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 24000 to checkpoints_14m91b100m 0: [2023-03-17 00:51:52,111] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step24000 is begin to save! 
0: [2023-03-17 00:51:52,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:51:52,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:51:52,139] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:51:52,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:51:52,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:51:52,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:51:52,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:51:52,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:51:52,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:51:52,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:51:52,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:51:52,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 00:51:52,154] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step24000/mp_rank_00_model_states.pt 0: [2023-03-17 00:51:52,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:51:52,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:51:52,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:51:52,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:51:52,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:51:52,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:51:52,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:51:52,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2023-03-17 00:51:52,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:51:52,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2023-03-17 00:51:52,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:51:52,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:51:52,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2023-03-17 00:51:52,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:51:52,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2023-03-17 00:51:52,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:51:52,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
7: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:51:52,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:51:52,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2023-03-17 00:51:52,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:51:52,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:51:52,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:51:52,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:51:52,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2023-03-17 00:51:52,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:51:52,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:51:52,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:51:52,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:51:52,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2023-03-17 00:51:52,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:51:52,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:51:52,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:51:52,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2023-03-17 00:51:52,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2023-03-17 00:51:52,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:51:52,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2023-03-17 00:51:52,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:51:52,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2023-03-17 00:51:52,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:51:52,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:51:52,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2023-03-17 00:51:52,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 00:51:52,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2023-03-17 00:51:52,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2023-03-17 00:51:52,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:51:52,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:51:52,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2023-03-17 00:51:52,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:51:52,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:51:52,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
2: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:51:52,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 00:51:52,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:51:52,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:51:52,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2023-03-17 00:51:52,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:51:52,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:51:52,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:51:52,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:51:52,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
3: [2023-03-17 00:51:52,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:51:52,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 0: [2023-03-17 00:51:52,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2023-03-17 00:51:52,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2023-03-17 00:51:52,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
6: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:51:52,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:51:52,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:51:52,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2023-03-17 00:51:52,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
3: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 1: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2023-03-17 00:51:52,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step24000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:51:52,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
0: successfully saved checkpoint at iteration 24000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.03 7: iteration 24010/ 173500 | consumed samples: 6146560 | consumed tokens: 12588154880 | elapsed time per iteration (s): 0.09 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.680054E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.727 | TFLOPs: 10.33 | 7: iteration 24020/ 173500 | consumed samples: 6149120 | consumed tokens: 12593397760 | elapsed time per iteration (s): 0.08 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.676263E+00 | grad norm: 0.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.647 | TFLOPs: 12.02 | 7: iteration 24030/ 173500 | consumed samples: 6151680 | consumed tokens: 12598640640 | elapsed time per iteration (s): 0.08 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.670790E+00 | grad norm: 0.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.104 | TFLOPs: 11.77 | 7: iteration 24040/ 173500 | consumed samples: 6154240 | consumed tokens: 12603883520 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.663636E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2552.929 | TFLOPs: 9.50 | 7: iteration 24050/ 173500 | consumed samples: 6156800 | consumed tokens: 12609126400 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.660435E+00 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2501.372 | TFLOPs: 9.30 | 7: iteration 24060/ 173500 | consumed samples: 6159360 | consumed tokens: 12614369280 | elapsed time per iteration (s): 0.10 | learning 
rate: 1.926E-04 | global batch size: 256 | lm loss: 4.677166E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2467.334 | TFLOPs: 9.18 | 7: iteration 24070/ 173500 | consumed samples: 6161920 | consumed tokens: 12619612160 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.657674E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.061 | TFLOPs: 9.19 | 7: iteration 24080/ 173500 | consumed samples: 6164480 | consumed tokens: 12624855040 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.662224E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2492.063 | TFLOPs: 9.27 | 7: iteration 24090/ 173500 | consumed samples: 6167040 | consumed tokens: 12630097920 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.657086E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.108 | TFLOPs: 9.19 | 7: iteration 24100/ 173500 | consumed samples: 6169600 | consumed tokens: 12635340800 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.660781E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.062 | TFLOPs: 9.30 | 7: iteration 24110/ 173500 | consumed samples: 6172160 | consumed tokens: 12640583680 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.663094E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.071 | TFLOPs: 9.19 | 7: iteration 24120/ 173500 | consumed 
samples: 6174720 | consumed tokens: 12645826560 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.660232E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2441.120 | TFLOPs: 9.08 | 7: iteration 24130/ 173500 | consumed samples: 6177280 | consumed tokens: 12651069440 | elapsed time per iteration (s): 0.10 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 4.656884E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2560.115 | TFLOPs: 9.52 | 7: iteration 24140/ 173500 | consumed samples: 6179840 | consumed tokens: 12656312320 | elapsed time per iteration (s): 0.10 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.663113E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.817 | TFLOPs: 9.34 | 7: iteration 24150/ 173500 | consumed samples: 6182400 | consumed tokens: 12661555200 | elapsed time per iteration (s): 0.10 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.645513E+00 | grad norm: 0.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2502.279 | TFLOPs: 9.31 | 7: iteration 24160/ 173500 | consumed samples: 6184960 | consumed tokens: 12666798080 | elapsed time per iteration (s): 0.10 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.662570E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2517.266 | TFLOPs: 9.36 | 7: iteration 24170/ 173500 | consumed samples: 6187520 | consumed tokens: 12672040960 | elapsed time per iteration (s): 0.10 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.661809E+00 | grad norm: 0.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2442.888 | TFLOPs: 9.09 | 7: iteration 24180/ 173500 | consumed samples: 6190080 | consumed tokens: 12677283840 | elapsed time per iteration (s): 0.10 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.658417E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2650.302 | TFLOPs: 9.86 | 7: iteration 24190/ 173500 | consumed samples: 6192640 | consumed tokens: 12682526720 | elapsed time per iteration (s): 0.08 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.661015E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.197 | TFLOPs: 12.03 | 7: iteration 24200/ 173500 | consumed samples: 6195200 | consumed tokens: 12687769600 | elapsed time per iteration (s): 0.08 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.657750E+00 | grad norm: 0.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.724 | TFLOPs: 12.01 | 7: iteration 24210/ 173500 | consumed samples: 6197760 | consumed tokens: 12693012480 | elapsed time per iteration (s): 0.08 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.661279E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.214 | TFLOPs: 11.83 | 7: iteration 24220/ 173500 | consumed samples: 6200320 | consumed tokens: 12698255360 | elapsed time per iteration (s): 0.08 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.668371E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.545 | TFLOPs: 11.85 | 7: iteration 24230/ 173500 | consumed samples: 6202880 | consumed tokens: 12703498240 | elapsed time per iteration (s): 0.09 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 
4.647923E+00 | grad norm: 0.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2980.142 | TFLOPs: 11.08 | 7: iteration 24240/ 173500 | consumed samples: 6205440 | consumed tokens: 12708741120 | elapsed time per iteration (s): 0.11 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.661324E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2331.500 | TFLOPs: 8.67 | 7: iteration 24250/ 173500 | consumed samples: 6208000 | consumed tokens: 12713984000 | elapsed time per iteration (s): 0.11 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.654526E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2262.861 | TFLOPs: 8.42 | 7: iteration 24260/ 173500 | consumed samples: 6210560 | consumed tokens: 12719226880 | elapsed time per iteration (s): 0.11 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.657832E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.373 | TFLOPs: 8.72 | 7: iteration 24270/ 173500 | consumed samples: 6213120 | consumed tokens: 12724469760 | elapsed time per iteration (s): 0.11 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.656007E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.948 | TFLOPs: 8.91 | 7: iteration 24280/ 173500 | consumed samples: 6215680 | consumed tokens: 12729712640 | elapsed time per iteration (s): 0.08 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 4.661113E+00 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.661 | TFLOPs: 12.04 | 7: iteration 24290/ 173500 | consumed samples: 6218240 | consumed tokens: 12734955520 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.657232E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.638 | TFLOPs: 11.93 | 7: iteration 24300/ 173500 | consumed samples: 6220800 | consumed tokens: 12740198400 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.678661E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.756 | TFLOPs: 11.94 | 7: iteration 24310/ 173500 | consumed samples: 6223360 | consumed tokens: 12745441280 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.667145E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.673 | TFLOPs: 11.98 | 7: iteration 24320/ 173500 | consumed samples: 6225920 | consumed tokens: 12750684160 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.643410E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.348 | TFLOPs: 11.98 | 7: iteration 24330/ 173500 | consumed samples: 6228480 | consumed tokens: 12755927040 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.671298E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.814 | TFLOPs: 11.99 | 7: iteration 24340/ 173500 | consumed samples: 6231040 | consumed tokens: 12761169920 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.657874E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.256 | 
TFLOPs: 11.83 | 7: iteration 24350/ 173500 | consumed samples: 6233600 | consumed tokens: 12766412800 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.673832E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.047 | TFLOPs: 11.87 | 7: iteration 24360/ 173500 | consumed samples: 6236160 | consumed tokens: 12771655680 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.658149E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.301 | TFLOPs: 11.79 | 7: iteration 24370/ 173500 | consumed samples: 6238720 | consumed tokens: 12776898560 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.663839E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.226 | TFLOPs: 11.87 | 7: iteration 24380/ 173500 | consumed samples: 6241280 | consumed tokens: 12782141440 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.661744E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.617 | TFLOPs: 11.85 | 7: iteration 24390/ 173500 | consumed samples: 6243840 | consumed tokens: 12787384320 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.650233E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.115 | TFLOPs: 11.82 | 7: iteration 24400/ 173500 | consumed samples: 6246400 | consumed tokens: 12792627200 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.657948E+00 | grad norm: 0.502 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.333 | TFLOPs: 11.86 | 7: iteration 24410/ 173500 | consumed samples: 6248960 | consumed tokens: 12797870080 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.664942E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.801 | TFLOPs: 11.86 | 7: iteration 24420/ 173500 | consumed samples: 6251520 | consumed tokens: 12803112960 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.649124E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.961 | TFLOPs: 11.80 | 7: iteration 24430/ 173500 | consumed samples: 6254080 | consumed tokens: 12808355840 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.662034E+00 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.725 | TFLOPs: 11.85 | 7: iteration 24440/ 173500 | consumed samples: 6256640 | consumed tokens: 12813598720 | elapsed time per iteration (s): 0.08 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 4.661495E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.929 | TFLOPs: 11.83 | 7: iteration 24450/ 173500 | consumed samples: 6259200 | consumed tokens: 12818841600 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.659527E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.712 | TFLOPs: 11.82 | 7: iteration 24460/ 173500 | consumed samples: 6261760 | consumed tokens: 12824084480 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.655481E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.542 | TFLOPs: 11.85 | 7: iteration 24470/ 173500 | consumed samples: 6264320 | consumed tokens: 12829327360 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.665405E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.528 | TFLOPs: 11.86 | 7: iteration 24480/ 173500 | consumed samples: 6266880 | consumed tokens: 12834570240 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.649068E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.775 | TFLOPs: 11.85 | 7: iteration 24490/ 173500 | consumed samples: 6269440 | consumed tokens: 12839813120 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.663231E+00 | grad norm: 0.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.694 | TFLOPs: 11.84 | 7: iteration 24500/ 173500 | consumed samples: 6272000 | consumed tokens: 12845056000 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.671916E+00 | grad norm: 0.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.174 | TFLOPs: 11.84 | 7: iteration 24510/ 173500 | consumed samples: 6274560 | consumed tokens: 12850298880 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.645571E+00 | grad norm: 0.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.473 | TFLOPs: 11.78 | 7: iteration 24520/ 173500 
| consumed samples: 6277120 | consumed tokens: 12855541760 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.641527E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.885 | TFLOPs: 11.80 | 7: iteration 24530/ 173500 | consumed samples: 6279680 | consumed tokens: 12860784640 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.656260E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.283 | TFLOPs: 11.83 | 7: iteration 24540/ 173500 | consumed samples: 6282240 | consumed tokens: 12866027520 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.667084E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.474 | TFLOPs: 11.84 | 7: iteration 24550/ 173500 | consumed samples: 6284800 | consumed tokens: 12871270400 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.669769E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.773 | TFLOPs: 11.85 | 7: iteration 24560/ 173500 | consumed samples: 6287360 | consumed tokens: 12876513280 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.672956E+00 | grad norm: 0.525 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.766 | TFLOPs: 11.87 | 7: iteration 24570/ 173500 | consumed samples: 6289920 | consumed tokens: 12881756160 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.674685E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3191.890 | TFLOPs: 11.87 | 7: iteration 24580/ 173500 | consumed samples: 6292480 | consumed tokens: 12886999040 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.654517E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.128 | TFLOPs: 11.85 | 7: iteration 24590/ 173500 | consumed samples: 6295040 | consumed tokens: 12892241920 | elapsed time per iteration (s): 0.08 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 4.658927E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.410 | TFLOPs: 11.88 | 7: iteration 24600/ 173500 | consumed samples: 6297600 | consumed tokens: 12897484800 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.648560E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.933 | TFLOPs: 11.82 | 7: iteration 24610/ 173500 | consumed samples: 6300160 | consumed tokens: 12902727680 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.653786E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.070 | TFLOPs: 11.81 | 7: iteration 24620/ 173500 | consumed samples: 6302720 | consumed tokens: 12907970560 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.655638E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.420 | TFLOPs: 11.90 | 7: iteration 24630/ 173500 | consumed samples: 6305280 | consumed tokens: 12913213440 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 
256 | lm loss: 4.645328E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.524 | TFLOPs: 11.64 | 7: iteration 24640/ 173500 | consumed samples: 6307840 | consumed tokens: 12918456320 | elapsed time per iteration (s): 0.09 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.662090E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.777 | TFLOPs: 10.97 | 7: iteration 24650/ 173500 | consumed samples: 6310400 | consumed tokens: 12923699200 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.665128E+00 | grad norm: 0.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.181 | TFLOPs: 11.76 | 7: iteration 24660/ 173500 | consumed samples: 6312960 | consumed tokens: 12928942080 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.656618E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.404 | TFLOPs: 11.84 | 7: iteration 24670/ 173500 | consumed samples: 6315520 | consumed tokens: 12934184960 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.658004E+00 | grad norm: 0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.990 | TFLOPs: 11.86 | 7: iteration 24680/ 173500 | consumed samples: 6318080 | consumed tokens: 12939427840 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.653180E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.643 | TFLOPs: 11.73 | 7: iteration 24690/ 173500 | consumed samples: 6320640 | consumed 
tokens: 12944670720 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.664408E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.789 | TFLOPs: 11.82 | 7: iteration 24700/ 173500 | consumed samples: 6323200 | consumed tokens: 12949913600 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.658355E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.508 | TFLOPs: 11.82 | 7: iteration 24710/ 173500 | consumed samples: 6325760 | consumed tokens: 12955156480 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.662270E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.404 | TFLOPs: 11.80 | 7: iteration 24720/ 173500 | consumed samples: 6328320 | consumed tokens: 12960399360 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.662038E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.779 | TFLOPs: 11.81 | 7: iteration 24730/ 173500 | consumed samples: 6330880 | consumed tokens: 12965642240 | elapsed time per iteration (s): 0.09 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.648567E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.990 | TFLOPs: 10.65 | 7: iteration 24740/ 173500 | consumed samples: 6333440 | consumed tokens: 12970885120 | elapsed time per iteration (s): 0.08 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 4.644952E+00 | grad norm: 0.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3177.974 | TFLOPs: 11.82 | 7: iteration 24750/ 173500 | consumed samples: 6336000 | consumed tokens: 12976128000 | elapsed time per iteration (s): 0.10 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.641248E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2537.565 | TFLOPs: 9.44 | 7: iteration 24760/ 173500 | consumed samples: 6338560 | consumed tokens: 12981370880 | elapsed time per iteration (s): 0.09 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.654414E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2901.992 | TFLOPs: 10.79 | 7: iteration 24770/ 173500 | consumed samples: 6341120 | consumed tokens: 12986613760 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.655615E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.096 | TFLOPs: 11.88 | 7: iteration 24780/ 173500 | consumed samples: 6343680 | consumed tokens: 12991856640 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.668833E+00 | grad norm: 0.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.092 | TFLOPs: 11.90 | 7: iteration 24790/ 173500 | consumed samples: 6346240 | consumed tokens: 12997099520 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.656855E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.326 | TFLOPs: 11.89 | 7: iteration 24800/ 173500 | consumed samples: 6348800 | consumed tokens: 13002342400 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.663725E+00 | grad norm: 
0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.695 | TFLOPs: 11.91 | 7: iteration 24810/ 173500 | consumed samples: 6351360 | consumed tokens: 13007585280 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.652392E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.972 | TFLOPs: 11.95 | 7: iteration 24820/ 173500 | consumed samples: 6353920 | consumed tokens: 13012828160 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.656672E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.151 | TFLOPs: 11.69 | 7: iteration 24830/ 173500 | consumed samples: 6356480 | consumed tokens: 13018071040 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.661572E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.226 | TFLOPs: 11.95 | 7: iteration 24840/ 173500 | consumed samples: 6359040 | consumed tokens: 13023313920 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.664549E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.749 | TFLOPs: 11.89 | 7: iteration 24850/ 173500 | consumed samples: 6361600 | consumed tokens: 13028556800 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.655510E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.632 | TFLOPs: 11.52 | 7: iteration 24860/ 173500 | consumed samples: 6364160 | consumed tokens: 13033799680 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.645155E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.653 | TFLOPs: 11.89 | 7: iteration 24870/ 173500 | consumed samples: 6366720 | consumed tokens: 13039042560 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.657375E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.207 | TFLOPs: 11.84 | 7: iteration 24880/ 173500 | consumed samples: 6369280 | consumed tokens: 13044285440 | elapsed time per iteration (s): 0.08 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 4.643864E+00 | grad norm: 0.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.361 | TFLOPs: 11.79 | 7: iteration 24890/ 173500 | consumed samples: 6371840 | consumed tokens: 13049528320 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.649188E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.839 | TFLOPs: 11.84 | 7: iteration 24900/ 173500 | consumed samples: 6374400 | consumed tokens: 13054771200 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.646454E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.286 | TFLOPs: 11.81 | 7: iteration 24910/ 173500 | consumed samples: 6376960 | consumed tokens: 13060014080 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.662231E+00 | grad norm: 0.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.435 | TFLOPs: 11.86 | 7: 
iteration 24920/ 173500 | consumed samples: 6379520 | consumed tokens: 13065256960 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.658838E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.302 | TFLOPs: 11.85 | 7: iteration 24930/ 173500 | consumed samples: 6382080 | consumed tokens: 13070499840 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.659206E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.246 | TFLOPs: 11.84 | 7: iteration 24940/ 173500 | consumed samples: 6384640 | consumed tokens: 13075742720 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.652855E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.723 | TFLOPs: 11.82 | 7: iteration 24950/ 173500 | consumed samples: 6387200 | consumed tokens: 13080985600 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.660524E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.865 | TFLOPs: 11.85 | 7: iteration 24960/ 173500 | consumed samples: 6389760 | consumed tokens: 13086228480 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.667959E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.185 | TFLOPs: 11.81 | 7: iteration 24970/ 173500 | consumed samples: 6392320 | consumed tokens: 13091471360 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.659102E+00 | grad norm: 0.473 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.762 | TFLOPs: 11.86 |
7: iteration 24980/ 173500 | consumed samples: 6394880 | consumed tokens: 13096714240 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.666401E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.437 | TFLOPs: 11.80 |
7: iteration 24990/ 173500 | consumed samples: 6397440 | consumed tokens: 13101957120 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.660298E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.009 | TFLOPs: 11.83 |
7: iteration 25000/ 173500 | consumed samples: 6400000 | consumed tokens: 13107200000 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.655216E+00 | grad norm: 0.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.165 | TFLOPs: 11.77 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 25000 | lm loss value: 4.485362E+00 | lm loss PPL: 8.870906E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 25000 to checkpoints_14m91b100m
0: [2023-03-17 00:53:17,539] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step25000 is begin to save!
0: [2023-03-17 00:53:17,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/layer_01-model_00-model_states.pt...
0: [2023-03-17 00:53:17,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/layer_01-model_00-model_states.pt.
0: [2023-03-17 00:53:17,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/layer_03-model_00-model_states.pt...
0: [2023-03-17 00:53:17,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/layer_03-model_00-model_states.pt.
0: [2023-03-17 00:53:17,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/layer_04-model_00-model_states.pt...
0: [2023-03-17 00:53:17,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/layer_04-model_00-model_states.pt.
0: [2023-03-17 00:53:17,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/layer_05-model_00-model_states.pt...
0: [2023-03-17 00:53:17,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/layer_05-model_00-model_states.pt.
0: [2023-03-17 00:53:17,577] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/layer_06-model_00-model_states.pt...
0: [2023-03-17 00:53:17,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/layer_06-model_00-model_states.pt.
0: [2023-03-17 00:53:17,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/layer_08-model_00-model_states.pt...
0: [2023-03-17 00:53:17,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/layer_08-model_00-model_states.pt.
0: [2023-03-17 00:53:17,581] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step25000/mp_rank_00_model_states.pt
0: [2023-03-17 00:53:17,581] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/mp_rank_00_model_states.pt...
0: [2023-03-17 00:53:17,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/mp_rank_00_model_states.pt.
0: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt...
4: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
2: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:53:17,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:53:17,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:53:17,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:53:17,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:53:17,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 00:53:17,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:53:17,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2023-03-17 00:53:17,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2023-03-17 00:53:17,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2023-03-17 00:53:17,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:53:17,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 
1: [2023-03-17 00:53:17,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:53:17,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:53:17,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2023-03-17 00:53:17,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:53:17,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 
6: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 0: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:53:17,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2023-03-17 00:53:17,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:53:17,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:53:17,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 00:53:17,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2023-03-17 00:53:17,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:53:17,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2023-03-17 00:53:17,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2023-03-17 00:53:17,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:53:17,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:53:17,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 
5: [2023-03-17 00:53:17,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2023-03-17 00:53:17,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:53:17,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:53:17,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2023-03-17 00:53:17,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:53:17,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2023-03-17 00:53:17,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:53:17,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:53:17,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2023-03-17 00:53:17,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:53:17,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2023-03-17 00:53:17,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:53:17,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:53:17,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:53:17,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:53:17,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 1: [2023-03-17 00:53:17,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2023-03-17 00:53:17,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:53:17,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:53:17,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:53:17,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:53:17,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2023-03-17 00:53:17,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 00:53:17,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2023-03-17 00:53:17,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 6: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:53:17,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step25000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 4: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2023-03-17 00:53:17,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: successfully saved checkpoint at iteration 25000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 77.91 7: iteration 25010/ 173500 | consumed samples: 6402560 | consumed tokens: 13112442880 | elapsed time per iteration (s): 0.09 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.648636E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2791.068 | TFLOPs: 10.38 | 7: iteration 25020/ 173500 | consumed samples: 6405120 | consumed tokens: 13117685760 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.663436E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.236 | TFLOPs: 11.84 | 7: iteration 25030/ 173500 | consumed samples: 6407680 | consumed tokens: 13122928640 | elapsed time per iteration (s): 0.08 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 4.654656E+00 | grad norm: 0.429 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.258 | TFLOPs: 11.81 | 7: iteration 25040/ 173500 | consumed samples: 6410240 | consumed tokens: 13128171520 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.646889E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.830 | TFLOPs: 11.86 | 7: iteration 25050/ 173500 | consumed samples: 6412800 | consumed tokens: 13133414400 | elapsed time per iteration (s): 0.09 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.653190E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.946 | TFLOPs: 10.16 | 7: iteration 25060/ 173500 | consumed samples: 6415360 | consumed tokens: 13138657280 | elapsed time per iteration (s): 0.09 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.653250E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2922.539 | TFLOPs: 10.87 | 7: iteration 25070/ 173500 | consumed samples: 6417920 | consumed tokens: 13143900160 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.663664E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.884 | TFLOPs: 11.80 | 7: iteration 25080/ 173500 | consumed samples: 6420480 | consumed tokens: 13149143040 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.656133E+00 | grad norm: 0.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.164 | TFLOPs: 11.82 | 7: iteration 25090/ 173500 | consumed samples: 6423040 | consumed tokens: 13154385920 | elapsed time per iteration (s): 0.08 | learning 
rate: 1.919E-04 | global batch size: 256 | lm loss: 4.655750E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.189 | TFLOPs: 11.82 | 7: iteration 25100/ 173500 | consumed samples: 6425600 | consumed tokens: 13159628800 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.641184E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.347 | TFLOPs: 11.83 | 7: iteration 25110/ 173500 | consumed samples: 6428160 | consumed tokens: 13164871680 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.655140E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.824 | TFLOPs: 11.79 | 7: iteration 25120/ 173500 | consumed samples: 6430720 | consumed tokens: 13170114560 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.650234E+00 | grad norm: 0.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.850 | TFLOPs: 11.79 | 7: iteration 25130/ 173500 | consumed samples: 6433280 | consumed tokens: 13175357440 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.640590E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.309 | TFLOPs: 11.83 | 7: iteration 25140/ 173500 | consumed samples: 6435840 | consumed tokens: 13180600320 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.650418E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.882 | TFLOPs: 11.82 | 7: iteration 25150/ 173500 | 
consumed samples: 6438400 | consumed tokens: 13185843200 | elapsed time per iteration (s): 0.09 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.654456E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2778.421 | TFLOPs: 10.33 | 7: iteration 25160/ 173500 | consumed samples: 6440960 | consumed tokens: 13191086080 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.648188E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.584 | TFLOPs: 11.84 | 7: iteration 25170/ 173500 | consumed samples: 6443520 | consumed tokens: 13196328960 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.653397E+00 | grad norm: 0.505 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.070 | TFLOPs: 11.83 | 7: iteration 25180/ 173500 | consumed samples: 6446080 | consumed tokens: 13201571840 | elapsed time per iteration (s): 0.08 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 4.646373E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.356 | TFLOPs: 11.86 | 7: iteration 25190/ 173500 | consumed samples: 6448640 | consumed tokens: 13206814720 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.649210E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.851 | TFLOPs: 11.86 | 7: iteration 25200/ 173500 | consumed samples: 6451200 | consumed tokens: 13212057600 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.644193E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3189.439 | TFLOPs: 11.86 | 7: iteration 25210/ 173500 | consumed samples: 6453760 | consumed tokens: 13217300480 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.666348E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.141 | TFLOPs: 11.82 | 7: iteration 25220/ 173500 | consumed samples: 6456320 | consumed tokens: 13222543360 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.658557E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.432 | TFLOPs: 11.87 | 7: iteration 25230/ 173500 | consumed samples: 6458880 | consumed tokens: 13227786240 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.641855E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.218 | TFLOPs: 11.87 | 7: iteration 25240/ 173500 | consumed samples: 6461440 | consumed tokens: 13233029120 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.647187E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.145 | TFLOPs: 11.87 | 7: iteration 25250/ 173500 | consumed samples: 6464000 | consumed tokens: 13238272000 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.659844E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.036 | TFLOPs: 11.68 | 7: iteration 25260/ 173500 | consumed samples: 6466560 | consumed tokens: 13243514880 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 
256 | lm loss: 4.653478E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.438 | TFLOPs: 11.86 | 7: iteration 25270/ 173500 | consumed samples: 6469120 | consumed tokens: 13248757760 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.657661E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.861 | TFLOPs: 11.87 | 7: iteration 25280/ 173500 | consumed samples: 6471680 | consumed tokens: 13254000640 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.661767E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.679 | TFLOPs: 11.91 | 7: iteration 25290/ 173500 | consumed samples: 6474240 | consumed tokens: 13259243520 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.662073E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.353 | TFLOPs: 11.78 | 7: iteration 25300/ 173500 | consumed samples: 6476800 | consumed tokens: 13264486400 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.655014E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.144 | TFLOPs: 11.77 | 7: iteration 25310/ 173500 | consumed samples: 6479360 | consumed tokens: 13269729280 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.645173E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.153 | TFLOPs: 11.78 | 7: iteration 25320/ 173500 | consumed samples: 6481920 | consumed 
tokens: 13274972160 | elapsed time per iteration (s): 0.08 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 4.653363E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.526 | TFLOPs: 11.76 | 7: iteration 25330/ 173500 | consumed samples: 6484480 | consumed tokens: 13280215040 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.662797E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.126 | TFLOPs: 11.77 | 7: iteration 25340/ 173500 | consumed samples: 6487040 | consumed tokens: 13285457920 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.651752E+00 | grad norm: 0.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.747 | TFLOPs: 11.83 | 7: iteration 25350/ 173500 | consumed samples: 6489600 | consumed tokens: 13290700800 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.633204E+00 | grad norm: 0.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.185 | TFLOPs: 11.40 | 7: iteration 25360/ 173500 | consumed samples: 6492160 | consumed tokens: 13295943680 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.646754E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.238 | TFLOPs: 11.87 | 7: iteration 25370/ 173500 | consumed samples: 6494720 | consumed tokens: 13301186560 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.638028E+00 | grad norm: 0.518 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3190.544 | TFLOPs: 11.87 | 7: iteration 25380/ 173500 | consumed samples: 6497280 | consumed tokens: 13306429440 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.649574E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.079 | TFLOPs: 11.33 | 7: iteration 25390/ 173500 | consumed samples: 6499840 | consumed tokens: 13311672320 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.645571E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.507 | TFLOPs: 11.85 | 7: iteration 25400/ 173500 | consumed samples: 6502400 | consumed tokens: 13316915200 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.659968E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.475 | TFLOPs: 11.83 | 7: iteration 25410/ 173500 | consumed samples: 6504960 | consumed tokens: 13322158080 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.648598E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.619 | TFLOPs: 11.80 | 7: iteration 25420/ 173500 | consumed samples: 6507520 | consumed tokens: 13327400960 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.648614E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.131 | TFLOPs: 11.85 | 7: iteration 25430/ 173500 | consumed samples: 6510080 | consumed tokens: 13332643840 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.643005E+00 | grad norm: 
0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.059 | TFLOPs: 11.76 | 7: iteration 25440/ 173500 | consumed samples: 6512640 | consumed tokens: 13337886720 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.652215E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.440 | TFLOPs: 11.86 | 7: iteration 25450/ 173500 | consumed samples: 6515200 | consumed tokens: 13343129600 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.657982E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.729 | TFLOPs: 11.50 | 7: iteration 25460/ 173500 | consumed samples: 6517760 | consumed tokens: 13348372480 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.654748E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.834 | TFLOPs: 11.82 | 7: iteration 25470/ 173500 | consumed samples: 6520320 | consumed tokens: 13353615360 | elapsed time per iteration (s): 0.08 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 4.652889E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.440 | TFLOPs: 11.82 | 7: iteration 25480/ 173500 | consumed samples: 6522880 | consumed tokens: 13358858240 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.654619E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.212 | TFLOPs: 11.84 | 7: iteration 25490/ 173500 | consumed samples: 6525440 | consumed tokens: 13364101120 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.650303E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.001 | TFLOPs: 11.81 | 7: iteration 25500/ 173500 | consumed samples: 6528000 | consumed tokens: 13369344000 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.654611E+00 | grad norm: 0.501 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.119 | TFLOPs: 11.85 | 7: iteration 25510/ 173500 | consumed samples: 6530560 | consumed tokens: 13374586880 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.652211E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.704 | TFLOPs: 11.85 | 7: iteration 25520/ 173500 | consumed samples: 6533120 | consumed tokens: 13379829760 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.651020E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.925 | TFLOPs: 11.87 | 7: iteration 25530/ 173500 | consumed samples: 6535680 | consumed tokens: 13385072640 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.653450E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.894 | TFLOPs: 11.89 | 7: iteration 25540/ 173500 | consumed samples: 6538240 | consumed tokens: 13390315520 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.649559E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.274 | TFLOPs: 11.84 | 7: 
iteration 25550/ 173500 | consumed samples: 6540800 | consumed tokens: 13395558400 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.641328E+00 | grad norm: 0.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.491 | TFLOPs: 11.74 | 7: iteration 25560/ 173500 | consumed samples: 6543360 | consumed tokens: 13400801280 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.647254E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.378 | TFLOPs: 11.85 | 7: iteration 25570/ 173500 | consumed samples: 6545920 | consumed tokens: 13406044160 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.651312E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.917 | TFLOPs: 11.83 | 7: iteration 25580/ 173500 | consumed samples: 6548480 | consumed tokens: 13411287040 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.646499E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.398 | TFLOPs: 11.88 | 7: iteration 25590/ 173500 | consumed samples: 6551040 | consumed tokens: 13416529920 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.650624E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.710 | TFLOPs: 11.87 | 7: iteration 25600/ 173500 | consumed samples: 6553600 | consumed tokens: 13421772800 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.651873E+00 | grad norm: 0.459 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.105 | TFLOPs: 11.88 | 7: iteration 25610/ 173500 | consumed samples: 6556160 | consumed tokens: 13427015680 | elapsed time per iteration (s): 0.08 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 4.657539E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.283 | TFLOPs: 11.84 | 7: iteration 25620/ 173500 | consumed samples: 6558720 | consumed tokens: 13432258560 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.642045E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.378 | TFLOPs: 11.86 | 7: iteration 25630/ 173500 | consumed samples: 6561280 | consumed tokens: 13437501440 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.652318E+00 | grad norm: 0.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.333 | TFLOPs: 11.88 | 7: iteration 25640/ 173500 | consumed samples: 6563840 | consumed tokens: 13442744320 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.654307E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.145 | TFLOPs: 11.87 | 7: iteration 25650/ 173500 | consumed samples: 6566400 | consumed tokens: 13447987200 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.647736E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.037 | TFLOPs: 11.85 | 7: iteration 25660/ 173500 | consumed samples: 6568960 | consumed tokens: 13453230080 | elapsed time per iteration (s): 0.08 | learning rate: 
1.915E-04 | global batch size: 256 | lm loss: 4.652406E+00 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.511 | TFLOPs: 11.87 | 7: iteration 25670/ 173500 | consumed samples: 6571520 | consumed tokens: 13458472960 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.653875E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.909 | TFLOPs: 11.82 | 7: iteration 25680/ 173500 | consumed samples: 6574080 | consumed tokens: 13463715840 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.650337E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.859 | TFLOPs: 11.86 | 7: iteration 25690/ 173500 | consumed samples: 6576640 | consumed tokens: 13468958720 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.665063E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.817 | TFLOPs: 11.73 | 7: iteration 25700/ 173500 | consumed samples: 6579200 | consumed tokens: 13474201600 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.654987E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.367 | TFLOPs: 11.83 | 7: iteration 25710/ 173500 | consumed samples: 6581760 | consumed tokens: 13479444480 | elapsed time per iteration (s): 0.09 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.637986E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2857.786 | TFLOPs: 10.63 | 7: iteration 25720/ 173500 | consumed 
samples: 6584320 | consumed tokens: 13484687360 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.649792E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.868 | TFLOPs: 11.88 | 7: iteration 25730/ 173500 | consumed samples: 6586880 | consumed tokens: 13489930240 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.651954E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.323 | TFLOPs: 11.91 | 7: iteration 25740/ 173500 | consumed samples: 6589440 | consumed tokens: 13495173120 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.648019E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.514 | TFLOPs: 11.93 | 7: iteration 25750/ 173500 | consumed samples: 6592000 | consumed tokens: 13500416000 | elapsed time per iteration (s): 0.08 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 4.645612E+00 | grad norm: 0.540 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.724 | TFLOPs: 11.86 | 7: iteration 25760/ 173500 | consumed samples: 6594560 | consumed tokens: 13505658880 | elapsed time per iteration (s): 0.08 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.653529E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.864 | TFLOPs: 11.86 | 7: iteration 25770/ 173500 | consumed samples: 6597120 | consumed tokens: 13510901760 | elapsed time per iteration (s): 0.08 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.643589E+00 | grad norm: 0.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3201.964 | TFLOPs: 11.91 | 7: iteration 25780/ 173500 | consumed samples: 6599680 | consumed tokens: 13516144640 | elapsed time per iteration (s): 0.08 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.649750E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.295 | TFLOPs: 11.89 | 7: iteration 25790/ 173500 | consumed samples: 6602240 | consumed tokens: 13521387520 | elapsed time per iteration (s): 0.08 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.641494E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.605 | TFLOPs: 11.89 | 7: iteration 25800/ 173500 | consumed samples: 6604800 | consumed tokens: 13526630400 | elapsed time per iteration (s): 0.08 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.647665E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.559 | TFLOPs: 11.91 | 7: iteration 25810/ 173500 | consumed samples: 6607360 | consumed tokens: 13531873280 | elapsed time per iteration (s): 0.08 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.641262E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.643 | TFLOPs: 11.75 | 7: iteration 25820/ 173500 | consumed samples: 6609920 | consumed tokens: 13537116160 | elapsed time per iteration (s): 0.08 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.649892E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.301 | TFLOPs: 11.47 | 7: iteration 25830/ 173500 | consumed samples: 6612480 | consumed tokens: 13542359040 | elapsed time per iteration (s): 0.09 | learning rate: 1.914E-04 | global batch size: 256 | lm 
loss: 4.643635E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2732.453 | TFLOPs: 10.16 | 7: iteration 25840/ 173500 | consumed samples: 6615040 | consumed tokens: 13547601920 | elapsed time per iteration (s): 0.09 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.643282E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.854 | TFLOPs: 10.58 | 7: iteration 25850/ 173500 | consumed samples: 6617600 | consumed tokens: 13552844800 | elapsed time per iteration (s): 0.30 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.645809E+00 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 857.762 | TFLOPs: 3.19 | 7: iteration 25860/ 173500 | consumed samples: 6620160 | consumed tokens: 13558087680 | elapsed time per iteration (s): 0.10 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.640857E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2683.345 | TFLOPs: 9.98 | 7: iteration 25870/ 173500 | consumed samples: 6622720 | consumed tokens: 13563330560 | elapsed time per iteration (s): 0.10 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.652367E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2597.898 | TFLOPs: 9.66 | 7: iteration 25880/ 173500 | consumed samples: 6625280 | consumed tokens: 13568573440 | elapsed time per iteration (s): 0.09 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.657535E+00 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.473 | TFLOPs: 10.38 | 7: iteration 25890/ 173500 | consumed samples: 6627840 | consumed tokens: 13573816320 
| elapsed time per iteration (s): 0.10 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.658878E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2681.966 | TFLOPs: 9.98 | 7: iteration 25900/ 173500 | consumed samples: 6630400 | consumed tokens: 13579059200 | elapsed time per iteration (s): 0.09 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 4.648929E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.458 | TFLOPs: 10.39 | 7: iteration 25910/ 173500 | consumed samples: 6632960 | consumed tokens: 13584302080 | elapsed time per iteration (s): 0.09 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.646390E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.543 | TFLOPs: 10.07 | 7: iteration 25920/ 173500 | consumed samples: 6635520 | consumed tokens: 13589544960 | elapsed time per iteration (s): 0.10 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.636095E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2610.257 | TFLOPs: 9.71 | 7: iteration 25930/ 173500 | consumed samples: 6638080 | consumed tokens: 13594787840 | elapsed time per iteration (s): 0.10 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.625032E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2684.783 | TFLOPs: 9.99 | 7: iteration 25940/ 173500 | consumed samples: 6640640 | consumed tokens: 13600030720 | elapsed time per iteration (s): 0.09 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.655143E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.549 | 
TFLOPs: 10.23 | 7: iteration 25950/ 173500 | consumed samples: 6643200 | consumed tokens: 13605273600 | elapsed time per iteration (s): 0.10 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.648699E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.077 | TFLOPs: 10.00 | 7: iteration 25960/ 173500 | consumed samples: 6645760 | consumed tokens: 13610516480 | elapsed time per iteration (s): 0.10 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.647944E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2665.463 | TFLOPs: 9.91 | 7: iteration 25970/ 173500 | consumed samples: 6648320 | consumed tokens: 13615759360 | elapsed time per iteration (s): 0.10 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.631622E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2613.507 | TFLOPs: 9.72 | 7: iteration 25980/ 173500 | consumed samples: 6650880 | consumed tokens: 13621002240 | elapsed time per iteration (s): 0.08 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.642021E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.843 | TFLOPs: 11.21 | 7: iteration 25990/ 173500 | consumed samples: 6653440 | consumed tokens: 13626245120 | elapsed time per iteration (s): 0.10 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.649845E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2445.180 | TFLOPs: 9.09 | 0: [2023-03-17 00:54:43,384] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=0, lr=[0.00019128112529201118, 0.00019128112529201118, 0.00019128112529201118], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 26000/ 
173500 | consumed samples: 6656000 | consumed tokens: 13631488000 | elapsed time per iteration (s): 0.12 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.650104E+00 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2207.074 | TFLOPs: 8.21 | 0: steps: 26000 loss: 4.6590 iter time (s): 0.084 samples/sec: 3029.846 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 26000 | lm loss value: 4.496815E+00 | lm loss PPL: 8.973086E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 26000 to checkpoints_14m91b100m 0: [2023-03-17 00:54:43,466] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step26000 is begin to save! 0: [2023-03-17 00:54:43,470] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:54:43,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:54:43,493] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:54:43,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:54:43,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:54:43,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 00:54:43,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:54:43,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:54:43,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:54:43,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:54:43,508] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:54:43,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:54:43,509] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step26000/mp_rank_00_model_states.pt 0: [2023-03-17 00:54:43,509] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:54:43,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:54:43,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:54:43,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:54:43,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:54:43,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
0: [2023-03-17 00:54:43,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2023-03-17 00:54:43,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:54:43,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 00:54:43,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2023-03-17 00:54:43,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:54:43,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:54:43,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:54:43,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:54:43,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:54:43,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:54:43,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:54:43,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:54:43,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:54:43,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:54:43,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:54:43,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:54:43,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:54:43,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2023-03-17 00:54:43,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:54:43,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:54:43,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:54:43,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:54:43,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:54:43,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:54:43,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:54:43,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 00:54:43,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:54:43,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2023-03-17 00:54:43,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:54:43,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 00:54:43,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:54:43,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:54:43,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2023-03-17 00:54:43,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:54:43,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:54:43,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:54:43,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:54:43,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:54:43,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:54:43,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:54:43,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
0: [2023-03-17 00:54:43,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:54:43,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2023-03-17 00:54:43,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:54:43,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2023-03-17 00:54:43,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2023-03-17 00:54:43,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
5: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
6: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:54:43,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2023-03-17 00:54:43,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:54:43,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step26000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:54:43,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
0: successfully saved checkpoint at iteration 26000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 79.80
7: iteration 26010/ 173500 | consumed samples: 6658560 | consumed tokens: 13636730880 | elapsed time per iteration (s): 0.13 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.652001E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.384 | TFLOPs: 7.26 |
7: iteration 26020/ 173500 | consumed samples: 6661120 | consumed tokens: 13641973760 | elapsed time per iteration (s): 0.12 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.656874E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2110.580 | TFLOPs: 7.85 |
7: iteration 26030/ 173500 | consumed samples: 6663680 | consumed tokens: 13647216640 | elapsed time per iteration (s): 0.15 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.648703E+00 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1763.487 | TFLOPs: 6.56 |
7: iteration 26040/ 173500 | consumed samples: 6666240 | consumed tokens: 13652459520 | elapsed time per iteration (s): 0.14 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 4.646172E+00 | grad norm: 0.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1885.948 | TFLOPs: 7.01 |
7: iteration 26050/ 173500 | consumed samples: 6668800 | consumed tokens: 13657702400 | elapsed time per iteration (s): 0.13 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.652777E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1901.421 | TFLOPs: 7.07 |
7: iteration 26060/ 173500 | consumed samples: 6671360 | consumed tokens: 13662945280 | elapsed time per iteration (s): 0.14 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.645073E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1833.734 | TFLOPs: 6.82 |
7: iteration 26070/ 173500 | consumed samples: 6673920 | consumed tokens: 13668188160 | elapsed time per iteration (s): 0.14 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.660339E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1799.742 | TFLOPs: 6.69 |
7: iteration 26080/ 173500 | consumed samples: 6676480 | consumed tokens: 13673431040 | elapsed time per iteration (s): 0.12 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.654525E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2189.965 | TFLOPs: 8.15 |
7: iteration 26090/ 173500 | consumed samples: 6679040 | consumed tokens: 13678673920 | elapsed time per iteration (s): 0.11 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.641836E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2278.657 | TFLOPs: 8.48 |
7: iteration 26100/ 173500 | consumed samples: 6681600 | consumed tokens: 13683916800 | elapsed time per iteration (s): 0.08 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.653352E+00 | grad norm: 0.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.929 | TFLOPs: 11.42 |
7: iteration 26110/ 173500 | consumed samples: 6684160 | consumed tokens: 13689159680 | elapsed time per iteration (s): 0.08 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.649879E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.002 | TFLOPs: 11.42 |
7: iteration 26120/ 173500 | consumed samples: 6686720 | consumed tokens: 13694402560 | elapsed time per iteration (s): 0.09 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.646614E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2810.828 | TFLOPs: 10.46 |
7: iteration 26130/ 173500 | consumed samples: 6689280 | consumed tokens: 13699645440 | elapsed time per iteration (s): 0.08 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.652641E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.617 | TFLOPs: 11.67 |
7: iteration 26140/ 173500 | consumed samples: 6691840 | consumed tokens: 13704888320 | elapsed time per iteration (s): 0.08 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.647938E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.184 | TFLOPs: 11.89 |
7: iteration 26150/ 173500 | consumed samples: 6694400 | consumed tokens: 13710131200 | elapsed time per iteration (s): 0.08 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.644432E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.730 | TFLOPs: 11.96 |
7: iteration 26160/ 173500 | consumed samples: 6696960 | consumed tokens: 13715374080 | elapsed time per iteration (s): 0.08 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.647208E+00 | grad norm: 0.468 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.976 | TFLOPs: 11.91 |
7: iteration 26170/ 173500 | consumed samples: 6699520 | consumed tokens: 13720616960 | elapsed time per iteration (s): 0.08 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.640059E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.932 | TFLOPs: 11.89 |
7: iteration 26180/ 173500 | consumed samples: 6702080 | consumed tokens: 13725859840 | elapsed time per iteration (s): 0.08 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 4.642553E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.236 | TFLOPs: 11.92 |
7: iteration 26190/ 173500 | consumed samples: 6704640 | consumed tokens: 13731102720 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.654020E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.890 | TFLOPs: 11.55 |
7: iteration 26200/ 173500 | consumed samples: 6707200 | consumed tokens: 13736345600 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.632383E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.163 | TFLOPs: 11.65 |
7: iteration 26210/ 173500 | consumed samples: 6709760 | consumed tokens: 13741588480 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.637473E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.559 | TFLOPs: 11.92 |
7: iteration 26220/ 173500 | consumed samples: 6712320 | consumed tokens: 13746831360 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.641072E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.619 | TFLOPs: 11.91 |
7: iteration 26230/ 173500 | consumed samples: 6714880 | consumed tokens: 13752074240 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.636103E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.952 | TFLOPs: 11.87 |
7: iteration 26240/ 173500 | consumed samples: 6717440 | consumed tokens: 13757317120 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.650266E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.176 | TFLOPs: 11.87 |
7: iteration 26250/ 173500 | consumed samples: 6720000 | consumed tokens: 13762560000 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.638897E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.967 | TFLOPs: 11.61 |
7: iteration 26260/ 173500 | consumed samples: 6722560 | consumed tokens: 13767802880 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.643264E+00 | grad norm: 0.562 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.867 | TFLOPs: 11.88 |
7: iteration 26270/ 173500 | consumed samples: 6725120 | consumed tokens: 13773045760 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.652410E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.381 | TFLOPs: 11.85 |
7: iteration 26280/ 173500 | consumed samples: 6727680 | consumed tokens: 13778288640 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.636420E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.293 | TFLOPs: 11.83 |
7: iteration 26290/ 173500 | consumed samples: 6730240 | consumed tokens: 13783531520 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.640785E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.094 | TFLOPs: 11.88 |
7: iteration 26300/ 173500 | consumed samples: 6732800 | consumed tokens: 13788774400 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.638966E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3061.708 | TFLOPs: 11.39 |
7: iteration 26310/ 173500 | consumed samples: 6735360 | consumed tokens: 13794017280 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.643283E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.226 | TFLOPs: 11.56 |
7: iteration 26320/ 173500 | consumed samples: 6737920 | consumed tokens: 13799260160 | elapsed time per iteration (s): 0.08 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 4.646140E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.476 | TFLOPs: 11.83 |
7: iteration 26330/ 173500 | consumed samples: 6740480 | consumed tokens: 13804503040 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.641209E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.428 | TFLOPs: 11.84 |
7: iteration 26340/ 173500 | consumed samples: 6743040 | consumed tokens: 13809745920 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.636884E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second:
3187.043 | TFLOPs: 11.85 | 7: iteration 26350/ 173500 | consumed samples: 6745600 | consumed tokens: 13814988800 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.644995E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.522 | TFLOPs: 11.85 | 7: iteration 26360/ 173500 | consumed samples: 6748160 | consumed tokens: 13820231680 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.641119E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.582 | TFLOPs: 11.83 | 7: iteration 26370/ 173500 | consumed samples: 6750720 | consumed tokens: 13825474560 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.636005E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.104 | TFLOPs: 11.77 | 7: iteration 26380/ 173500 | consumed samples: 6753280 | consumed tokens: 13830717440 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.626507E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.224 | TFLOPs: 11.84 | 7: iteration 26390/ 173500 | consumed samples: 6755840 | consumed tokens: 13835960320 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.643001E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.891 | TFLOPs: 11.88 | 7: iteration 26400/ 173500 | consumed samples: 6758400 | consumed tokens: 13841203200 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.644440E+00 | grad norm: 0.476 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.343 | TFLOPs: 11.89 | 7: iteration 26410/ 173500 | consumed samples: 6760960 | consumed tokens: 13846446080 | elapsed time per iteration (s): 0.10 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.635563E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2639.960 | TFLOPs: 9.82 | 7: iteration 26420/ 173500 | consumed samples: 6763520 | consumed tokens: 13851688960 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.641956E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.952 | TFLOPs: 11.85 | 7: iteration 26430/ 173500 | consumed samples: 6766080 | consumed tokens: 13856931840 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.646867E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.016 | TFLOPs: 11.81 | 7: iteration 26440/ 173500 | consumed samples: 6768640 | consumed tokens: 13862174720 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.643209E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.642 | TFLOPs: 11.92 | 7: iteration 26450/ 173500 | consumed samples: 6771200 | consumed tokens: 13867417600 | elapsed time per iteration (s): 0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.641037E+00 | grad norm: 0.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.337 | TFLOPs: 11.89 | 7: iteration 26460/ 173500 | consumed samples: 6773760 | consumed tokens: 13872660480 | elapsed time per iteration (s): 
0.08 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 4.639346E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.726 | TFLOPs: 11.85 | 7: iteration 26470/ 173500 | consumed samples: 6776320 | consumed tokens: 13877903360 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.638738E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.738 | TFLOPs: 11.90 | 7: iteration 26480/ 173500 | consumed samples: 6778880 | consumed tokens: 13883146240 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.644821E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.323 | TFLOPs: 11.82 | 7: iteration 26490/ 173500 | consumed samples: 6781440 | consumed tokens: 13888389120 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.634068E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.952 | TFLOPs: 11.74 | 7: iteration 26500/ 173500 | consumed samples: 6784000 | consumed tokens: 13893632000 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.649959E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.003 | TFLOPs: 11.59 | 7: iteration 26510/ 173500 | consumed samples: 6786560 | consumed tokens: 13898874880 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.639059E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.765 | TFLOPs: 11.88 | 7: iteration 26520/ 
173500 | consumed samples: 6789120 | consumed tokens: 13904117760 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.635624E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.361 | TFLOPs: 11.87 | 7: iteration 26530/ 173500 | consumed samples: 6791680 | consumed tokens: 13909360640 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.633805E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.305 | TFLOPs: 11.55 | 7: iteration 26540/ 173500 | consumed samples: 6794240 | consumed tokens: 13914603520 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.632077E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.581 | TFLOPs: 11.86 | 7: iteration 26550/ 173500 | consumed samples: 6796800 | consumed tokens: 13919846400 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.637218E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.261 | TFLOPs: 11.81 | 7: iteration 26560/ 173500 | consumed samples: 6799360 | consumed tokens: 13925089280 | elapsed time per iteration (s): 0.10 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.654285E+00 | grad norm: 0.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2676.093 | TFLOPs: 9.95 | 7: iteration 26570/ 173500 | consumed samples: 6801920 | consumed tokens: 13930332160 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.634353E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 3114.643 | TFLOPs: 11.59 | 7: iteration 26580/ 173500 | consumed samples: 6804480 | consumed tokens: 13935575040 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.651721E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.604 | TFLOPs: 11.88 | 7: iteration 26590/ 173500 | consumed samples: 6807040 | consumed tokens: 13940817920 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.640862E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.917 | TFLOPs: 11.88 | 7: iteration 26600/ 173500 | consumed samples: 6809600 | consumed tokens: 13946060800 | elapsed time per iteration (s): 0.08 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 4.632277E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.374 | TFLOPs: 11.88 | 7: iteration 26610/ 173500 | consumed samples: 6812160 | consumed tokens: 13951303680 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.647714E+00 | grad norm: 0.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.210 | TFLOPs: 11.90 | 7: iteration 26620/ 173500 | consumed samples: 6814720 | consumed tokens: 13956546560 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.648237E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.444 | TFLOPs: 11.89 | 7: iteration 26630/ 173500 | consumed samples: 6817280 | consumed tokens: 13961789440 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch 
size: 256 | lm loss: 4.644714E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.944 | TFLOPs: 11.89 | 7: iteration 26640/ 173500 | consumed samples: 6819840 | consumed tokens: 13967032320 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.642252E+00 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.457 | TFLOPs: 11.82 | 7: iteration 26650/ 173500 | consumed samples: 6822400 | consumed tokens: 13972275200 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.637805E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.345 | TFLOPs: 11.80 | 7: iteration 26660/ 173500 | consumed samples: 6824960 | consumed tokens: 13977518080 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.640347E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.884 | TFLOPs: 11.89 | 7: iteration 26670/ 173500 | consumed samples: 6827520 | consumed tokens: 13982760960 | elapsed time per iteration (s): 0.13 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.642442E+00 | grad norm: 0.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1997.824 | TFLOPs: 7.43 | 7: iteration 26680/ 173500 | consumed samples: 6830080 | consumed tokens: 13988003840 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.657691E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.407 | TFLOPs: 11.81 | 7: iteration 26690/ 173500 | consumed samples: 6832640 | consumed 
tokens: 13993246720 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.642386E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.190 | TFLOPs: 11.86 | 7: iteration 26700/ 173500 | consumed samples: 6835200 | consumed tokens: 13998489600 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.639352E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.809 | TFLOPs: 11.88 | 7: iteration 26710/ 173500 | consumed samples: 6837760 | consumed tokens: 14003732480 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.643584E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.568 | TFLOPs: 11.34 | 7: iteration 26720/ 173500 | consumed samples: 6840320 | consumed tokens: 14008975360 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.648947E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.447 | TFLOPs: 11.88 | 7: iteration 26730/ 173500 | consumed samples: 6842880 | consumed tokens: 14014218240 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.638081E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.854 | TFLOPs: 11.86 | 7: iteration 26740/ 173500 | consumed samples: 6845440 | consumed tokens: 14019461120 | elapsed time per iteration (s): 0.08 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 4.643571E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3188.512 | TFLOPs: 11.86 | 7: iteration 26750/ 173500 | consumed samples: 6848000 | consumed tokens: 14024704000 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.635086E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.658 | TFLOPs: 11.85 | 7: iteration 26760/ 173500 | consumed samples: 6850560 | consumed tokens: 14029946880 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.636299E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.175 | TFLOPs: 11.82 | 7: iteration 26770/ 173500 | consumed samples: 6853120 | consumed tokens: 14035189760 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.639161E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.496 | TFLOPs: 11.82 | 7: iteration 26780/ 173500 | consumed samples: 6855680 | consumed tokens: 14040432640 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.638651E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.291 | TFLOPs: 11.79 | 7: iteration 26790/ 173500 | consumed samples: 6858240 | consumed tokens: 14045675520 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.631149E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.603 | TFLOPs: 11.85 | 7: iteration 26800/ 173500 | consumed samples: 6860800 | consumed tokens: 14050918400 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.643789E+00 | grad norm: 
0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.560 | TFLOPs: 11.85 | 7: iteration 26810/ 173500 | consumed samples: 6863360 | consumed tokens: 14056161280 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.639615E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.363 | TFLOPs: 11.85 | 7: iteration 26820/ 173500 | consumed samples: 6865920 | consumed tokens: 14061404160 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.645819E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.916 | TFLOPs: 11.78 | 7: iteration 26830/ 173500 | consumed samples: 6868480 | consumed tokens: 14066647040 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.650572E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.315 | TFLOPs: 11.84 | 7: iteration 26840/ 173500 | consumed samples: 6871040 | consumed tokens: 14071889920 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.635849E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.509 | TFLOPs: 11.86 | 7: iteration 26850/ 173500 | consumed samples: 6873600 | consumed tokens: 14077132800 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.642393E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.677 | TFLOPs: 11.58 | 7: iteration 26860/ 173500 | consumed samples: 6876160 | consumed tokens: 14082375680 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.646907E+00 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.324 | TFLOPs: 11.87 | 7: iteration 26870/ 173500 | consumed samples: 6878720 | consumed tokens: 14087618560 | elapsed time per iteration (s): 0.08 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 4.652033E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.108 | TFLOPs: 11.82 | 7: iteration 26880/ 173500 | consumed samples: 6881280 | consumed tokens: 14092861440 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.638193E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.563 | TFLOPs: 11.83 | 7: iteration 26890/ 173500 | consumed samples: 6883840 | consumed tokens: 14098104320 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.625672E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.087 | TFLOPs: 11.81 | 7: iteration 26900/ 173500 | consumed samples: 6886400 | consumed tokens: 14103347200 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.635016E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.201 | TFLOPs: 11.87 | 7: iteration 26910/ 173500 | consumed samples: 6888960 | consumed tokens: 14108590080 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.627896E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.920 | TFLOPs: 11.60 | 7: 
iteration 26920/ 173500 | consumed samples: 6891520 | consumed tokens: 14113832960 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.621033E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.261 | TFLOPs: 11.80 | 7: iteration 26930/ 173500 | consumed samples: 6894080 | consumed tokens: 14119075840 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.633426E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.456 | TFLOPs: 11.86 | 7: iteration 26940/ 173500 | consumed samples: 6896640 | consumed tokens: 14124318720 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.632956E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.904 | TFLOPs: 11.87 | 7: iteration 26950/ 173500 | consumed samples: 6899200 | consumed tokens: 14129561600 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.627872E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.825 | TFLOPs: 11.84 | 7: iteration 26960/ 173500 | consumed samples: 6901760 | consumed tokens: 14134804480 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.623120E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.995 | TFLOPs: 11.86 | 7: iteration 26970/ 173500 | consumed samples: 6904320 | consumed tokens: 14140047360 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.656812E+00 | grad norm: 0.470 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.430 | TFLOPs: 11.80 | 7: iteration 26980/ 173500 | consumed samples: 6906880 | consumed tokens: 14145290240 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.629424E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.860 | TFLOPs: 11.85 | 7: iteration 26990/ 173500 | consumed samples: 6909440 | consumed tokens: 14150533120 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.640298E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.947 | TFLOPs: 11.77 | 7: iteration 27000/ 173500 | consumed samples: 6912000 | consumed tokens: 14155776000 | elapsed time per iteration (s): 0.08 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.628633E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.311 | TFLOPs: 11.61 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 27000 | lm loss value: 4.515232E+00 | lm loss PPL: 9.139873E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 27000 to checkpoints_14m91b100m 0: [2023-03-17 00:56:09,567] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step27000 is begin to save! 0: [2023-03-17 00:56:09,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:56:09,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 00:56:09,596] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:56:09,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:56:09,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:56:09,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:56:09,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:56:09,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:56:09,605] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:56:09,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:56:09,607] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:56:09,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:56:09,608] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step27000/mp_rank_00_model_states.pt 0: [2023-03-17 00:56:09,608] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/mp_rank_00_model_states.pt... 
0: [2023-03-17 00:56:09,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:56:09,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:56:09,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:56:09,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 00:56:09,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:56:09,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:56:09,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:56:09,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2023-03-17 00:56:09,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:56:09,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 00:56:09,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2023-03-17 00:56:09,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:56:09,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:56:09,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 
1: [2023-03-17 00:56:09,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:56:09,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:56:09,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2023-03-17 00:56:09,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:56:09,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:56:09,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:56:09,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2023-03-17 00:56:09,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 0: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:56:09,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 00:56:09,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2023-03-17 00:56:09,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:56:09,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:56:09,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2023-03-17 00:56:09,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2023-03-17 00:56:09,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2023-03-17 00:56:09,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:56:09,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:56:09,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2023-03-17 00:56:09,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:56:09,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:56:09,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:56:09,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:56:09,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:56:09,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 00:56:09,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:56:09,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2023-03-17 00:56:09,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2023-03-17 00:56:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:56:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:56:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:56:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:56:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:56:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:56:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:56:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 00:56:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:56:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:56:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:56:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:56:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:56:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:56:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:56:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:56:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:56:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:56:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 00:56:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 00:56:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2023-03-17 00:56:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:56:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:56:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:56:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2023-03-17 00:56:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:56:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2023-03-17 00:56:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:56:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:56:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:56:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:56:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2023-03-17 00:56:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 
6: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 
3: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2023-03-17 00:56:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step27000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:56:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: successfully saved checkpoint at iteration 27000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 76.70 7: iteration 27010/ 173500 | consumed samples: 6914560 | consumed tokens: 14161018880 | elapsed time per iteration (s): 0.13 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 4.635159E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1911.273 | TFLOPs: 7.11 | 7: iteration 27020/ 173500 | consumed samples: 6917120 | consumed tokens: 14166261760 | elapsed time per iteration (s): 0.12 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.636193E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2226.054 | TFLOPs: 8.28 | 7: iteration 27030/ 173500 | consumed samples: 6919680 | consumed tokens: 14171504640 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.634312E+00 | grad norm: 0.477 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.454 | TFLOPs: 11.94 | 7: iteration 27040/ 173500 | consumed samples: 6922240 | consumed tokens: 14176747520 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.633870E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.794 | TFLOPs: 11.66 | 7: iteration 27050/ 173500 | consumed samples: 6924800 | consumed tokens: 14181990400 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.641074E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.916 | TFLOPs: 11.99 | 7: iteration 27060/ 173500 | consumed samples: 6927360 | consumed tokens: 14187233280 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.634766E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.510 | TFLOPs: 11.98 | 7: iteration 27070/ 173500 | consumed samples: 6929920 | consumed tokens: 14192476160 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.634201E+00 | grad norm: 0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.180 | TFLOPs: 11.72 | 7: iteration 27080/ 173500 | consumed samples: 6932480 | consumed tokens: 14197719040 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.629359E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.555 | TFLOPs: 11.96 | 7: iteration 27090/ 173500 | consumed samples: 6935040 | consumed tokens: 14202961920 | elapsed time per iteration (s): 0.08 | learning rate: 
1.905E-04 | global batch size: 256 | lm loss: 4.623925E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.594 | TFLOPs: 11.89 | 7: iteration 27100/ 173500 | consumed samples: 6937600 | consumed tokens: 14208204800 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.643571E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.615 | TFLOPs: 11.94 | 7: iteration 27110/ 173500 | consumed samples: 6940160 | consumed tokens: 14213447680 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.640347E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.654 | TFLOPs: 11.89 | 7: iteration 27120/ 173500 | consumed samples: 6942720 | consumed tokens: 14218690560 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.633739E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.804 | TFLOPs: 11.65 | 7: iteration 27130/ 173500 | consumed samples: 6945280 | consumed tokens: 14223933440 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.634255E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.061 | TFLOPs: 11.95 | 7: iteration 27140/ 173500 | consumed samples: 6947840 | consumed tokens: 14229176320 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.631612E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.201 | TFLOPs: 11.94 | 7: iteration 27150/ 173500 | consumed 
samples: 6950400 | consumed tokens: 14234419200 | elapsed time per iteration (s): 0.08 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 4.630271E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.198 | TFLOPs: 11.67 | 7: iteration 27160/ 173500 | consumed samples: 6952960 | consumed tokens: 14239662080 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.625702E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.936 | TFLOPs: 11.91 | 7: iteration 27170/ 173500 | consumed samples: 6955520 | consumed tokens: 14244904960 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.632758E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.685 | TFLOPs: 11.88 | 7: iteration 27180/ 173500 | consumed samples: 6958080 | consumed tokens: 14250147840 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.650670E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.844 | TFLOPs: 11.77 | 7: iteration 27190/ 173500 | consumed samples: 6960640 | consumed tokens: 14255390720 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.624534E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.858 | TFLOPs: 11.88 | 7: iteration 27200/ 173500 | consumed samples: 6963200 | consumed tokens: 14260633600 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.639338E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3210.874 | TFLOPs: 11.94 | 7: iteration 27210/ 173500 | consumed samples: 6965760 | consumed tokens: 14265876480 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.640955E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.759 | TFLOPs: 11.87 | 7: iteration 27220/ 173500 | consumed samples: 6968320 | consumed tokens: 14271119360 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.629915E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.605 | TFLOPs: 11.60 | 7: iteration 27230/ 173500 | consumed samples: 6970880 | consumed tokens: 14276362240 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.633391E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.949 | TFLOPs: 11.82 | 7: iteration 27240/ 173500 | consumed samples: 6973440 | consumed tokens: 14281605120 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.633780E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.091 | TFLOPs: 11.84 | 7: iteration 27250/ 173500 | consumed samples: 6976000 | consumed tokens: 14286848000 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.635568E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.160 | TFLOPs: 11.86 | 7: iteration 27260/ 173500 | consumed samples: 6978560 | consumed tokens: 14292090880 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm 
loss: 4.640355E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.142 | TFLOPs: 11.85 | 7: iteration 27270/ 173500 | consumed samples: 6981120 | consumed tokens: 14297333760 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.627836E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.296 | TFLOPs: 11.85 | 7: iteration 27280/ 173500 | consumed samples: 6983680 | consumed tokens: 14302576640 | elapsed time per iteration (s): 0.08 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 4.642742E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.397 | TFLOPs: 11.85 | 7: iteration 27290/ 173500 | consumed samples: 6986240 | consumed tokens: 14307819520 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.629057E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.622 | TFLOPs: 11.82 | 7: iteration 27300/ 173500 | consumed samples: 6988800 | consumed tokens: 14313062400 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.630499E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.439 | TFLOPs: 11.81 | 7: iteration 27310/ 173500 | consumed samples: 6991360 | consumed tokens: 14318305280 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.629400E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.881 | TFLOPs: 11.84 | 7: iteration 27320/ 173500 | consumed samples: 6993920 | consumed tokens: 
14323548160 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.617623E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.481 | TFLOPs: 11.86 | 7: iteration 27330/ 173500 | consumed samples: 6996480 | consumed tokens: 14328791040 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.638841E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.216 | TFLOPs: 11.88 | 7: iteration 27340/ 173500 | consumed samples: 6999040 | consumed tokens: 14334033920 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.626986E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.589 | TFLOPs: 11.86 | 7: iteration 27350/ 173500 | consumed samples: 7001600 | consumed tokens: 14339276800 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.629809E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.419 | TFLOPs: 11.84 | 7: iteration 27360/ 173500 | consumed samples: 7004160 | consumed tokens: 14344519680 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.630024E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.503 | TFLOPs: 11.79 | 7: iteration 27370/ 173500 | consumed samples: 7006720 | consumed tokens: 14349762560 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.630421E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3176.003 | TFLOPs: 11.81 | 7: iteration 27380/ 173500 | consumed samples: 7009280 | consumed tokens: 14355005440 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.623164E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.796 | TFLOPs: 11.75 | 7: iteration 27390/ 173500 | consumed samples: 7011840 | consumed tokens: 14360248320 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.630497E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.319 | TFLOPs: 11.77 | 7: iteration 27400/ 173500 | consumed samples: 7014400 | consumed tokens: 14365491200 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.621630E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.021 | TFLOPs: 11.84 | 7: iteration 27410/ 173500 | consumed samples: 7016960 | consumed tokens: 14370734080 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.634739E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.591 | TFLOPs: 11.83 | 7: iteration 27420/ 173500 | consumed samples: 7019520 | consumed tokens: 14375976960 | elapsed time per iteration (s): 0.08 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 4.626913E+00 | grad norm: 0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.829 | TFLOPs: 11.50 | 7: iteration 27430/ 173500 | consumed samples: 7022080 | consumed tokens: 14381219840 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.627081E+00 | grad norm: 0.517 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.532 | TFLOPs: 11.42 | 7: iteration 27440/ 173500 | consumed samples: 7024640 | consumed tokens: 14386462720 | elapsed time per iteration (s): 0.09 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.627352E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2778.402 | TFLOPs: 10.33 | 7: iteration 27450/ 173500 | consumed samples: 7027200 | consumed tokens: 14391705600 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.627911E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.967 | TFLOPs: 11.62 | 7: iteration 27460/ 173500 | consumed samples: 7029760 | consumed tokens: 14396948480 | elapsed time per iteration (s): 0.10 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.627560E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2630.827 | TFLOPs: 9.79 | 7: iteration 27470/ 173500 | consumed samples: 7032320 | consumed tokens: 14402191360 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.641869E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.816 | TFLOPs: 11.85 | 7: iteration 27480/ 173500 | consumed samples: 7034880 | consumed tokens: 14407434240 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.621892E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.724 | TFLOPs: 11.76 | 7: iteration 27490/ 173500 | consumed samples: 7037440 | consumed tokens: 14412677120 | elapsed time per iteration (s): 
0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.631101E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.560 | TFLOPs: 11.89 | 7: iteration 27500/ 173500 | consumed samples: 7040000 | consumed tokens: 14417920000 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.635742E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3042.818 | TFLOPs: 11.32 | 7: iteration 27510/ 173500 | consumed samples: 7042560 | consumed tokens: 14423162880 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.615475E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.920 | TFLOPs: 11.92 | 7: iteration 27520/ 173500 | consumed samples: 7045120 | consumed tokens: 14428405760 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.631289E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.934 | TFLOPs: 11.91 | 7: iteration 27530/ 173500 | consumed samples: 7047680 | consumed tokens: 14433648640 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.627348E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.120 | TFLOPs: 11.90 | 7: iteration 27540/ 173500 | consumed samples: 7050240 | consumed tokens: 14438891520 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.630566E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.400 | TFLOPs: 11.60 | 7: iteration 27550/ 
173500 | consumed samples: 7052800 | consumed tokens: 14444134400 | elapsed time per iteration (s): 0.08 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 4.628425E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.715 | TFLOPs: 11.92 | 7: iteration 27560/ 173500 | consumed samples: 7055360 | consumed tokens: 14449377280 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.621151E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.803 | TFLOPs: 11.68 | 7: iteration 27570/ 173500 | consumed samples: 7057920 | consumed tokens: 14454620160 | elapsed time per iteration (s): 0.09 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.628825E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.685 | TFLOPs: 11.20 | 7: iteration 27580/ 173500 | consumed samples: 7060480 | consumed tokens: 14459863040 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.617982E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.481 | TFLOPs: 11.90 | 7: iteration 27590/ 173500 | consumed samples: 7063040 | consumed tokens: 14465105920 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.632119E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.124 | TFLOPs: 11.68 | 7: iteration 27600/ 173500 | consumed samples: 7065600 | consumed tokens: 14470348800 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.616965E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3232.224 | TFLOPs: 12.02 | 7: iteration 27610/ 173500 | consumed samples: 7068160 | consumed tokens: 14475591680 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.635526E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.934 | TFLOPs: 11.48 | 7: iteration 27620/ 173500 | consumed samples: 7070720 | consumed tokens: 14480834560 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.624790E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.624 | TFLOPs: 11.78 | 7: iteration 27630/ 173500 | consumed samples: 7073280 | consumed tokens: 14486077440 | elapsed time per iteration (s): 0.10 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.628655E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2508.487 | TFLOPs: 9.33 | 7: iteration 27640/ 173500 | consumed samples: 7075840 | consumed tokens: 14491320320 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.626062E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.093 | TFLOPs: 11.98 | 7: iteration 27650/ 173500 | consumed samples: 7078400 | consumed tokens: 14496563200 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.634415E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.702 | TFLOPs: 11.99 | 7: iteration 27660/ 173500 | consumed samples: 7080960 | consumed tokens: 14501806080 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch 
size: 256 | lm loss: 4.622382E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.388 | TFLOPs: 11.92 | 7: iteration 27670/ 173500 | consumed samples: 7083520 | consumed tokens: 14507048960 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.631585E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.121 | TFLOPs: 11.75 | 7: iteration 27680/ 173500 | consumed samples: 7086080 | consumed tokens: 14512291840 | elapsed time per iteration (s): 0.08 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 4.631808E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.125 | TFLOPs: 11.98 | 7: iteration 27690/ 173500 | consumed samples: 7088640 | consumed tokens: 14517534720 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.640884E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.197 | TFLOPs: 12.04 | 7: iteration 27700/ 173500 | consumed samples: 7091200 | consumed tokens: 14522777600 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.632048E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.117 | TFLOPs: 11.77 | 7: iteration 27710/ 173500 | consumed samples: 7093760 | consumed tokens: 14528020480 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.625042E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.553 | TFLOPs: 12.04 | 7: iteration 27720/ 173500 | consumed samples: 7096320 | consumed 
tokens: 14533263360 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.633612E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.514 | TFLOPs: 12.03 | 7: iteration 27730/ 173500 | consumed samples: 7098880 | consumed tokens: 14538506240 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.628637E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.794 | TFLOPs: 11.71 | 7: iteration 27740/ 173500 | consumed samples: 7101440 | consumed tokens: 14543749120 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.637035E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3249.688 | TFLOPs: 12.09 | 7: iteration 27750/ 173500 | consumed samples: 7104000 | consumed tokens: 14548992000 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.633113E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.837 | TFLOPs: 12.02 | 7: iteration 27760/ 173500 | consumed samples: 7106560 | consumed tokens: 14554234880 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.636901E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.244 | TFLOPs: 11.54 | 7: iteration 27770/ 173500 | consumed samples: 7109120 | consumed tokens: 14559477760 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.624549E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3040.634 | TFLOPs: 11.31 | 7: iteration 27780/ 173500 | consumed samples: 7111680 | consumed tokens: 14564720640 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.628677E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.521 | TFLOPs: 12.03 | 7: iteration 27790/ 173500 | consumed samples: 7114240 | consumed tokens: 14569963520 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.634201E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3256.004 | TFLOPs: 12.11 | 7: iteration 27800/ 173500 | consumed samples: 7116800 | consumed tokens: 14575206400 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.631323E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.026 | TFLOPs: 12.04 | 7: iteration 27810/ 173500 | consumed samples: 7119360 | consumed tokens: 14580449280 | elapsed time per iteration (s): 0.08 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 4.636873E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.775 | TFLOPs: 11.97 | 7: iteration 27820/ 173500 | consumed samples: 7121920 | consumed tokens: 14585692160 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.621444E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.501 | TFLOPs: 11.78 | 7: iteration 27830/ 173500 | consumed samples: 7124480 | consumed tokens: 14590935040 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.646175E+00 | grad norm: 
0.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.300 | TFLOPs: 11.74 | 7: iteration 27840/ 173500 | consumed samples: 7127040 | consumed tokens: 14596177920 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.624636E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.853 | TFLOPs: 12.04 | 7: iteration 27850/ 173500 | consumed samples: 7129600 | consumed tokens: 14601420800 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.628513E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.965 | TFLOPs: 11.50 | 7: iteration 27860/ 173500 | consumed samples: 7132160 | consumed tokens: 14606663680 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.634230E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.943 | TFLOPs: 12.04 | 7: iteration 27870/ 173500 | consumed samples: 7134720 | consumed tokens: 14611906560 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.628274E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.126 | TFLOPs: 12.05 | 7: iteration 27880/ 173500 | consumed samples: 7137280 | consumed tokens: 14617149440 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.623990E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.026 | TFLOPs: 11.96 | 7: iteration 27890/ 173500 | consumed samples: 7139840 | consumed tokens: 14622392320 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.625428E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.955 | TFLOPs: 11.73 | 7: iteration 27900/ 173500 | consumed samples: 7142400 | consumed tokens: 14627635200 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.629025E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.580 | TFLOPs: 11.76 | 7: iteration 27910/ 173500 | consumed samples: 7144960 | consumed tokens: 14632878080 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.624785E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.391 | TFLOPs: 11.96 | 7: iteration 27920/ 173500 | consumed samples: 7147520 | consumed tokens: 14638120960 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.622057E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.913 | TFLOPs: 11.95 | 7: iteration 27930/ 173500 | consumed samples: 7150080 | consumed tokens: 14643363840 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.630732E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.696 | TFLOPs: 11.95 | 7: iteration 27940/ 173500 | consumed samples: 7152640 | consumed tokens: 14648606720 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.632555E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.557 | TFLOPs: 11.76 | 7: 
iteration 27950/ 173500 | consumed samples: 7155200 | consumed tokens: 14653849600 | elapsed time per iteration (s): 0.08 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 4.631984E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.698 | TFLOPs: 11.92 | 7: iteration 27960/ 173500 | consumed samples: 7157760 | consumed tokens: 14659092480 | elapsed time per iteration (s): 0.08 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.620463E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.573 | TFLOPs: 11.92 | 7: iteration 27970/ 173500 | consumed samples: 7160320 | consumed tokens: 14664335360 | elapsed time per iteration (s): 0.08 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.620048E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.407 | TFLOPs: 11.95 | 7: iteration 27980/ 173500 | consumed samples: 7162880 | consumed tokens: 14669578240 | elapsed time per iteration (s): 0.08 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.637536E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.778 | TFLOPs: 11.40 | 7: iteration 27990/ 173500 | consumed samples: 7165440 | consumed tokens: 14674821120 | elapsed time per iteration (s): 0.08 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.640569E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.366 | TFLOPs: 11.99 | 0: [2023-03-17 00:57:31,460] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=0, lr=[0.00018981345832700956, 0.00018981345832700956, 0.00018981345832700956], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 28000/ 173500 | 
consumed samples: 7168000 | consumed tokens: 14680064000 | elapsed time per iteration (s): 0.08 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.633697E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3069.309 | TFLOPs: 11.42 | 0: steps: 28000 loss: 4.6062 iter time (s): 0.083 samples/sec: 3097.267 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 28000 | lm loss value: 4.494139E+00 | lm loss PPL: 8.949106E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 28000 to checkpoints_14m91b100m 0: [2023-03-17 00:57:31,518] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step28000 is begin to save! 0: [2023-03-17 00:57:31,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:57:31,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/layer_01-model_00-model_states.pt. 0: [2023-03-17 00:57:31,547] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:57:31,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:57:31,550] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:57:31,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 00:57:31,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:57:31,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:57:31,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:57:31,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:57:31,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:57:31,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:57:31,560] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step28000/mp_rank_00_model_states.pt 0: [2023-03-17 00:57:31,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/mp_rank_00_model_states.pt... 0: [2023-03-17 00:57:31,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:57:31,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:57:31,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:57:31,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:57:31,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2023-03-17 00:57:31,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 00:57:31,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 4: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
5: [2023-03-17 00:57:31,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 0: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2023-03-17 00:57:31,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:57:31,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2023-03-17 00:57:31,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 00:57:31,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:57:31,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:57:31,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
0: [2023-03-17 00:57:31,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2023-03-17 00:57:31,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:57:31,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:57:31,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:57:31,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2023-03-17 00:57:31,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2023-03-17 00:57:31,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2023-03-17 00:57:31,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:57:31,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 00:57:31,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
0: [2023-03-17 00:57:31,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:57:31,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:57:31,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
7: [2023-03-17 00:57:31,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:57:31,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2023-03-17 00:57:31,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:57:31,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2023-03-17 00:57:31,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
0: [2023-03-17 00:57:31,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:57:31,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:57:31,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:57:31,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:57:31,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:57:31,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2023-03-17 00:57:31,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:57:31,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:57:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:57:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 00:57:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:57:31,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:57:31,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 0: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
1: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
4: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2023-03-17 00:57:31,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2023-03-17 00:57:31,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:57:31,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step28000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:57:31,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
0: successfully saved checkpoint at iteration 28000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.07 7: iteration 28010/ 173500 | consumed samples: 7170560 | consumed tokens: 14685306880 | elapsed time per iteration (s): 0.20 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.625886E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1250.703 | TFLOPs: 4.65 | 7: iteration 28020/ 173500 | consumed samples: 7173120 | consumed tokens: 14690549760 | elapsed time per iteration (s): 0.09 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.627505E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2903.362 | TFLOPs: 10.80 | 7: iteration 28030/ 173500 | consumed samples: 7175680 | consumed tokens: 14695792640 | elapsed time per iteration (s): 0.08 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.622065E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.184 | TFLOPs: 11.72 | 7: iteration 28040/ 173500 | consumed samples: 7178240 | consumed tokens: 14701035520 | elapsed time per iteration (s): 0.08 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.619501E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.690 | TFLOPs: 11.37 | 7: iteration 28050/ 173500 | consumed samples: 7180800 | consumed tokens: 14706278400 | elapsed time per iteration (s): 0.08 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.640529E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.884 | TFLOPs: 11.84 | 7: iteration 28060/ 173500 | consumed samples: 7183360 | consumed tokens: 14711521280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.626354E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.574 | TFLOPs: 11.37 | 7: iteration 28070/ 173500 | consumed samples: 7185920 | consumed tokens: 14716764160 | elapsed time per iteration (s): 0.09 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.630510E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2990.457 | TFLOPs: 11.12 | 7: iteration 28080/ 173500 | consumed samples: 7188480 | consumed tokens: 14722007040 | elapsed time per iteration (s): 0.09 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 4.618602E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2878.239 | TFLOPs: 10.71 | 7: iteration 28090/ 173500 | consumed samples: 7191040 | consumed tokens: 14727249920 | elapsed time per iteration (s): 0.08 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.623392E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.409 | TFLOPs: 11.35 | 7: iteration 28100/ 173500 | consumed samples: 7193600 | consumed tokens: 14732492800 | elapsed time per iteration (s): 0.09 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.635629E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2937.252 | TFLOPs: 10.93 | 7: iteration 28110/ 173500 | consumed samples: 7196160 | consumed tokens: 14737735680 | elapsed time per iteration (s): 0.09 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.625494E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.096 | TFLOPs: 11.17 | 7: iteration 28120/ 173500 
| consumed samples: 7198720 | consumed tokens: 14742978560 | elapsed time per iteration (s): 0.09 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.614053E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2938.816 | TFLOPs: 10.93 | 7: iteration 28130/ 173500 | consumed samples: 7201280 | consumed tokens: 14748221440 | elapsed time per iteration (s): 0.08 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.624214E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.716 | TFLOPs: 11.59 | 7: iteration 28140/ 173500 | consumed samples: 7203840 | consumed tokens: 14753464320 | elapsed time per iteration (s): 0.08 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.623845E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.345 | TFLOPs: 11.39 | 7: iteration 28150/ 173500 | consumed samples: 7206400 | consumed tokens: 14758707200 | elapsed time per iteration (s): 0.09 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.636243E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.543 | TFLOPs: 11.00 | 7: iteration 28160/ 173500 | consumed samples: 7208960 | consumed tokens: 14763950080 | elapsed time per iteration (s): 0.08 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.624355E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.917 | TFLOPs: 11.63 | 7: iteration 28170/ 173500 | consumed samples: 7211520 | consumed tokens: 14769192960 | elapsed time per iteration (s): 0.09 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.629871E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2877.532 | TFLOPs: 10.70 | 7: iteration 28180/ 173500 | consumed samples: 7214080 | consumed tokens: 14774435840 | elapsed time per iteration (s): 0.08 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.619286E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.282 | TFLOPs: 11.58 | 7: iteration 28190/ 173500 | consumed samples: 7216640 | consumed tokens: 14779678720 | elapsed time per iteration (s): 0.08 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.636996E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.657 | TFLOPs: 11.86 | 7: iteration 28200/ 173500 | consumed samples: 7219200 | consumed tokens: 14784921600 | elapsed time per iteration (s): 0.08 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.623466E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.625 | TFLOPs: 11.92 | 7: iteration 28210/ 173500 | consumed samples: 7221760 | consumed tokens: 14790164480 | elapsed time per iteration (s): 0.09 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 4.608255E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.924 | TFLOPs: 10.48 | 7: iteration 28220/ 173500 | consumed samples: 7224320 | consumed tokens: 14795407360 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.618165E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.625 | TFLOPs: 11.88 | 7: iteration 28230/ 173500 | consumed samples: 7226880 | consumed tokens: 14800650240 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 
256 | lm loss: 4.620817E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.516 | TFLOPs: 11.86 | 7: iteration 28240/ 173500 | consumed samples: 7229440 | consumed tokens: 14805893120 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.625265E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.648 | TFLOPs: 11.59 | 7: iteration 28250/ 173500 | consumed samples: 7232000 | consumed tokens: 14811136000 | elapsed time per iteration (s): 0.10 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.624855E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2493.559 | TFLOPs: 9.27 | 7: iteration 28260/ 173500 | consumed samples: 7234560 | consumed tokens: 14816378880 | elapsed time per iteration (s): 0.11 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.617733E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2408.269 | TFLOPs: 8.96 | 7: iteration 28270/ 173500 | consumed samples: 7237120 | consumed tokens: 14821621760 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.631550E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.779 | TFLOPs: 11.94 | 7: iteration 28280/ 173500 | consumed samples: 7239680 | consumed tokens: 14826864640 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.619360E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.249 | TFLOPs: 11.79 | 7: iteration 28290/ 173500 | consumed samples: 7242240 | consumed tokens: 
14832107520 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.619595E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.192 | TFLOPs: 11.89 | 7: iteration 28300/ 173500 | consumed samples: 7244800 | consumed tokens: 14837350400 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.614520E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.192 | TFLOPs: 12.08 | 7: iteration 28310/ 173500 | consumed samples: 7247360 | consumed tokens: 14842593280 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.615764E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.631 | TFLOPs: 11.88 | 7: iteration 28320/ 173500 | consumed samples: 7249920 | consumed tokens: 14847836160 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.629769E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.603 | TFLOPs: 11.88 | 7: iteration 28330/ 173500 | consumed samples: 7252480 | consumed tokens: 14853079040 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.616921E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.395 | TFLOPs: 12.00 | 7: iteration 28340/ 173500 | consumed samples: 7255040 | consumed tokens: 14858321920 | elapsed time per iteration (s): 0.08 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 4.609879E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3206.999 | TFLOPs: 11.93 | 7: iteration 28350/ 173500 | consumed samples: 7257600 | consumed tokens: 14863564800 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.625864E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.136 | TFLOPs: 11.42 | 7: iteration 28360/ 173500 | consumed samples: 7260160 | consumed tokens: 14868807680 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.621875E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.019 | TFLOPs: 11.97 | 7: iteration 28370/ 173500 | consumed samples: 7262720 | consumed tokens: 14874050560 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.625427E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.056 | TFLOPs: 12.01 | 7: iteration 28380/ 173500 | consumed samples: 7265280 | consumed tokens: 14879293440 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.636426E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.131 | TFLOPs: 11.52 | 7: iteration 28390/ 173500 | consumed samples: 7267840 | consumed tokens: 14884536320 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.615445E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.798 | TFLOPs: 12.00 | 7: iteration 28400/ 173500 | consumed samples: 7270400 | consumed tokens: 14889779200 | elapsed time per iteration (s): 0.10 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.630534E+00 | grad norm: 0.386 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2568.606 | TFLOPs: 9.55 | 7: iteration 28410/ 173500 | consumed samples: 7272960 | consumed tokens: 14895022080 | elapsed time per iteration (s): 0.09 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.612838E+00 | grad norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2904.826 | TFLOPs: 10.80 | 7: iteration 28420/ 173500 | consumed samples: 7275520 | consumed tokens: 14900264960 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.608929E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.018 | TFLOPs: 11.97 | 7: iteration 28430/ 173500 | consumed samples: 7278080 | consumed tokens: 14905507840 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.617768E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.934 | TFLOPs: 11.73 | 7: iteration 28440/ 173500 | consumed samples: 7280640 | consumed tokens: 14910750720 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.614311E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.955 | TFLOPs: 11.91 | 7: iteration 28450/ 173500 | consumed samples: 7283200 | consumed tokens: 14915993600 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.610836E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.691 | TFLOPs: 11.97 | 7: iteration 28460/ 173500 | consumed samples: 7285760 | consumed tokens: 14921236480 | elapsed time per iteration (s): 
0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.642279E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.758 | TFLOPs: 11.91 | 7: iteration 28470/ 173500 | consumed samples: 7288320 | consumed tokens: 14926479360 | elapsed time per iteration (s): 0.08 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 4.622740E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.327 | TFLOPs: 11.93 | 7: iteration 28480/ 173500 | consumed samples: 7290880 | consumed tokens: 14931722240 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.611832E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.993 | TFLOPs: 11.90 | 7: iteration 28490/ 173500 | consumed samples: 7293440 | consumed tokens: 14936965120 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.618286E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.449 | TFLOPs: 11.91 | 7: iteration 28500/ 173500 | consumed samples: 7296000 | consumed tokens: 14942208000 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.621174E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.748 | TFLOPs: 11.95 | 7: iteration 28510/ 173500 | consumed samples: 7298560 | consumed tokens: 14947450880 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.629789E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.496 | TFLOPs: 11.96 | 7: iteration 28520/ 
173500 | consumed samples: 7301120 | consumed tokens: 14952693760 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.627204E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.210 | TFLOPs: 11.92 | 7: iteration 28530/ 173500 | consumed samples: 7303680 | consumed tokens: 14957936640 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.623603E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.148 | TFLOPs: 11.96 | 7: iteration 28540/ 173500 | consumed samples: 7306240 | consumed tokens: 14963179520 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.620617E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.264 | TFLOPs: 11.92 | 7: iteration 28550/ 173500 | consumed samples: 7308800 | consumed tokens: 14968422400 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.614094E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.496 | TFLOPs: 11.90 | 7: iteration 28560/ 173500 | consumed samples: 7311360 | consumed tokens: 14973665280 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.621458E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.802 | TFLOPs: 11.59 | 7: iteration 28570/ 173500 | consumed samples: 7313920 | consumed tokens: 14978908160 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.631385E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3213.124 | TFLOPs: 11.95 | 7: iteration 28580/ 173500 | consumed samples: 7316480 | consumed tokens: 14984151040 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.621725E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.675 | TFLOPs: 11.92 | 7: iteration 28590/ 173500 | consumed samples: 7319040 | consumed tokens: 14989393920 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.623820E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.177 | TFLOPs: 11.97 | 7: iteration 28600/ 173500 | consumed samples: 7321600 | consumed tokens: 14994636800 | elapsed time per iteration (s): 0.08 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 4.618672E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.635 | TFLOPs: 11.92 | 7: iteration 28610/ 173500 | consumed samples: 7324160 | consumed tokens: 14999879680 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.618221E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.531 | TFLOPs: 11.95 | 7: iteration 28620/ 173500 | consumed samples: 7326720 | consumed tokens: 15005122560 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.615110E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.212 | TFLOPs: 11.94 | 7: iteration 28630/ 173500 | consumed samples: 7329280 | consumed tokens: 15010365440 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch 
size: 256 | lm loss: 4.628485E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.747 | TFLOPs: 11.93 | 7: iteration 28640/ 173500 | consumed samples: 7331840 | consumed tokens: 15015608320 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.620911E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.059 | TFLOPs: 11.97 | 7: iteration 28650/ 173500 | consumed samples: 7334400 | consumed tokens: 15020851200 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.617964E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.931 | TFLOPs: 11.93 | 7: iteration 28660/ 173500 | consumed samples: 7336960 | consumed tokens: 15026094080 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.633766E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.506 | TFLOPs: 11.90 | 7: iteration 28670/ 173500 | consumed samples: 7339520 | consumed tokens: 15031336960 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.626856E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.743 | TFLOPs: 11.89 | 7: iteration 28680/ 173500 | consumed samples: 7342080 | consumed tokens: 15036579840 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.617349E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.919 | TFLOPs: 11.81 | 7: iteration 28690/ 173500 | consumed samples: 7344640 | consumed 
tokens: 15041822720 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.623293E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.906 | TFLOPs: 11.78 | 7: iteration 28700/ 173500 | consumed samples: 7347200 | consumed tokens: 15047065600 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.632399E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.093 | TFLOPs: 11.84 | 7: iteration 28710/ 173500 | consumed samples: 7349760 | consumed tokens: 15052308480 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.626967E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.014 | TFLOPs: 11.90 | 7: iteration 28720/ 173500 | consumed samples: 7352320 | consumed tokens: 15057551360 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.628748E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.007 | TFLOPs: 11.63 | 7: iteration 28730/ 173500 | consumed samples: 7354880 | consumed tokens: 15062794240 | elapsed time per iteration (s): 0.08 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 4.626948E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.866 | TFLOPs: 11.93 | 7: iteration 28740/ 173500 | consumed samples: 7357440 | consumed tokens: 15068037120 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.607157E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3137.222 | TFLOPs: 11.67 | 7: iteration 28750/ 173500 | consumed samples: 7360000 | consumed tokens: 15073280000 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.619518E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.827 | TFLOPs: 11.95 | 7: iteration 28760/ 173500 | consumed samples: 7362560 | consumed tokens: 15078522880 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.616530E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.310 | TFLOPs: 11.63 | 7: iteration 28770/ 173500 | consumed samples: 7365120 | consumed tokens: 15083765760 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.626223E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.316 | TFLOPs: 11.71 | 7: iteration 28780/ 173500 | consumed samples: 7367680 | consumed tokens: 15089008640 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.629853E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.845 | TFLOPs: 11.91 | 7: iteration 28790/ 173500 | consumed samples: 7370240 | consumed tokens: 15094251520 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.622130E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.487 | TFLOPs: 11.91 | 7: iteration 28800/ 173500 | consumed samples: 7372800 | consumed tokens: 15099494400 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.617913E+00 | grad norm: 
0.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.149 | TFLOPs: 11.85 | 7: iteration 28810/ 173500 | consumed samples: 7375360 | consumed tokens: 15104737280 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.624823E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.558 | TFLOPs: 11.69 | 7: iteration 28820/ 173500 | consumed samples: 7377920 | consumed tokens: 15109980160 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.630036E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.782 | TFLOPs: 11.97 | 7: iteration 28830/ 173500 | consumed samples: 7380480 | consumed tokens: 15115223040 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.619744E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.844 | TFLOPs: 11.98 | 7: iteration 28840/ 173500 | consumed samples: 7383040 | consumed tokens: 15120465920 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.623300E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.049 | TFLOPs: 12.02 | 7: iteration 28850/ 173500 | consumed samples: 7385600 | consumed tokens: 15125708800 | elapsed time per iteration (s): 0.08 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 4.621615E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.382 | TFLOPs: 12.06 | 7: iteration 28860/ 173500 | consumed samples: 7388160 | consumed tokens: 15130951680 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.611943E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.187 | TFLOPs: 11.86 | 7: iteration 28870/ 173500 | consumed samples: 7390720 | consumed tokens: 15136194560 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.619762E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.109 | TFLOPs: 12.03 | 7: iteration 28880/ 173500 | consumed samples: 7393280 | consumed tokens: 15141437440 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.617604E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.265 | TFLOPs: 12.07 | 7: iteration 28890/ 173500 | consumed samples: 7395840 | consumed tokens: 15146680320 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.623677E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.937 | TFLOPs: 12.06 | 7: iteration 28900/ 173500 | consumed samples: 7398400 | consumed tokens: 15151923200 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.614962E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.580 | TFLOPs: 12.04 | 7: iteration 28910/ 173500 | consumed samples: 7400960 | consumed tokens: 15157166080 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.615380E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.129 | TFLOPs: 11.62 | 7: 
iteration 28920/ 173500 | consumed samples: 7403520 | consumed tokens: 15162408960 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.611093E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.543 | TFLOPs: 11.98 | 7: iteration 28930/ 173500 | consumed samples: 7406080 | consumed tokens: 15167651840 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.614144E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.667 | TFLOPs: 12.05 | 7: iteration 28940/ 173500 | consumed samples: 7408640 | consumed tokens: 15172894720 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.616502E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.217 | TFLOPs: 12.03 | 7: iteration 28950/ 173500 | consumed samples: 7411200 | consumed tokens: 15178137600 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.621623E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.176 | TFLOPs: 12.01 | 7: iteration 28960/ 173500 | consumed samples: 7413760 | consumed tokens: 15183380480 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.615898E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.925 | TFLOPs: 12.07 | 7: iteration 28970/ 173500 | consumed samples: 7416320 | consumed tokens: 15188623360 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.621643E+00 | grad norm: 0.424 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.104 | TFLOPs: 12.01 | 7: iteration 28980/ 173500 | consumed samples: 7418880 | consumed tokens: 15193866240 | elapsed time per iteration (s): 0.08 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 4.610484E+00 | grad norm: 0.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.017 | TFLOPs: 12.04 | 7: iteration 28990/ 173500 | consumed samples: 7421440 | consumed tokens: 15199109120 | elapsed time per iteration (s): 0.08 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.617734E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.260 | TFLOPs: 12.02 | 7: iteration 29000/ 173500 | consumed samples: 7424000 | consumed tokens: 15204352000 | elapsed time per iteration (s): 0.08 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.632682E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.971 | TFLOPs: 12.04 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 29000 | lm loss value: 4.435247E+00 | lm loss PPL: 8.437300E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 29000 to checkpoints_14m91b100m 0: [2023-03-17 00:58:54,452] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step29000 is begin to save! 0: [2023-03-17 00:58:54,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/layer_01-model_00-model_states.pt... 0: [2023-03-17 00:58:54,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 00:58:54,482] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/layer_03-model_00-model_states.pt... 0: [2023-03-17 00:58:54,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/layer_03-model_00-model_states.pt. 0: [2023-03-17 00:58:54,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/layer_04-model_00-model_states.pt... 0: [2023-03-17 00:58:54,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/layer_04-model_00-model_states.pt. 0: [2023-03-17 00:58:54,488] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/layer_05-model_00-model_states.pt... 0: [2023-03-17 00:58:54,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/layer_05-model_00-model_states.pt. 0: [2023-03-17 00:58:54,491] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/layer_06-model_00-model_states.pt... 0: [2023-03-17 00:58:54,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/layer_06-model_00-model_states.pt. 0: [2023-03-17 00:58:54,494] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/layer_08-model_00-model_states.pt... 0: [2023-03-17 00:58:54,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/layer_08-model_00-model_states.pt. 0: [2023-03-17 00:58:54,495] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step29000/mp_rank_00_model_states.pt 0: [2023-03-17 00:58:54,495] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/mp_rank_00_model_states.pt... 
0: [2023-03-17 00:58:54,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/mp_rank_00_model_states.pt. 0: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
0: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
3: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
4: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 00:58:54,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2023-03-17 00:58:54,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:58:54,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2023-03-17 00:58:54,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:58:54,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2023-03-17 00:58:54,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:58:54,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 00:58:54,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:58:54,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 00:58:54,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2023-03-17 00:58:54,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
5: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2023-03-17 00:58:54,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:58:54,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 00:58:54,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2023-03-17 00:58:54,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:58:54,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 00:58:54,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2023-03-17 00:58:54,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:58:54,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2023-03-17 00:58:54,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 00:58:54,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2023-03-17 00:58:54,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:58:54,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 00:58:54,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:58:54,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:58:54,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2023-03-17 00:58:54,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:58:54,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 00:58:54,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:58:54,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:58:54,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:58:54,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 00:58:54,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:58:54,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 00:58:54,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:58:54,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:58:54,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:58:54,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 00:58:54,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:58:54,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2023-03-17 00:58:54,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:58:54,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:58:54,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:58:54,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:58:54,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:58:54,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
4: [2023-03-17 00:58:54,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:58:54,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2023-03-17 00:58:54,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:58:54,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:58:54,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:58:54,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2023-03-17 00:58:54,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:58:54,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2023-03-17 00:58:54,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:58:54,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 0: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
0: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
2: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 00:58:54,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 00:58:54,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 00:58:54,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2023-03-17 00:58:54,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 00:58:54,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 00:58:54,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2023-03-17 00:58:54,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2023-03-17 00:58:54,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 00:58:54,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step29000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 00:58:54,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
0: successfully saved checkpoint at iteration 29000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.62 7: iteration 29010/ 173500 | consumed samples: 7426560 | consumed tokens: 15209594880 | elapsed time per iteration (s): 0.10 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.619199E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.685 | TFLOPs: 9.92 | 7: iteration 29020/ 173500 | consumed samples: 7429120 | consumed tokens: 15214837760 | elapsed time per iteration (s): 0.11 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.606644E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2336.688 | TFLOPs: 8.69 | 7: iteration 29030/ 173500 | consumed samples: 7431680 | consumed tokens: 15220080640 | elapsed time per iteration (s): 0.12 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.610559E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2101.058 | TFLOPs: 7.82 | 7: iteration 29040/ 173500 | consumed samples: 7434240 | consumed tokens: 15225323520 | elapsed time per iteration (s): 0.12 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.623931E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2086.278 | TFLOPs: 7.76 | 7: iteration 29050/ 173500 | consumed samples: 7436800 | consumed tokens: 15230566400 | elapsed time per iteration (s): 0.11 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.621689E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2245.514 | TFLOPs: 8.35 | 7: iteration 29060/ 173500 | consumed samples: 7439360 | consumed tokens: 15235809280 | elapsed time per iteration (s): 0.09 | learning 
rate: 1.890E-04 | global batch size: 256 | lm loss: 4.618901E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.571 | TFLOPs: 11.19 | 7: iteration 29070/ 173500 | consumed samples: 7441920 | consumed tokens: 15241052160 | elapsed time per iteration (s): 0.08 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.619957E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.818 | TFLOPs: 11.83 | 7: iteration 29080/ 173500 | consumed samples: 7444480 | consumed tokens: 15246295040 | elapsed time per iteration (s): 0.08 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.612542E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.839 | TFLOPs: 11.81 | 7: iteration 29090/ 173500 | consumed samples: 7447040 | consumed tokens: 15251537920 | elapsed time per iteration (s): 0.08 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.628431E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.578 | TFLOPs: 11.87 | 7: iteration 29100/ 173500 | consumed samples: 7449600 | consumed tokens: 15256780800 | elapsed time per iteration (s): 0.10 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.617469E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2588.724 | TFLOPs: 9.63 | 7: iteration 29110/ 173500 | consumed samples: 7452160 | consumed tokens: 15262023680 | elapsed time per iteration (s): 0.08 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 4.621135E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.438 | TFLOPs: 11.64 | 7: iteration 29120/ 173500 | 
consumed samples: 7454720 | consumed tokens: 15267266560 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.617713E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.616 | TFLOPs: 11.60 | 7: iteration 29130/ 173500 | consumed samples: 7457280 | consumed tokens: 15272509440 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.619260E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.836 | TFLOPs: 11.86 | 7: iteration 29140/ 173500 | consumed samples: 7459840 | consumed tokens: 15277752320 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.625543E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.205 | TFLOPs: 11.85 | 7: iteration 29150/ 173500 | consumed samples: 7462400 | consumed tokens: 15282995200 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.626820E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3020.024 | TFLOPs: 11.23 | 7: iteration 29160/ 173500 | consumed samples: 7464960 | consumed tokens: 15288238080 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.612566E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.611 | TFLOPs: 11.86 | 7: iteration 29170/ 173500 | consumed samples: 7467520 | consumed tokens: 15293480960 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.637650E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3188.698 | TFLOPs: 11.86 | 7: iteration 29180/ 173500 | consumed samples: 7470080 | consumed tokens: 15298723840 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.624991E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.532 | TFLOPs: 11.29 | 7: iteration 29190/ 173500 | consumed samples: 7472640 | consumed tokens: 15303966720 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.615893E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.710 | TFLOPs: 11.28 | 7: iteration 29200/ 173500 | consumed samples: 7475200 | consumed tokens: 15309209600 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.613729E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.955 | TFLOPs: 11.86 | 7: iteration 29210/ 173500 | consumed samples: 7477760 | consumed tokens: 15314452480 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.623065E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.821 | TFLOPs: 11.58 | 7: iteration 29220/ 173500 | consumed samples: 7480320 | consumed tokens: 15319695360 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 4.614994E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.388 | TFLOPs: 11.54 | 7: iteration 29230/ 173500 | consumed samples: 7482880 | consumed tokens: 15324938240 | elapsed time per iteration (s): 0.08 | learning rate: 1.889E-04 | global batch size: 
256 | lm loss: 4.620213E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.735 | TFLOPs: 11.79 | 7: iteration 29240/ 173500 | consumed samples: 7485440 | consumed tokens: 15330181120 | elapsed time per iteration (s): 0.08 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.591517E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.280 | TFLOPs: 11.86 | 7: iteration 29250/ 173500 | consumed samples: 7488000 | consumed tokens: 15335424000 | elapsed time per iteration (s): 0.08 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.621110E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.762 | TFLOPs: 11.79 | 7: iteration 29260/ 173500 | consumed samples: 7490560 | consumed tokens: 15340666880 | elapsed time per iteration (s): 0.08 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.621010E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.611 | TFLOPs: 11.84 | 7: iteration 29270/ 173500 | consumed samples: 7493120 | consumed tokens: 15345909760 | elapsed time per iteration (s): 0.08 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.611463E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.449 | TFLOPs: 11.88 | 7: iteration 29280/ 173500 | consumed samples: 7495680 | consumed tokens: 15351152640 | elapsed time per iteration (s): 0.08 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.614014E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.391 | TFLOPs: 11.35 | 7: iteration 29290/ 173500 | consumed samples: 7498240 | consumed 
tokens: 15356395520 | elapsed time per iteration (s): 0.08 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.632531E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.242 | TFLOPs: 11.59 | 7: iteration 29300/ 173500 | consumed samples: 7500800 | consumed tokens: 15361638400 | elapsed time per iteration (s): 0.08 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.623290E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.148 | TFLOPs: 11.80 | 7: iteration 29310/ 173500 | consumed samples: 7503360 | consumed tokens: 15366881280 | elapsed time per iteration (s): 0.09 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.614951E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2826.622 | TFLOPs: 10.51 | 7: iteration 29320/ 173500 | consumed samples: 7505920 | consumed tokens: 15372124160 | elapsed time per iteration (s): 0.12 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.623254E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2169.548 | TFLOPs: 8.07 | 7: iteration 29330/ 173500 | consumed samples: 7508480 | consumed tokens: 15377367040 | elapsed time per iteration (s): 0.13 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.627520E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.206 | TFLOPs: 7.24 | 7: iteration 29340/ 173500 | consumed samples: 7511040 | consumed tokens: 15382609920 | elapsed time per iteration (s): 0.13 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.620013E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 1975.373 | TFLOPs: 7.35 | 7: iteration 29350/ 173500 | consumed samples: 7513600 | consumed tokens: 15387852800 | elapsed time per iteration (s): 0.13 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.611904E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.266 | TFLOPs: 7.46 | 7: iteration 29360/ 173500 | consumed samples: 7516160 | consumed tokens: 15393095680 | elapsed time per iteration (s): 0.12 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 4.620069E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2058.972 | TFLOPs: 7.66 | 7: iteration 29370/ 173500 | consumed samples: 7518720 | consumed tokens: 15398338560 | elapsed time per iteration (s): 0.10 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.603785E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2597.688 | TFLOPs: 9.66 | 7: iteration 29380/ 173500 | consumed samples: 7521280 | consumed tokens: 15403581440 | elapsed time per iteration (s): 0.08 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.616872E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.513 | TFLOPs: 11.86 | 7: iteration 29390/ 173500 | consumed samples: 7523840 | consumed tokens: 15408824320 | elapsed time per iteration (s): 0.09 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.605066E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2741.015 | TFLOPs: 10.20 | 7: iteration 29400/ 173500 | consumed samples: 7526400 | consumed tokens: 15414067200 | elapsed time per iteration (s): 0.11 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.610648E+00 | grad norm: 
0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.161 | TFLOPs: 8.59 | 7: iteration 29410/ 173500 | consumed samples: 7528960 | consumed tokens: 15419310080 | elapsed time per iteration (s): 0.09 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.613928E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.710 | TFLOPs: 11.08 | 7: iteration 29420/ 173500 | consumed samples: 7531520 | consumed tokens: 15424552960 | elapsed time per iteration (s): 0.09 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.616846E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.489 | TFLOPs: 11.07 | 7: iteration 29430/ 173500 | consumed samples: 7534080 | consumed tokens: 15429795840 | elapsed time per iteration (s): 0.09 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.603649E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.115 | TFLOPs: 11.08 | 7: iteration 29440/ 173500 | consumed samples: 7536640 | consumed tokens: 15435038720 | elapsed time per iteration (s): 0.08 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.624737E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.142 | TFLOPs: 11.61 | 7: iteration 29450/ 173500 | consumed samples: 7539200 | consumed tokens: 15440281600 | elapsed time per iteration (s): 0.08 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.603886E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.176 | TFLOPs: 11.55 | 7: iteration 29460/ 173500 | consumed samples: 7541760 | consumed tokens: 15445524480 | elapsed time per iteration 
(s): 0.08 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.602468E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.367 | TFLOPs: 11.60 | 7: iteration 29470/ 173500 | consumed samples: 7544320 | consumed tokens: 15450767360 | elapsed time per iteration (s): 0.08 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.599149E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.405 | TFLOPs: 11.64 | 7: iteration 29480/ 173500 | consumed samples: 7546880 | consumed tokens: 15456010240 | elapsed time per iteration (s): 0.09 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.612769E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2904.578 | TFLOPs: 10.80 | 7: iteration 29490/ 173500 | consumed samples: 7549440 | consumed tokens: 15461253120 | elapsed time per iteration (s): 0.08 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 4.619375E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.185 | TFLOPs: 11.88 | 7: iteration 29500/ 173500 | consumed samples: 7552000 | consumed tokens: 15466496000 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.613976E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.073 | TFLOPs: 11.90 | 7: iteration 29510/ 173500 | consumed samples: 7554560 | consumed tokens: 15471738880 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.610003E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.320 | TFLOPs: 11.52 | 7: iteration 
29520/ 173500 | consumed samples: 7557120 | consumed tokens: 15476981760 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.597089E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.193 | TFLOPs: 11.92 | 7: iteration 29530/ 173500 | consumed samples: 7559680 | consumed tokens: 15482224640 | elapsed time per iteration (s): 0.09 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.613325E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2980.216 | TFLOPs: 11.09 | 7: iteration 29540/ 173500 | consumed samples: 7562240 | consumed tokens: 15487467520 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.608904E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.102 | TFLOPs: 11.60 | 7: iteration 29550/ 173500 | consumed samples: 7564800 | consumed tokens: 15492710400 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.621249E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.122 | TFLOPs: 11.88 | 7: iteration 29560/ 173500 | consumed samples: 7567360 | consumed tokens: 15497953280 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.616747E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.261 | TFLOPs: 11.41 | 7: iteration 29570/ 173500 | consumed samples: 7569920 | consumed tokens: 15503196160 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.617048E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3203.775 | TFLOPs: 11.92 | 7: iteration 29580/ 173500 | consumed samples: 7572480 | consumed tokens: 15508439040 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.627369E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.368 | TFLOPs: 11.65 | 7: iteration 29590/ 173500 | consumed samples: 7575040 | consumed tokens: 15513681920 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.615605E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.954 | TFLOPs: 11.84 | 7: iteration 29600/ 173500 | consumed samples: 7577600 | consumed tokens: 15518924800 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.622466E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.252 | TFLOPs: 11.52 | 7: iteration 29610/ 173500 | consumed samples: 7580160 | consumed tokens: 15524167680 | elapsed time per iteration (s): 0.08 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 4.614203E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.096 | TFLOPs: 11.87 | 7: iteration 29620/ 173500 | consumed samples: 7582720 | consumed tokens: 15529410560 | elapsed time per iteration (s): 0.08 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.610221E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.815 | TFLOPs: 11.86 | 7: iteration 29630/ 173500 | consumed samples: 7585280 | consumed tokens: 15534653440 | elapsed time per iteration (s): 0.08 | learning rate: 1.885E-04 | 
global batch size: 256 | lm loss: 4.619446E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.972 | TFLOPs: 11.29 | 7: iteration 29640/ 173500 | consumed samples: 7587840 | consumed tokens: 15539896320 | elapsed time per iteration (s): 0.08 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.609514E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.048 | TFLOPs: 11.32 | 7: iteration 29650/ 173500 | consumed samples: 7590400 | consumed tokens: 15545139200 | elapsed time per iteration (s): 0.08 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.616709E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.194 | TFLOPs: 11.85 | 7: iteration 29660/ 173500 | consumed samples: 7592960 | consumed tokens: 15550382080 | elapsed time per iteration (s): 0.08 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.622626E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.637 | TFLOPs: 11.32 | 7: iteration 29670/ 173500 | consumed samples: 7595520 | consumed tokens: 15555624960 | elapsed time per iteration (s): 0.08 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.617524E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.246 | TFLOPs: 11.35 | 7: iteration 29680/ 173500 | consumed samples: 7598080 | consumed tokens: 15560867840 | elapsed time per iteration (s): 0.08 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.623397E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.686 | TFLOPs: 11.27 | 7: iteration 29690/ 173500 | consumed samples: 
7600640 | consumed tokens: 15566110720 | elapsed time per iteration (s): 0.09 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.625357E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2901.849 | TFLOPs: 10.79 | 7: iteration 29700/ 173500 | consumed samples: 7603200 | consumed tokens: 15571353600 | elapsed time per iteration (s): 0.08 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.610582E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.249 | TFLOPs: 11.34 | 7: iteration 29710/ 173500 | consumed samples: 7605760 | consumed tokens: 15576596480 | elapsed time per iteration (s): 0.09 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.608028E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2903.542 | TFLOPs: 10.80 | 7: iteration 29720/ 173500 | consumed samples: 7608320 | consumed tokens: 15581839360 | elapsed time per iteration (s): 0.09 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.618647E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.130 | TFLOPs: 11.11 | 7: iteration 29730/ 173500 | consumed samples: 7610880 | consumed tokens: 15587082240 | elapsed time per iteration (s): 0.09 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 4.610105E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.490 | TFLOPs: 11.14 | 7: iteration 29740/ 173500 | consumed samples: 7613440 | consumed tokens: 15592325120 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.622252E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3129.552 | TFLOPs: 11.64 | 7: iteration 29750/ 173500 | consumed samples: 7616000 | consumed tokens: 15597568000 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.634785E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.598 | TFLOPs: 11.31 | 7: iteration 29760/ 173500 | consumed samples: 7618560 | consumed tokens: 15602810880 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.619324E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.669 | TFLOPs: 11.61 | 7: iteration 29770/ 173500 | consumed samples: 7621120 | consumed tokens: 15608053760 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.601547E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.767 | TFLOPs: 11.70 | 7: iteration 29780/ 173500 | consumed samples: 7623680 | consumed tokens: 15613296640 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.613184E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.175 | TFLOPs: 11.70 | 7: iteration 29790/ 173500 | consumed samples: 7626240 | consumed tokens: 15618539520 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.613583E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.927 | TFLOPs: 11.68 | 7: iteration 29800/ 173500 | consumed samples: 7628800 | consumed tokens: 15623782400 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm 
loss: 4.607922E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.269 | TFLOPs: 11.68 | 7: iteration 29810/ 173500 | consumed samples: 7631360 | consumed tokens: 15629025280 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.623201E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.726 | TFLOPs: 11.98 | 7: iteration 29820/ 173500 | consumed samples: 7633920 | consumed tokens: 15634268160 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.614302E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.785 | TFLOPs: 11.38 | 7: iteration 29830/ 173500 | consumed samples: 7636480 | consumed tokens: 15639511040 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.618128E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.360 | TFLOPs: 11.59 | 7: iteration 29840/ 173500 | consumed samples: 7639040 | consumed tokens: 15644753920 | elapsed time per iteration (s): 0.09 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.612536E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2988.474 | TFLOPs: 11.12 | 7: iteration 29850/ 173500 | consumed samples: 7641600 | consumed tokens: 15649996800 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.609165E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.900 | TFLOPs: 11.99 | 7: iteration 29860/ 173500 | consumed samples: 7644160 | consumed tokens: 
15655239680 | elapsed time per iteration (s): 0.08 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 4.621061E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.357 | TFLOPs: 11.96 | 7: iteration 29870/ 173500 | consumed samples: 7646720 | consumed tokens: 15660482560 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.613886E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.137 | TFLOPs: 11.64 | 7: iteration 29880/ 173500 | consumed samples: 7649280 | consumed tokens: 15665725440 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.611869E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.191 | TFLOPs: 11.43 | 7: iteration 29890/ 173500 | consumed samples: 7651840 | consumed tokens: 15670968320 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.608279E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.986 | TFLOPs: 11.79 | 7: iteration 29900/ 173500 | consumed samples: 7654400 | consumed tokens: 15676211200 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.612726E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.221 | TFLOPs: 11.67 | 7: iteration 29910/ 173500 | consumed samples: 7656960 | consumed tokens: 15681454080 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.613733E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3122.752 | TFLOPs: 11.62 | 7: iteration 29920/ 173500 | consumed samples: 7659520 | consumed tokens: 15686696960 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.589930E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.435 | TFLOPs: 11.91 | 7: iteration 29930/ 173500 | consumed samples: 7662080 | consumed tokens: 15691939840 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.619740E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.323 | TFLOPs: 11.87 | 7: iteration 29940/ 173500 | consumed samples: 7664640 | consumed tokens: 15697182720 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.598738E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.508 | TFLOPs: 11.82 | 7: iteration 29950/ 173500 | consumed samples: 7667200 | consumed tokens: 15702425600 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.615747E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.285 | TFLOPs: 11.84 | 7: iteration 29960/ 173500 | consumed samples: 7669760 | consumed tokens: 15707668480 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.609785E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.062 | TFLOPs: 11.93 | 7: iteration 29970/ 173500 | consumed samples: 7672320 | consumed tokens: 15712911360 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.609035E+00 | grad norm: 0.408 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.783 | TFLOPs: 11.89 | 7: iteration 29980/ 173500 | consumed samples: 7674880 | consumed tokens: 15718154240 | elapsed time per iteration (s): 0.08 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 4.608094E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.544 | TFLOPs: 11.89 | 7: iteration 29990/ 173500 | consumed samples: 7677440 | consumed tokens: 15723397120 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.614597E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.910 | TFLOPs: 11.53 | 0: [2023-03-17 01:00:21,131] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=0, lr=[0.00018823900512431258, 0.00018823900512431258, 0.00018823900512431258], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 30000/ 173500 | consumed samples: 7680000 | consumed tokens: 15728640000 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.618875E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.726 | TFLOPs: 11.81 | 0: steps: 30000 loss: 4.6070 iter time (s): 0.084 samples/sec: 3061.235 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 30000 | lm loss value: 4.490486E+00 | lm loss PPL: 8.916474E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 30000 to checkpoints_14m91b100m 0: [2023-03-17 01:00:21,190] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step30000 is begin to save! 
0: [2023-03-17 01:00:21,194] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:00:21,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:00:21,219] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:00:21,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:00:21,222] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:00:21,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:00:21,225] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:00:21,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:00:21,228] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:00:21,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:00:21,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:00:21,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:00:21,232] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step30000/mp_rank_00_model_states.pt 0: [2023-03-17 01:00:21,232] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:00:21,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:00:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:00:21,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:00:21,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:00:21,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:00:21,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:00:21,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
4: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:00:21,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:00:21,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 6: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:00:21,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
1: [2023-03-17 01:00:21,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:00:21,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:00:21,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 01:00:21,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:00:21,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 01:00:21,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:00:21,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:00:21,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:00:21,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:00:21,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 01:00:21,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:00:21,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 01:00:21,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 01:00:21,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:00:21,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:00:21,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:00:21,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 01:00:21,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:00:21,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:00:21,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:00:21,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:00:21,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 01:00:21,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:00:21,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:00:21,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:00:21,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 01:00:21,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:00:21,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:00:21,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:00:21,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:00:21,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:00:21,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:00:21,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:00:21,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 01:00:21,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:00:21,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:00:21,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:00:21,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:00:21,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:00:21,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 4: [2023-03-17 01:00:21,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
4: [2023-03-17 01:00:21,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:00:21,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:00:21,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:00:21,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:00:21,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:00:21,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
3: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 3: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 01:00:21,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:00:21,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:00:21,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 01:00:21,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 4: [2023-03-17 01:00:21,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2023-03-17 01:00:21,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2023-03-17 01:00:21,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:00:21,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:00:21,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
0: successfully saved checkpoint at iteration 30000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.90 7: iteration 30010/ 173500 | consumed samples: 7682560 | consumed tokens: 15733882880 | elapsed time per iteration (s): 0.09 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.609666E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2745.226 | TFLOPs: 10.21 | 7: iteration 30020/ 173500 | consumed samples: 7685120 | consumed tokens: 15739125760 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.607979E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.110 | TFLOPs: 11.94 | 7: iteration 30030/ 173500 | consumed samples: 7687680 | consumed tokens: 15744368640 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.602888E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.901 | TFLOPs: 11.93 | 7: iteration 30040/ 173500 | consumed samples: 7690240 | consumed tokens: 15749611520 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.609743E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.814 | TFLOPs: 11.23 | 7: iteration 30050/ 173500 | consumed samples: 7692800 | consumed tokens: 15754854400 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.599371E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.184 | TFLOPs: 11.88 | 7: iteration 30060/ 173500 | consumed samples: 7695360 | consumed tokens: 15760097280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.611240E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.587 | TFLOPs: 11.92 | 7: iteration 30070/ 173500 | consumed samples: 7697920 | consumed tokens: 15765340160 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.613456E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.192 | TFLOPs: 11.87 | 7: iteration 30080/ 173500 | consumed samples: 7700480 | consumed tokens: 15770583040 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.616552E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.177 | TFLOPs: 11.91 | 7: iteration 30090/ 173500 | consumed samples: 7703040 | consumed tokens: 15775825920 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.599561E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.363 | TFLOPs: 11.99 | 7: iteration 30100/ 173500 | consumed samples: 7705600 | consumed tokens: 15781068800 | elapsed time per iteration (s): 0.08 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 4.622218E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.974 | TFLOPs: 11.95 | 7: iteration 30110/ 173500 | consumed samples: 7708160 | consumed tokens: 15786311680 | elapsed time per iteration (s): 0.08 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.606994E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.094 | TFLOPs: 11.95 | 7: iteration 30120/ 173500 
| consumed samples: 7710720 | consumed tokens: 15791554560 | elapsed time per iteration (s): 0.08 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.608234E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.662 | TFLOPs: 11.22 | 7: iteration 30130/ 173500 | consumed samples: 7713280 | consumed tokens: 15796797440 | elapsed time per iteration (s): 0.08 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.621307E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.435 | TFLOPs: 11.87 | 7: iteration 30140/ 173500 | consumed samples: 7715840 | consumed tokens: 15802040320 | elapsed time per iteration (s): 0.08 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.611681E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.772 | TFLOPs: 11.34 | 7: iteration 30150/ 173500 | consumed samples: 7718400 | consumed tokens: 15807283200 | elapsed time per iteration (s): 0.11 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.629929E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2284.558 | TFLOPs: 8.50 | 7: iteration 30160/ 173500 | consumed samples: 7720960 | consumed tokens: 15812526080 | elapsed time per iteration (s): 0.13 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.591652E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1933.577 | TFLOPs: 7.19 | 7: iteration 30170/ 173500 | consumed samples: 7723520 | consumed tokens: 15817768960 | elapsed time per iteration (s): 0.13 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.615557E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 1994.856 | TFLOPs: 7.42 | 7: iteration 30180/ 173500 | consumed samples: 7726080 | consumed tokens: 15823011840 | elapsed time per iteration (s): 0.13 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.609703E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2031.408 | TFLOPs: 7.56 | 7: iteration 30190/ 173500 | consumed samples: 7728640 | consumed tokens: 15828254720 | elapsed time per iteration (s): 0.13 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.620411E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.246 | TFLOPs: 7.26 | 7: iteration 30200/ 173500 | consumed samples: 7731200 | consumed tokens: 15833497600 | elapsed time per iteration (s): 0.13 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.609034E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.153 | TFLOPs: 7.35 | 7: iteration 30210/ 173500 | consumed samples: 7733760 | consumed tokens: 15838740480 | elapsed time per iteration (s): 0.13 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.616724E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.265 | TFLOPs: 7.28 | 7: iteration 30220/ 173500 | consumed samples: 7736320 | consumed tokens: 15843983360 | elapsed time per iteration (s): 0.13 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 4.609719E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.498 | TFLOPs: 7.35 | 7: iteration 30230/ 173500 | consumed samples: 7738880 | consumed tokens: 15849226240 | elapsed time per iteration (s): 0.13 | learning rate: 1.881E-04 | global batch size: 256 | 
lm loss: 4.606926E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.354 | TFLOPs: 7.24 | 7: iteration 30240/ 173500 | consumed samples: 7741440 | consumed tokens: 15854469120 | elapsed time per iteration (s): 0.14 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.609045E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1871.040 | TFLOPs: 6.96 | 7: iteration 30250/ 173500 | consumed samples: 7744000 | consumed tokens: 15859712000 | elapsed time per iteration (s): 0.11 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.598681E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.601 | TFLOPs: 8.88 | 7: iteration 30260/ 173500 | consumed samples: 7746560 | consumed tokens: 15864954880 | elapsed time per iteration (s): 0.08 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.605658E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.105 | TFLOPs: 11.49 | 7: iteration 30270/ 173500 | consumed samples: 7749120 | consumed tokens: 15870197760 | elapsed time per iteration (s): 0.08 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.609238E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.867 | TFLOPs: 11.79 | 7: iteration 30280/ 173500 | consumed samples: 7751680 | consumed tokens: 15875440640 | elapsed time per iteration (s): 0.08 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.628694E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.859 | TFLOPs: 11.73 | 7: iteration 30290/ 173500 | consumed samples: 7754240 | consumed tokens: 
15880683520 | elapsed time per iteration (s): 0.08 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.602093E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.690 | TFLOPs: 11.82 | 7: iteration 30300/ 173500 | consumed samples: 7756800 | consumed tokens: 15885926400 | elapsed time per iteration (s): 0.08 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.617336E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.191 | TFLOPs: 11.52 | 7: iteration 30310/ 173500 | consumed samples: 7759360 | consumed tokens: 15891169280 | elapsed time per iteration (s): 0.08 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.620439E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.663 | TFLOPs: 11.81 | 7: iteration 30320/ 173500 | consumed samples: 7761920 | consumed tokens: 15896412160 | elapsed time per iteration (s): 0.12 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.598644E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2218.341 | TFLOPs: 8.25 | 7: iteration 30330/ 173500 | consumed samples: 7764480 | consumed tokens: 15901655040 | elapsed time per iteration (s): 0.10 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.600780E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2683.297 | TFLOPs: 9.98 | 7: iteration 30340/ 173500 | consumed samples: 7767040 | consumed tokens: 15906897920 | elapsed time per iteration (s): 0.08 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.607528E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3156.913 | TFLOPs: 11.74 | 7: iteration 30350/ 173500 | consumed samples: 7769600 | consumed tokens: 15912140800 | elapsed time per iteration (s): 0.08 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 4.620935E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.993 | TFLOPs: 11.79 | 7: iteration 30360/ 173500 | consumed samples: 7772160 | consumed tokens: 15917383680 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.602877E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.248 | TFLOPs: 11.80 | 7: iteration 30370/ 173500 | consumed samples: 7774720 | consumed tokens: 15922626560 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.602571E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.503 | TFLOPs: 11.76 | 7: iteration 30380/ 173500 | consumed samples: 7777280 | consumed tokens: 15927869440 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.596401E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.573 | TFLOPs: 11.59 | 7: iteration 30390/ 173500 | consumed samples: 7779840 | consumed tokens: 15933112320 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.615961E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.154 | TFLOPs: 11.86 | 7: iteration 30400/ 173500 | consumed samples: 7782400 | consumed tokens: 15938355200 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.613462E+00 | grad norm: 0.368 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.938 | TFLOPs: 11.82 | 7: iteration 30410/ 173500 | consumed samples: 7784960 | consumed tokens: 15943598080 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.615921E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.734 | TFLOPs: 11.60 | 7: iteration 30420/ 173500 | consumed samples: 7787520 | consumed tokens: 15948840960 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.608416E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.106 | TFLOPs: 11.81 | 7: iteration 30430/ 173500 | consumed samples: 7790080 | consumed tokens: 15954083840 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.596059E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.935 | TFLOPs: 11.84 | 7: iteration 30440/ 173500 | consumed samples: 7792640 | consumed tokens: 15959326720 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.607215E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.770 | TFLOPs: 11.57 | 7: iteration 30450/ 173500 | consumed samples: 7795200 | consumed tokens: 15964569600 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.605727E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.097 | TFLOPs: 11.94 | 7: iteration 30460/ 173500 | consumed samples: 7797760 | consumed tokens: 15969812480 | elapsed time per iteration (s): 
0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.609710E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.130 | TFLOPs: 11.85 | 7: iteration 30470/ 173500 | consumed samples: 7800320 | consumed tokens: 15975055360 | elapsed time per iteration (s): 0.08 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 4.600623E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.368 | TFLOPs: 12.00 | 7: iteration 30480/ 173500 | consumed samples: 7802880 | consumed tokens: 15980298240 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.612032E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.697 | TFLOPs: 11.88 | 7: iteration 30490/ 173500 | consumed samples: 7805440 | consumed tokens: 15985541120 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.607655E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.519 | TFLOPs: 11.99 | 7: iteration 30500/ 173500 | consumed samples: 7808000 | consumed tokens: 15990784000 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.604396E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.265 | TFLOPs: 11.99 | 7: iteration 30510/ 173500 | consumed samples: 7810560 | consumed tokens: 15996026880 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.602680E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.379 | TFLOPs: 11.95 | 7: iteration 30520/ 
173500 | consumed samples: 7813120 | consumed tokens: 16001269760 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.609026E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.852 | TFLOPs: 12.01 | 7: iteration 30530/ 173500 | consumed samples: 7815680 | consumed tokens: 16006512640 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.606288E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.732 | TFLOPs: 11.96 | 7: iteration 30540/ 173500 | consumed samples: 7818240 | consumed tokens: 16011755520 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.610256E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.430 | TFLOPs: 12.03 | 7: iteration 30550/ 173500 | consumed samples: 7820800 | consumed tokens: 16016998400 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.609216E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.048 | TFLOPs: 11.95 | 7: iteration 30560/ 173500 | consumed samples: 7823360 | consumed tokens: 16022241280 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.605521E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.532 | TFLOPs: 12.01 | 7: iteration 30570/ 173500 | consumed samples: 7825920 | consumed tokens: 16027484160 | elapsed time per iteration (s): 0.08 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.619341E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3208.163 | TFLOPs: 11.93 | 7: iteration 30580/ 173500 | consumed samples: 7828480 | consumed tokens: 16032727040 | elapsed time per iteration (s): 0.09 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.613922E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2853.584 | TFLOPs: 10.61 | 7: iteration 30590/ 173500 | consumed samples: 7831040 | consumed tokens: 16037969920 | elapsed time per iteration (s): 0.09 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 4.605664E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2746.790 | TFLOPs: 10.22 | 7: iteration 30600/ 173500 | consumed samples: 7833600 | consumed tokens: 16043212800 | elapsed time per iteration (s): 0.09 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.621199E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2941.945 | TFLOPs: 10.94 | 7: iteration 30610/ 173500 | consumed samples: 7836160 | consumed tokens: 16048455680 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.603268E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.517 | TFLOPs: 11.70 | 7: iteration 30620/ 173500 | consumed samples: 7838720 | consumed tokens: 16053698560 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.606132E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.259 | TFLOPs: 11.99 | 7: iteration 30630/ 173500 | consumed samples: 7841280 | consumed tokens: 16058941440 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch 
size: 256 | lm loss: 4.617466E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.528 | TFLOPs: 11.98 | 7: iteration 30640/ 173500 | consumed samples: 7843840 | consumed tokens: 16064184320 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.613222E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.129 | TFLOPs: 11.96 | 7: iteration 30650/ 173500 | consumed samples: 7846400 | consumed tokens: 16069427200 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.607561E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.571 | TFLOPs: 11.96 | 7: iteration 30660/ 173500 | consumed samples: 7848960 | consumed tokens: 16074670080 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.612364E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.613 | TFLOPs: 11.72 | 7: iteration 30670/ 173500 | consumed samples: 7851520 | consumed tokens: 16079912960 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.603551E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.244 | TFLOPs: 11.96 | 7: iteration 30680/ 173500 | consumed samples: 7854080 | consumed tokens: 16085155840 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.589418E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.944 | TFLOPs: 11.98 | 7: iteration 30690/ 173500 | consumed samples: 7856640 | consumed 
tokens: 16090398720 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.597165E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.834 | TFLOPs: 11.98 | 7: iteration 30700/ 173500 | consumed samples: 7859200 | consumed tokens: 16095641600 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.614913E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.213 | TFLOPs: 11.75 | 7: iteration 30710/ 173500 | consumed samples: 7861760 | consumed tokens: 16100884480 | elapsed time per iteration (s): 0.08 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 4.603012E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.150 | TFLOPs: 11.70 | 7: iteration 30720/ 173500 | consumed samples: 7864320 | consumed tokens: 16106127360 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.610893E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.836 | TFLOPs: 12.01 | 7: iteration 30730/ 173500 | consumed samples: 7866880 | consumed tokens: 16111370240 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.617542E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.018 | TFLOPs: 11.62 | 7: iteration 30740/ 173500 | consumed samples: 7869440 | consumed tokens: 16116613120 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.604820E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3190.079 | TFLOPs: 11.87 | 7: iteration 30750/ 173500 | consumed samples: 7872000 | consumed tokens: 16121856000 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.605291E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.619 | TFLOPs: 11.99 | 7: iteration 30760/ 173500 | consumed samples: 7874560 | consumed tokens: 16127098880 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.604134E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.139 | TFLOPs: 11.97 | 7: iteration 30770/ 173500 | consumed samples: 7877120 | consumed tokens: 16132341760 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.615438E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.629 | TFLOPs: 11.87 | 7: iteration 30780/ 173500 | consumed samples: 7879680 | consumed tokens: 16137584640 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.609610E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.472 | TFLOPs: 11.37 | 7: iteration 30790/ 173500 | consumed samples: 7882240 | consumed tokens: 16142827520 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.601966E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.003 | TFLOPs: 12.02 | 7: iteration 30800/ 173500 | consumed samples: 7884800 | consumed tokens: 16148070400 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.590928E+00 | grad norm: 
0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.814 | TFLOPs: 12.03 | 7: iteration 30810/ 173500 | consumed samples: 7887360 | consumed tokens: 16153313280 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.620921E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.323 | TFLOPs: 11.95 | 7: iteration 30820/ 173500 | consumed samples: 7889920 | consumed tokens: 16158556160 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.605574E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.033 | TFLOPs: 12.01 | 7: iteration 30830/ 173500 | consumed samples: 7892480 | consumed tokens: 16163799040 | elapsed time per iteration (s): 0.08 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 4.608229E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.441 | TFLOPs: 12.03 | 7: iteration 30840/ 173500 | consumed samples: 7895040 | consumed tokens: 16169041920 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.610360E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.830 | TFLOPs: 11.73 | 7: iteration 30850/ 173500 | consumed samples: 7897600 | consumed tokens: 16174284800 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.607347E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.699 | TFLOPs: 11.30 | 7: iteration 30860/ 173500 | consumed samples: 7900160 | consumed tokens: 16179527680 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.595228E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.969 | TFLOPs: 11.50 | 7: iteration 30870/ 173500 | consumed samples: 7902720 | consumed tokens: 16184770560 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.604610E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.238 | TFLOPs: 11.99 | 7: iteration 30880/ 173500 | consumed samples: 7905280 | consumed tokens: 16190013440 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.612568E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.370 | TFLOPs: 12.00 | 7: iteration 30890/ 173500 | consumed samples: 7907840 | consumed tokens: 16195256320 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.609379E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.454 | TFLOPs: 11.99 | 7: iteration 30900/ 173500 | consumed samples: 7910400 | consumed tokens: 16200499200 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.606100E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.729 | TFLOPs: 11.95 | 7: iteration 30910/ 173500 | consumed samples: 7912960 | consumed tokens: 16205742080 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.598291E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.662 | TFLOPs: 12.01 | 7: 
iteration 30920/ 173500 | consumed samples: 7915520 | consumed tokens: 16210984960 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.599740E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.234 | TFLOPs: 11.98 | 7: iteration 30930/ 173500 | consumed samples: 7918080 | consumed tokens: 16216227840 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.610556E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.722 | TFLOPs: 12.01 | 7: iteration 30940/ 173500 | consumed samples: 7920640 | consumed tokens: 16221470720 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.595970E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.710 | TFLOPs: 12.01 | 7: iteration 30950/ 173500 | consumed samples: 7923200 | consumed tokens: 16226713600 | elapsed time per iteration (s): 0.08 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 4.615451E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.244 | TFLOPs: 11.68 | 7: iteration 30960/ 173500 | consumed samples: 7925760 | consumed tokens: 16231956480 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.600742E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.531 | TFLOPs: 11.73 | 7: iteration 30970/ 173500 | consumed samples: 7928320 | consumed tokens: 16237199360 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.615440E+00 | grad norm: 0.373 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.306 | TFLOPs: 11.96 |
7: iteration 30980/ 173500 | consumed samples: 7930880 | consumed tokens: 16242442240 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.596626E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.967 | TFLOPs: 11.89 |
7: iteration 30990/ 173500 | consumed samples: 7933440 | consumed tokens: 16247685120 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.603381E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.563 | TFLOPs: 12.02 |
7: iteration 31000/ 173500 | consumed samples: 7936000 | consumed tokens: 16252928000 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.605992E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.271 | TFLOPs: 12.03 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 31000 | lm loss value: 4.471336E+00 | lm loss PPL: 8.747350E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 31000 to checkpoints_14m91b100m
0: [2023-03-17 01:01:47,615] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step31000 is begin to save!
0: [2023-03-17 01:01:47,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:01:47,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:01:47,644] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:01:47,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:01:47,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:01:47,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:01:47,650] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:01:47,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:01:47,653] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:01:47,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:01:47,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:01:47,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:01:47,657] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step31000/mp_rank_00_model_states.pt 0: [2023-03-17 01:01:47,657] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/mp_rank_00_model_states.pt... 
0: [2023-03-17 01:01:47,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:01:47,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:01:47,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:01:47,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:01:47,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:01:47,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2023-03-17 01:01:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 2: [2023-03-17 01:01:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:01:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 01:01:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2023-03-17 01:01:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:01:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2023-03-17 01:01:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:01:47,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:01:47,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2023-03-17 01:01:47,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:01:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:01:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:01:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:01:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:01:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:01:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:01:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2023-03-17 01:01:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:01:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:01:47,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:01:47,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:01:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:01:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
5: [2023-03-17 01:01:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:01:47,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:01:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:01:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
3: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:01:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:01:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:01:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:01:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2023-03-17 01:01:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:01:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:01:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 01:01:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 01:01:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:01:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2023-03-17 01:01:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
0: [2023-03-17 01:01:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:01:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2023-03-17 01:01:47,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:01:47,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:01:47,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2023-03-17 01:01:47,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:01:47,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2023-03-17 01:01:47,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:01:47,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:01:47,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 01:01:47,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:01:47,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:01:47,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:01:47,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:01:47,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:01:47,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:01:47,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2023-03-17 01:01:47,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:01:47,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2023-03-17 01:01:47,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:01:47,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 0: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
1: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 3: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step31000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2023-03-17 01:01:47,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: successfully saved checkpoint at iteration 31000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.63 7: iteration 31010/ 173500 | consumed samples: 7938560 | consumed tokens: 16258170880 | elapsed time per iteration (s): 0.09 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.606512E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2841.258 | TFLOPs: 10.57 | 7: iteration 31020/ 173500 | consumed samples: 7941120 | consumed tokens: 16263413760 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.605315E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.332 | TFLOPs: 12.05 | 7: iteration 31030/ 173500 | consumed samples: 7943680 | consumed tokens: 16268656640 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.614442E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.516 | TFLOPs: 11.85 | 7: iteration 31040/ 173500 | consumed samples: 7946240 | consumed tokens: 16273899520 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.605953E+00 | 
grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.591 | TFLOPs: 11.90 | 7: iteration 31050/ 173500 | consumed samples: 7948800 | consumed tokens: 16279142400 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.606660E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.318 | TFLOPs: 11.91 | 7: iteration 31060/ 173500 | consumed samples: 7951360 | consumed tokens: 16284385280 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.615188E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.926 | TFLOPs: 11.91 | 7: iteration 31070/ 173500 | consumed samples: 7953920 | consumed tokens: 16289628160 | elapsed time per iteration (s): 0.08 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 4.615858E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.109 | TFLOPs: 11.93 | 7: iteration 31080/ 173500 | consumed samples: 7956480 | consumed tokens: 16294871040 | elapsed time per iteration (s): 0.11 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.598598E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.861 | TFLOPs: 8.99 | 7: iteration 31090/ 173500 | consumed samples: 7959040 | consumed tokens: 16300113920 | elapsed time per iteration (s): 0.10 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.606446E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2620.620 | TFLOPs: 9.75 | 7: iteration 31100/ 173500 | consumed samples: 7961600 | consumed tokens: 16305356800 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.620876E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.275 | TFLOPs: 11.88 | 7: iteration 31110/ 173500 | consumed samples: 7964160 | consumed tokens: 16310599680 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.597321E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.254 | TFLOPs: 11.90 | 7: iteration 31120/ 173500 | consumed samples: 7966720 | consumed tokens: 16315842560 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.616433E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.378 | TFLOPs: 11.89 | 7: iteration 31130/ 173500 | consumed samples: 7969280 | consumed tokens: 16321085440 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.624387E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.483 | TFLOPs: 11.84 | 7: iteration 31140/ 173500 | consumed samples: 7971840 | consumed tokens: 16326328320 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.606993E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.502 | TFLOPs: 11.83 | 7: iteration 31150/ 173500 | consumed samples: 7974400 | consumed tokens: 16331571200 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.605987E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.052 | TFLOPs: 11.89 | 7: 
iteration 31160/ 173500 | consumed samples: 7976960 | consumed tokens: 16336814080 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.608394E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.187 | TFLOPs: 11.88 | 7: iteration 31170/ 173500 | consumed samples: 7979520 | consumed tokens: 16342056960 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.598729E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.672 | TFLOPs: 11.83 | 7: iteration 31180/ 173500 | consumed samples: 7982080 | consumed tokens: 16347299840 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.604648E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3065.280 | TFLOPs: 11.40 | 7: iteration 31190/ 173500 | consumed samples: 7984640 | consumed tokens: 16352542720 | elapsed time per iteration (s): 0.08 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 4.616003E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.128 | TFLOPs: 11.65 | 7: iteration 31200/ 173500 | consumed samples: 7987200 | consumed tokens: 16357785600 | elapsed time per iteration (s): 0.09 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.615535E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2896.385 | TFLOPs: 10.77 | 7: iteration 31210/ 173500 | consumed samples: 7989760 | consumed tokens: 16363028480 | elapsed time per iteration (s): 0.08 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.598426E+00 | grad norm: 0.387 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.442 | TFLOPs: 11.76 | 7: iteration 31220/ 173500 | consumed samples: 7992320 | consumed tokens: 16368271360 | elapsed time per iteration (s): 0.09 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.610597E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.282 | TFLOPs: 10.84 | 7: iteration 31230/ 173500 | consumed samples: 7994880 | consumed tokens: 16373514240 | elapsed time per iteration (s): 0.12 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.612729E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.734 | TFLOPs: 7.75 | 7: iteration 31240/ 173500 | consumed samples: 7997440 | consumed tokens: 16378757120 | elapsed time per iteration (s): 0.12 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.618264E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2161.392 | TFLOPs: 8.04 | 7: iteration 31250/ 173500 | consumed samples: 8000000 | consumed tokens: 16384000000 | elapsed time per iteration (s): 0.08 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.605261E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.621 | TFLOPs: 11.80 | 7: iteration 31260/ 173500 | consumed samples: 8002560 | consumed tokens: 16389242880 | elapsed time per iteration (s): 0.08 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.612012E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.551 | TFLOPs: 11.91 | 7: iteration 31270/ 173500 | consumed samples: 8005120 | consumed tokens: 16394485760 | elapsed time per iteration (s): 0.08 | learning rate: 
1.872E-04 | global batch size: 256 | lm loss: 4.606341E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.052 | TFLOPs: 11.90 | 7: iteration 31280/ 173500 | consumed samples: 8007680 | consumed tokens: 16399728640 | elapsed time per iteration (s): 0.08 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.603002E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.839 | TFLOPs: 11.93 | 7: iteration 31290/ 173500 | consumed samples: 8010240 | consumed tokens: 16404971520 | elapsed time per iteration (s): 0.08 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.596990E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.112 | TFLOPs: 11.92 | 7: iteration 31300/ 173500 | consumed samples: 8012800 | consumed tokens: 16410214400 | elapsed time per iteration (s): 0.08 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.598563E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.110 | TFLOPs: 11.84 | 7: iteration 31310/ 173500 | consumed samples: 8015360 | consumed tokens: 16415457280 | elapsed time per iteration (s): 0.08 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 4.595913E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.953 | TFLOPs: 11.92 | 7: iteration 31320/ 173500 | consumed samples: 8017920 | consumed tokens: 16420700160 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.614894E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.566 | TFLOPs: 11.74 | 7: iteration 31330/ 173500 | consumed 
samples: 8020480 | consumed tokens: 16425943040 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.595550E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.979 | TFLOPs: 11.89 | 7: iteration 31340/ 173500 | consumed samples: 8023040 | consumed tokens: 16431185920 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.602680E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.012 | TFLOPs: 11.83 | 7: iteration 31350/ 173500 | consumed samples: 8025600 | consumed tokens: 16436428800 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.611779E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.882 | TFLOPs: 11.87 | 7: iteration 31360/ 173500 | consumed samples: 8028160 | consumed tokens: 16441671680 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.594379E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.378 | TFLOPs: 11.89 | 7: iteration 31370/ 173500 | consumed samples: 8030720 | consumed tokens: 16446914560 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.609667E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.181 | TFLOPs: 11.85 | 7: iteration 31380/ 173500 | consumed samples: 8033280 | consumed tokens: 16452157440 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.614236E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3208.323 | TFLOPs: 11.93 | 7: iteration 31390/ 173500 | consumed samples: 8035840 | consumed tokens: 16457400320 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.619906E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.761 | TFLOPs: 11.82 | 7: iteration 31400/ 173500 | consumed samples: 8038400 | consumed tokens: 16462643200 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.591737E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.002 | TFLOPs: 11.98 | 7: iteration 31410/ 173500 | consumed samples: 8040960 | consumed tokens: 16467886080 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.612988E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.535 | TFLOPs: 11.69 | 7: iteration 31420/ 173500 | consumed samples: 8043520 | consumed tokens: 16473128960 | elapsed time per iteration (s): 0.08 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 4.599783E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.560 | TFLOPs: 11.83 | 7: iteration 31430/ 173500 | consumed samples: 8046080 | consumed tokens: 16478371840 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.604589E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.247 | TFLOPs: 11.88 | 7: iteration 31440/ 173500 | consumed samples: 8048640 | consumed tokens: 16483614720 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm 
loss: 4.618225E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.983 | TFLOPs: 11.88 | 7: iteration 31450/ 173500 | consumed samples: 8051200 | consumed tokens: 16488857600 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.606818E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.094 | TFLOPs: 11.90 | 7: iteration 31460/ 173500 | consumed samples: 8053760 | consumed tokens: 16494100480 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.599471E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.031 | TFLOPs: 11.87 | 7: iteration 31470/ 173500 | consumed samples: 8056320 | consumed tokens: 16499343360 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.611974E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.807 | TFLOPs: 11.86 | 7: iteration 31480/ 173500 | consumed samples: 8058880 | consumed tokens: 16504586240 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.597849E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.205 | TFLOPs: 11.79 | 7: iteration 31490/ 173500 | consumed samples: 8061440 | consumed tokens: 16509829120 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.598948E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.800 | TFLOPs: 11.86 | 7: iteration 31500/ 173500 | consumed samples: 8064000 | consumed tokens: 
16515072000 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.601333E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.764 | TFLOPs: 11.96 | 7: iteration 31510/ 173500 | consumed samples: 8066560 | consumed tokens: 16520314880 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.606248E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.182 | TFLOPs: 11.63 | 7: iteration 31520/ 173500 | consumed samples: 8069120 | consumed tokens: 16525557760 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.602367E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.601 | TFLOPs: 11.94 | 7: iteration 31530/ 173500 | consumed samples: 8071680 | consumed tokens: 16530800640 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.601929E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.963 | TFLOPs: 11.37 | 7: iteration 31540/ 173500 | consumed samples: 8074240 | consumed tokens: 16536043520 | elapsed time per iteration (s): 0.08 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 4.605022E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.978 | TFLOPs: 11.86 | 7: iteration 31550/ 173500 | consumed samples: 8076800 | consumed tokens: 16541286400 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.615329E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3190.970 | TFLOPs: 11.87 | 7: iteration 31560/ 173500 | consumed samples: 8079360 | consumed tokens: 16546529280 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.602353E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.665 | TFLOPs: 11.81 | 7: iteration 31570/ 173500 | consumed samples: 8081920 | consumed tokens: 16551772160 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.614240E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.430 | TFLOPs: 11.88 | 7: iteration 31580/ 173500 | consumed samples: 8084480 | consumed tokens: 16557015040 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.611393E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.306 | TFLOPs: 11.90 | 7: iteration 31590/ 173500 | consumed samples: 8087040 | consumed tokens: 16562257920 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.596311E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.854 | TFLOPs: 11.87 | 7: iteration 31600/ 173500 | consumed samples: 8089600 | consumed tokens: 16567500800 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.592022E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3017.751 | TFLOPs: 11.22 | 7: iteration 31610/ 173500 | consumed samples: 8092160 | consumed tokens: 16572743680 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.610071E+00 | grad norm: 0.354 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.308 | TFLOPs: 11.94 | 7: iteration 31620/ 173500 | consumed samples: 8094720 | consumed tokens: 16577986560 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.603823E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.703 | TFLOPs: 11.92 | 7: iteration 31630/ 173500 | consumed samples: 8097280 | consumed tokens: 16583229440 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.603284E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.616 | TFLOPs: 11.96 | 7: iteration 31640/ 173500 | consumed samples: 8099840 | consumed tokens: 16588472320 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.612774E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.812 | TFLOPs: 11.91 | 7: iteration 31650/ 173500 | consumed samples: 8102400 | consumed tokens: 16593715200 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.596312E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.501 | TFLOPs: 11.78 | 7: iteration 31660/ 173500 | consumed samples: 8104960 | consumed tokens: 16598958080 | elapsed time per iteration (s): 0.08 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 4.601313E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.746 | TFLOPs: 11.86 | 7: iteration 31670/ 173500 | consumed samples: 8107520 | consumed tokens: 16604200960 | elapsed time per iteration (s): 
0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.599038E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.214 | TFLOPs: 11.90 | 7: iteration 31680/ 173500 | consumed samples: 8110080 | consumed tokens: 16609443840 | elapsed time per iteration (s): 0.09 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.620179E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.945 | TFLOPs: 10.96 | 7: iteration 31690/ 173500 | consumed samples: 8112640 | consumed tokens: 16614686720 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.601318E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.167 | TFLOPs: 11.88 | 7: iteration 31700/ 173500 | consumed samples: 8115200 | consumed tokens: 16619929600 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.604105E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.765 | TFLOPs: 11.78 | 7: iteration 31710/ 173500 | consumed samples: 8117760 | consumed tokens: 16625172480 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.601587E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.146 | TFLOPs: 11.91 | 7: iteration 31720/ 173500 | consumed samples: 8120320 | consumed tokens: 16630415360 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.610581E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.622 | TFLOPs: 11.89 | 7: iteration 31730/ 
173500 | consumed samples: 8122880 | consumed tokens: 16635658240 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.605147E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.287 | TFLOPs: 11.77 | 7: iteration 31740/ 173500 | consumed samples: 8125440 | consumed tokens: 16640901120 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.607871E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.542 | TFLOPs: 11.87 | 7: iteration 31750/ 173500 | consumed samples: 8128000 | consumed tokens: 16646144000 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.605545E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.722 | TFLOPs: 11.79 | 7: iteration 31760/ 173500 | consumed samples: 8130560 | consumed tokens: 16651386880 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.608265E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.269 | TFLOPs: 11.90 | 7: iteration 31770/ 173500 | consumed samples: 8133120 | consumed tokens: 16656629760 | elapsed time per iteration (s): 0.08 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 4.595850E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.678 | TFLOPs: 11.89 | 7: iteration 31780/ 173500 | consumed samples: 8135680 | consumed tokens: 16661872640 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.614537E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3196.396 | TFLOPs: 11.89 | 7: iteration 31790/ 173500 | consumed samples: 8138240 | consumed tokens: 16667115520 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.610009E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.489 | TFLOPs: 11.86 | 7: iteration 31800/ 173500 | consumed samples: 8140800 | consumed tokens: 16672358400 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.607935E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.727 | TFLOPs: 11.91 | 7: iteration 31810/ 173500 | consumed samples: 8143360 | consumed tokens: 16677601280 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.616494E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.501 | TFLOPs: 11.93 | 7: iteration 31820/ 173500 | consumed samples: 8145920 | consumed tokens: 16682844160 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.604300E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.356 | TFLOPs: 11.87 | 7: iteration 31830/ 173500 | consumed samples: 8148480 | consumed tokens: 16688087040 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.611148E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.385 | TFLOPs: 11.89 | 7: iteration 31840/ 173500 | consumed samples: 8151040 | consumed tokens: 16693329920 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch 
size: 256 | lm loss: 4.600711E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.488 | TFLOPs: 12.00 | 7: iteration 31850/ 173500 | consumed samples: 8153600 | consumed tokens: 16698572800 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.600100E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.965 | TFLOPs: 11.80 | 7: iteration 31860/ 173500 | consumed samples: 8156160 | consumed tokens: 16703815680 | elapsed time per iteration (s): 0.09 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.612594E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2882.003 | TFLOPs: 10.72 | 7: iteration 31870/ 173500 | consumed samples: 8158720 | consumed tokens: 16709058560 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.613788E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.070 | TFLOPs: 12.01 | 7: iteration 31880/ 173500 | consumed samples: 8161280 | consumed tokens: 16714301440 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.591855E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.090 | TFLOPs: 11.89 | 7: iteration 31890/ 173500 | consumed samples: 8163840 | consumed tokens: 16719544320 | elapsed time per iteration (s): 0.08 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 4.603499E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.073 | TFLOPs: 12.00 | 7: iteration 31900/ 173500 | consumed samples: 8166400 | consumed 
tokens: 16724787200 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.618462E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.054 | TFLOPs: 11.98 |
7: iteration 31910/ 173500 | consumed samples: 8168960 | consumed tokens: 16730030080 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.606039E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.311 | TFLOPs: 12.02 |
7: iteration 31920/ 173500 | consumed samples: 8171520 | consumed tokens: 16735272960 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.600980E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.472 | TFLOPs: 12.01 |
7: iteration 31930/ 173500 | consumed samples: 8174080 | consumed tokens: 16740515840 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.591012E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.489 | TFLOPs: 12.01 |
7: iteration 31940/ 173500 | consumed samples: 8176640 | consumed tokens: 16745758720 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.625143E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.458 | TFLOPs: 11.93 |
7: iteration 31950/ 173500 | consumed samples: 8179200 | consumed tokens: 16751001600 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.600018E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.265 | TFLOPs: 11.88 |
7: iteration 31960/ 173500 | consumed samples: 8181760 | consumed tokens: 16756244480 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.591325E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.904 | TFLOPs: 11.91 |
7: iteration 31970/ 173500 | consumed samples: 8184320 | consumed tokens: 16761487360 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.593361E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.686 | TFLOPs: 11.63 |
7: iteration 31980/ 173500 | consumed samples: 8186880 | consumed tokens: 16766730240 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.606589E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.919 | TFLOPs: 11.91 |
7: iteration 31990/ 173500 | consumed samples: 8189440 | consumed tokens: 16771973120 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.600830E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.786 | TFLOPs: 11.92 |
0: [2023-03-17 01:03:09,513] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=0, lr=[0.00018655987222005428, 0.00018655987222005428, 0.00018655987222005428], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 32000/ 173500 | consumed samples: 8192000 | consumed tokens: 16777216000 | elapsed time per iteration (s): 0.08 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.601144E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.976 | TFLOPs: 11.94 |
0: steps: 32000 loss: 4.6398 iter time (s): 0.083 samples/sec: 3080.666
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 32000 | lm loss value: 4.462266E+00 | lm loss PPL: 8.668371E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 32000 to checkpoints_14m91b100m
0: [2023-03-17 01:03:09,571] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step32000 is begin to save!
0: [2023-03-17 01:03:09,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:03:09,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:03:09,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/layer_03-model_00-model_states.pt...
0: [2023-03-17 01:03:09,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/layer_03-model_00-model_states.pt.
0: [2023-03-17 01:03:09,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/layer_04-model_00-model_states.pt...
0: [2023-03-17 01:03:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/layer_04-model_00-model_states.pt.
0: [2023-03-17 01:03:09,606] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/layer_05-model_00-model_states.pt...
0: [2023-03-17 01:03:09,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/layer_05-model_00-model_states.pt.
0: [2023-03-17 01:03:09,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:03:09,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:03:09,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:03:09,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:03:09,613] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step32000/mp_rank_00_model_states.pt 0: [2023-03-17 01:03:09,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:03:09,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:03:09,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:03:09,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:03:09,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:03:09,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:03:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:03:09,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:03:09,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2023-03-17 01:03:09,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:03:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:03:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2023-03-17 01:03:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:03:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:03:09,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2023-03-17 01:03:09,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:03:09,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
0: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 4: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 01:03:09,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2023-03-17 01:03:09,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2023-03-17 01:03:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:03:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2023-03-17 01:03:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:03:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:03:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:03:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2023-03-17 01:03:09,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:03:09,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:03:09,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:03:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:03:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:03:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:03:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:03:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:03:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:03:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:03:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:03:09,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:03:09,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:03:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:03:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:03:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:03:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:03:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2023-03-17 01:03:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:03:09,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2023-03-17 01:03:09,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:03:09,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2023-03-17 01:03:09,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:03:09,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:03:09,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:03:09,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:03:09,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:03:09,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2023-03-17 01:03:09,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 2: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2023-03-17 01:03:09,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:03:09,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:03:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2023-03-17 01:03:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:03:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:03:09,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2023-03-17 01:03:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:03:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:03:09,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2023-03-17 01:03:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:03:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
5: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
1: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
4: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2023-03-17 01:03:09,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:03:09,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2023-03-17 01:03:09,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:03:09,646] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:03:09,646] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2023-03-17 01:03:09,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:03:09,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step32000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:03:09,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
0: successfully saved checkpoint at iteration 32000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.93 7: iteration 32010/ 173500 | consumed samples: 8194560 | consumed tokens: 16782458880 | elapsed time per iteration (s): 0.09 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 4.612761E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2757.919 | TFLOPs: 10.26 | 7: iteration 32020/ 173500 | consumed samples: 8197120 | consumed tokens: 16787701760 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.602560E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.258 | TFLOPs: 11.80 | 7: iteration 32030/ 173500 | consumed samples: 8199680 | consumed tokens: 16792944640 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.612314E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.080 | TFLOPs: 11.84 | 7: iteration 32040/ 173500 | consumed samples: 8202240 | consumed tokens: 16798187520 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.610432E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.771 | TFLOPs: 12.01 | 7: iteration 32050/ 173500 | consumed samples: 8204800 | consumed tokens: 16803430400 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.607987E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.175 | TFLOPs: 11.94 | 7: iteration 32060/ 173500 | consumed samples: 8207360 | consumed tokens: 16808673280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.602032E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.422 | TFLOPs: 11.96 | 7: iteration 32070/ 173500 | consumed samples: 8209920 | consumed tokens: 16813916160 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.604695E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.217 | TFLOPs: 11.95 | 7: iteration 32080/ 173500 | consumed samples: 8212480 | consumed tokens: 16819159040 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.599476E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.069 | TFLOPs: 11.94 | 7: iteration 32090/ 173500 | consumed samples: 8215040 | consumed tokens: 16824401920 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.604299E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.771 | TFLOPs: 11.94 | 7: iteration 32100/ 173500 | consumed samples: 8217600 | consumed tokens: 16829644800 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.599347E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.465 | TFLOPs: 11.92 | 7: iteration 32110/ 173500 | consumed samples: 8220160 | consumed tokens: 16834887680 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.606023E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.910 | TFLOPs: 11.67 | 7: iteration 32120/ 173500 
| consumed samples: 8222720 | consumed tokens: 16840130560 | elapsed time per iteration (s): 0.08 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 4.603100E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.537 | TFLOPs: 11.93 | 7: iteration 32130/ 173500 | consumed samples: 8225280 | consumed tokens: 16845373440 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.586472E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.838 | TFLOPs: 11.66 | 7: iteration 32140/ 173500 | consumed samples: 8227840 | consumed tokens: 16850616320 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.605631E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.881 | TFLOPs: 11.91 | 7: iteration 32150/ 173500 | consumed samples: 8230400 | consumed tokens: 16855859200 | elapsed time per iteration (s): 0.10 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.600917E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2491.593 | TFLOPs: 9.27 | 7: iteration 32160/ 173500 | consumed samples: 8232960 | consumed tokens: 16861102080 | elapsed time per iteration (s): 0.12 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.590649E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2222.268 | TFLOPs: 8.27 | 7: iteration 32170/ 173500 | consumed samples: 8235520 | consumed tokens: 16866344960 | elapsed time per iteration (s): 0.09 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.607901E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2695.714 | TFLOPs: 10.03 | 7: iteration 32180/ 173500 | consumed samples: 8238080 | consumed tokens: 16871587840 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.609074E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.656 | TFLOPs: 11.80 | 7: iteration 32190/ 173500 | consumed samples: 8240640 | consumed tokens: 16876830720 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.615279E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.083 | TFLOPs: 11.82 | 7: iteration 32200/ 173500 | consumed samples: 8243200 | consumed tokens: 16882073600 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.616339E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.134 | TFLOPs: 11.78 | 7: iteration 32210/ 173500 | consumed samples: 8245760 | consumed tokens: 16887316480 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.612772E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.352 | TFLOPs: 11.78 | 7: iteration 32220/ 173500 | consumed samples: 8248320 | consumed tokens: 16892559360 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.610775E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.604 | TFLOPs: 11.76 | 7: iteration 32230/ 173500 | consumed samples: 8250880 | consumed tokens: 16897802240 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 
256 | lm loss: 4.612453E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.948 | TFLOPs: 11.82 | 7: iteration 32240/ 173500 | consumed samples: 8253440 | consumed tokens: 16903045120 | elapsed time per iteration (s): 0.08 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 4.591581E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.096 | TFLOPs: 11.78 | 7: iteration 32250/ 173500 | consumed samples: 8256000 | consumed tokens: 16908288000 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.610204E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.821 | TFLOPs: 11.81 | 7: iteration 32260/ 173500 | consumed samples: 8258560 | consumed tokens: 16913530880 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.604612E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.560 | TFLOPs: 11.79 | 7: iteration 32270/ 173500 | consumed samples: 8261120 | consumed tokens: 16918773760 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.595118E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.315 | TFLOPs: 11.61 | 7: iteration 32280/ 173500 | consumed samples: 8263680 | consumed tokens: 16924016640 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.610437E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.379 | TFLOPs: 11.71 | 7: iteration 32290/ 173500 | consumed samples: 8266240 | consumed 
tokens: 16929259520 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.598313E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.104 | TFLOPs: 11.78 | 7: iteration 32300/ 173500 | consumed samples: 8268800 | consumed tokens: 16934502400 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.603313E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.152 | TFLOPs: 11.79 | 7: iteration 32310/ 173500 | consumed samples: 8271360 | consumed tokens: 16939745280 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.603657E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.718 | TFLOPs: 11.80 | 7: iteration 32320/ 173500 | consumed samples: 8273920 | consumed tokens: 16944988160 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.606823E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.946 | TFLOPs: 11.81 | 7: iteration 32330/ 173500 | consumed samples: 8276480 | consumed tokens: 16950231040 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.588295E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.577 | TFLOPs: 11.80 | 7: iteration 32340/ 173500 | consumed samples: 8279040 | consumed tokens: 16955473920 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.600869E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3173.144 | TFLOPs: 11.80 | 7: iteration 32350/ 173500 | consumed samples: 8281600 | consumed tokens: 16960716800 | elapsed time per iteration (s): 0.08 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 4.607883E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.742 | TFLOPs: 11.79 | 7: iteration 32360/ 173500 | consumed samples: 8284160 | consumed tokens: 16965959680 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.600955E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.446 | TFLOPs: 11.81 | 7: iteration 32370/ 173500 | consumed samples: 8286720 | consumed tokens: 16971202560 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.589668E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.606 | TFLOPs: 11.80 | 7: iteration 32380/ 173500 | consumed samples: 8289280 | consumed tokens: 16976445440 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.598172E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.438 | TFLOPs: 11.80 | 7: iteration 32390/ 173500 | consumed samples: 8291840 | consumed tokens: 16981688320 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.602518E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.175 | TFLOPs: 11.80 | 7: iteration 32400/ 173500 | consumed samples: 8294400 | consumed tokens: 16986931200 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.601221E+00 | grad norm: 
0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.337 | TFLOPs: 11.72 | 7: iteration 32410/ 173500 | consumed samples: 8296960 | consumed tokens: 16992174080 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.604007E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.888 | TFLOPs: 11.74 | 7: iteration 32420/ 173500 | consumed samples: 8299520 | consumed tokens: 16997416960 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.595559E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.459 | TFLOPs: 11.74 | 7: iteration 32430/ 173500 | consumed samples: 8302080 | consumed tokens: 17002659840 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.610907E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.670 | TFLOPs: 11.77 | 7: iteration 32440/ 173500 | consumed samples: 8304640 | consumed tokens: 17007902720 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.597867E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.244 | TFLOPs: 11.77 | 7: iteration 32450/ 173500 | consumed samples: 8307200 | consumed tokens: 17013145600 | elapsed time per iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.606358E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.799 | TFLOPs: 11.75 | 7: iteration 32460/ 173500 | consumed samples: 8309760 | consumed tokens: 17018388480 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.592074E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.550 | TFLOPs: 11.79 | 7: iteration 32470/ 173500 | consumed samples: 8312320 | consumed tokens: 17023631360 | elapsed time per iteration (s): 0.10 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 4.596575E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2579.743 | TFLOPs: 9.60 | 7: iteration 32480/ 173500 | consumed samples: 8314880 | consumed tokens: 17028874240 | elapsed time per iteration (s): 0.11 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.609858E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2411.355 | TFLOPs: 8.97 | 7: iteration 32490/ 173500 | consumed samples: 8317440 | consumed tokens: 17034117120 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.595486E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.144 | TFLOPs: 11.68 | 7: iteration 32500/ 173500 | consumed samples: 8320000 | consumed tokens: 17039360000 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.599873E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.863 | TFLOPs: 11.79 | 7: iteration 32510/ 173500 | consumed samples: 8322560 | consumed tokens: 17044602880 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.590424E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.937 | TFLOPs: 11.79 | 7: 
iteration 32520/ 173500 | consumed samples: 8325120 | consumed tokens: 17049845760 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.599779E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.484 | TFLOPs: 11.78 | 7: iteration 32530/ 173500 | consumed samples: 8327680 | consumed tokens: 17055088640 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.600039E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.368 | TFLOPs: 11.77 | 7: iteration 32540/ 173500 | consumed samples: 8330240 | consumed tokens: 17060331520 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.598054E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.398 | TFLOPs: 11.77 | 7: iteration 32550/ 173500 | consumed samples: 8332800 | consumed tokens: 17065574400 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.589324E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.526 | TFLOPs: 11.77 | 7: iteration 32560/ 173500 | consumed samples: 8335360 | consumed tokens: 17070817280 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.597765E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.934 | TFLOPs: 11.79 | 7: iteration 32570/ 173500 | consumed samples: 8337920 | consumed tokens: 17076060160 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.614742E+00 | grad norm: 0.344 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.789 | TFLOPs: 11.79 | 7: iteration 32580/ 173500 | consumed samples: 8340480 | consumed tokens: 17081303040 | elapsed time per iteration (s): 0.08 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 4.600211E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.335 | TFLOPs: 11.78 | 7: iteration 32590/ 173500 | consumed samples: 8343040 | consumed tokens: 17086545920 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.601653E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.215 | TFLOPs: 11.67 | 7: iteration 32600/ 173500 | consumed samples: 8345600 | consumed tokens: 17091788800 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.602534E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.941 | TFLOPs: 11.78 | 7: iteration 32610/ 173500 | consumed samples: 8348160 | consumed tokens: 17097031680 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.594555E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.683 | TFLOPs: 11.80 | 7: iteration 32620/ 173500 | consumed samples: 8350720 | consumed tokens: 17102274560 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.598729E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.602 | TFLOPs: 11.80 | 7: iteration 32630/ 173500 | consumed samples: 8353280 | consumed tokens: 17107517440 | elapsed time per iteration (s): 0.08 | learning rate: 
1.860E-04 | global batch size: 256 | lm loss: 4.602286E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.689 | TFLOPs: 11.76 | 7: iteration 32640/ 173500 | consumed samples: 8355840 | consumed tokens: 17112760320 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.604420E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.806 | TFLOPs: 11.79 | 7: iteration 32650/ 173500 | consumed samples: 8358400 | consumed tokens: 17118003200 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.590816E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.478 | TFLOPs: 11.76 | 7: iteration 32660/ 173500 | consumed samples: 8360960 | consumed tokens: 17123246080 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.591865E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.650 | TFLOPs: 11.80 | 7: iteration 32670/ 173500 | consumed samples: 8363520 | consumed tokens: 17128488960 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.604578E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.106 | TFLOPs: 11.77 | 7: iteration 32680/ 173500 | consumed samples: 8366080 | consumed tokens: 17133731840 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.592907E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.949 | TFLOPs: 11.79 | 7: iteration 32690/ 173500 | consumed 
samples: 8368640 | consumed tokens: 17138974720 | elapsed time per iteration (s): 0.08 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 4.599318E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.461 | TFLOPs: 11.81 | 7: iteration 32700/ 173500 | consumed samples: 8371200 | consumed tokens: 17144217600 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.602260E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.134 | TFLOPs: 11.81 | 7: iteration 32710/ 173500 | consumed samples: 8373760 | consumed tokens: 17149460480 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.594017E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.643 | TFLOPs: 11.77 | 7: iteration 32720/ 173500 | consumed samples: 8376320 | consumed tokens: 17154703360 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.589948E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.893 | TFLOPs: 11.80 | 7: iteration 32730/ 173500 | consumed samples: 8378880 | consumed tokens: 17159946240 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.590359E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.971 | TFLOPs: 11.71 | 7: iteration 32740/ 173500 | consumed samples: 8381440 | consumed tokens: 17165189120 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.589516E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3143.726 | TFLOPs: 11.69 | 7: iteration 32750/ 173500 | consumed samples: 8384000 | consumed tokens: 17170432000 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.591153E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.604 | TFLOPs: 11.79 | 7: iteration 32760/ 173500 | consumed samples: 8386560 | consumed tokens: 17175674880 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.595535E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.238 | TFLOPs: 11.79 | 7: iteration 32770/ 173500 | consumed samples: 8389120 | consumed tokens: 17180917760 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.588511E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.012 | TFLOPs: 11.78 | 7: iteration 32780/ 173500 | consumed samples: 8391680 | consumed tokens: 17186160640 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.600237E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.671 | TFLOPs: 11.65 | 7: iteration 32790/ 173500 | consumed samples: 8394240 | consumed tokens: 17191403520 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.585498E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.941 | TFLOPs: 11.81 | 7: iteration 32800/ 173500 | consumed samples: 8396800 | consumed tokens: 17196646400 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm 
loss: 4.608355E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.189 | TFLOPs: 11.77 | 7: iteration 32810/ 173500 | consumed samples: 8399360 | consumed tokens: 17201889280 | elapsed time per iteration (s): 0.08 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 4.591666E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.542 | TFLOPs: 11.78 | 7: iteration 32820/ 173500 | consumed samples: 8401920 | consumed tokens: 17207132160 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.588906E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.550 | TFLOPs: 11.77 | 7: iteration 32830/ 173500 | consumed samples: 8404480 | consumed tokens: 17212375040 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.584298E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.836 | TFLOPs: 11.73 | 7: iteration 32840/ 173500 | consumed samples: 8407040 | consumed tokens: 17217617920 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.591224E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.058 | TFLOPs: 11.77 | 7: iteration 32850/ 173500 | consumed samples: 8409600 | consumed tokens: 17222860800 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.601231E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.902 | TFLOPs: 11.75 | 7: iteration 32860/ 173500 | consumed samples: 8412160 | consumed tokens: 
17228103680 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.599208E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.841 | TFLOPs: 11.82 | 7: iteration 32870/ 173500 | consumed samples: 8414720 | consumed tokens: 17233346560 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.608840E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.872 | TFLOPs: 11.77 | 7: iteration 32880/ 173500 | consumed samples: 8417280 | consumed tokens: 17238589440 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.599956E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.614 | TFLOPs: 11.80 | 7: iteration 32890/ 173500 | consumed samples: 8419840 | consumed tokens: 17243832320 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.597989E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.185 | TFLOPs: 11.81 | 7: iteration 32900/ 173500 | consumed samples: 8422400 | consumed tokens: 17249075200 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.604908E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.020 | TFLOPs: 11.77 | 7: iteration 32910/ 173500 | consumed samples: 8424960 | consumed tokens: 17254318080 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.591675E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3168.219 | TFLOPs: 11.78 | 7: iteration 32920/ 173500 | consumed samples: 8427520 | consumed tokens: 17259560960 | elapsed time per iteration (s): 0.08 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 4.611546E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.974 | TFLOPs: 11.80 | 7: iteration 32930/ 173500 | consumed samples: 8430080 | consumed tokens: 17264803840 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.602567E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.396 | TFLOPs: 11.73 | 7: iteration 32940/ 173500 | consumed samples: 8432640 | consumed tokens: 17270046720 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.602324E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.674 | TFLOPs: 11.81 | 7: iteration 32950/ 173500 | consumed samples: 8435200 | consumed tokens: 17275289600 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.608702E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.837 | TFLOPs: 11.75 | 7: iteration 32960/ 173500 | consumed samples: 8437760 | consumed tokens: 17280532480 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.598542E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.705 | TFLOPs: 11.80 | 7: iteration 32970/ 173500 | consumed samples: 8440320 | consumed tokens: 17285775360 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.602402E+00 | grad norm: 0.363 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.037 | TFLOPs: 11.79 | 7: iteration 32980/ 173500 | consumed samples: 8442880 | consumed tokens: 17291018240 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.590965E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.771 | TFLOPs: 11.80 | 7: iteration 32990/ 173500 | consumed samples: 8445440 | consumed tokens: 17296261120 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.599019E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.523 | TFLOPs: 11.77 | 7: iteration 33000/ 173500 | consumed samples: 8448000 | consumed tokens: 17301504000 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.594726E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.230 | TFLOPs: 11.75 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 33000 | lm loss value: 4.464251E+00 | lm loss PPL: 8.685591E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 33000 to checkpoints_14m91b100m 0: [2023-03-17 01:04:31,590] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step33000 is begin to save! 0: [2023-03-17 01:04:31,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:04:31,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 01:04:31,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:04:31,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:04:31,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:04:31,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:04:31,632] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:04:31,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:04:31,635] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:04:31,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:04:31,638] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:04:31,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:04:31,639] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step33000/mp_rank_00_model_states.pt 0: [2023-03-17 01:04:31,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/mp_rank_00_model_states.pt... 
0: [2023-03-17 01:04:31,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:04:31,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:04:31,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:04:31,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:04:31,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2023-03-17 01:04:31,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
4: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
3: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:04:31,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:04:31,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2023-03-17 01:04:31,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:04:31,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:04:31,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2023-03-17 01:04:31,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:04:31,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2023-03-17 01:04:31,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:04:31,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:04:31,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:04:31,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2023-03-17 01:04:31,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2023-03-17 01:04:31,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:04:31,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:04:31,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:04:31,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:04:31,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:04:31,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:04:31,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:04:31,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:04:31,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2023-03-17 01:04:31,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 01:04:31,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:04:31,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:04:31,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2023-03-17 01:04:31,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:04:31,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:04:31,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:04:31,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:04:31,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:04:31,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:04:31,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:04:31,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:04:31,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:04:31,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:04:31,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:04:31,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2023-03-17 01:04:31,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:04:31,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
4: [2023-03-17 01:04:31,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:04:31,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:04:31,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2023-03-17 01:04:31,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
0: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
6: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
4: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2023-03-17 01:04:31,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:04:31,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2023-03-17 01:04:31,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:04:31,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:04:31,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2023-03-17 01:04:31,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:04:31,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step33000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:04:31,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
0: successfully saved checkpoint at iteration 33000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 85.72
7: iteration 33010/ 173500 | consumed samples: 8450560 | consumed tokens: 17306746880 | elapsed time per iteration (s): 0.09 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.595110E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2752.277 | TFLOPs: 10.24 |
7: iteration 33020/ 173500 | consumed samples: 8453120 | consumed tokens: 17311989760 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.603188E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.254 | TFLOPs: 11.78 |
7: iteration 33030/ 173500 | consumed samples: 8455680 | consumed tokens: 17317232640 | elapsed time per iteration (s): 0.08 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 4.595275E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.803 | TFLOPs: 11.80 |
7: iteration 33040/ 173500 | consumed samples: 8458240 | consumed tokens: 17322475520 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.602715E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.970 | TFLOPs: 11.68 |
7: iteration 33050/ 173500 | consumed samples: 8460800 | consumed tokens: 17327718400 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.604827E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.073 | TFLOPs: 11.74 |
7: iteration 33060/ 173500 | consumed samples: 8463360 | consumed tokens: 17332961280 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.593145E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.027 | TFLOPs: 11.71 |
7: iteration 33070/ 173500 | consumed samples: 8465920 | consumed tokens: 17338204160 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.582681E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.460 | TFLOPs: 11.38 |
7: iteration 33080/ 173500 | consumed samples: 8468480 | consumed tokens: 17343447040 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.592548E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.954 | TFLOPs: 11.82 |
7: iteration 33090/ 173500 | consumed samples: 8471040 | consumed tokens: 17348689920 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.605275E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.363 | TFLOPs: 11.87 |
7: iteration 33100/ 173500 | consumed samples: 8473600 | consumed tokens: 17353932800 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.594478E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.459 | TFLOPs: 11.57 |
7: iteration 33110/ 173500 | consumed samples: 8476160 | consumed tokens: 17359175680 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.598625E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.175 | TFLOPs: 11.82 |
7: iteration 33120/ 173500 | consumed samples: 8478720 | consumed tokens: 17364418560 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.595436E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.417 | TFLOPs: 11.87 |
7: iteration 33130/ 173500 | consumed samples: 8481280 | consumed tokens: 17369661440 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.606198E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.238 | TFLOPs: 11.87 |
7: iteration 33140/ 173500 | consumed samples: 8483840 | consumed tokens: 17374904320 | elapsed time per iteration (s): 0.08 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 4.601632E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.516 | TFLOPs: 11.81 |
7: iteration 33150/ 173500 | consumed samples: 8486400 | consumed tokens: 17380147200 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.599343E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.011 | TFLOPs: 11.90 |
7: iteration 33160/ 173500 | consumed samples: 8488960 | consumed tokens: 17385390080 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.593661E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.685 | TFLOPs: 11.82 |
7: iteration 33170/ 173500 | consumed samples: 8491520 | consumed tokens: 17390632960 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.594301E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.841 | TFLOPs: 11.89 |
7: iteration 33180/ 173500 | consumed samples: 8494080 | consumed tokens: 17395875840 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.592072E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.433 | TFLOPs: 11.89 |
7: iteration 33190/ 173500 | consumed samples: 8496640 | consumed tokens: 17401118720 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.592553E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.410 | TFLOPs: 11.86 |
7: iteration 33200/ 173500 | consumed samples: 8499200 | consumed tokens: 17406361600 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.599133E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.345 | TFLOPs: 11.83 |
7: iteration 33210/ 173500 | consumed samples: 8501760 | consumed tokens: 17411604480 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.587223E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.689 | TFLOPs: 11.57 |
7: iteration 33220/ 173500 | consumed samples: 8504320 | consumed tokens: 17416847360 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.599332E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.272 | TFLOPs: 11.87 |
7: iteration 33230/ 173500 | consumed samples: 8506880 | consumed tokens: 17422090240 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size:
256 | lm loss: 4.598618E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.510 | TFLOPs: 11.84 | 7: iteration 33240/ 173500 | consumed samples: 8509440 | consumed tokens: 17427333120 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.590189E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.579 | TFLOPs: 11.84 | 7: iteration 33250/ 173500 | consumed samples: 8512000 | consumed tokens: 17432576000 | elapsed time per iteration (s): 0.08 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 4.601560E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.205 | TFLOPs: 11.84 | 7: iteration 33260/ 173500 | consumed samples: 8514560 | consumed tokens: 17437818880 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.613356E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.413 | TFLOPs: 11.79 | 7: iteration 33270/ 173500 | consumed samples: 8517120 | consumed tokens: 17443061760 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.603422E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.025 | TFLOPs: 11.79 | 7: iteration 33280/ 173500 | consumed samples: 8519680 | consumed tokens: 17448304640 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.587733E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.476 | TFLOPs: 11.82 | 7: iteration 33290/ 173500 | consumed samples: 8522240 | consumed 
tokens: 17453547520 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.602445E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.108 | TFLOPs: 11.78 | 7: iteration 33300/ 173500 | consumed samples: 8524800 | consumed tokens: 17458790400 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.606721E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.831 | TFLOPs: 11.81 | 7: iteration 33310/ 173500 | consumed samples: 8527360 | consumed tokens: 17464033280 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.605118E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.089 | TFLOPs: 11.79 | 7: iteration 33320/ 173500 | consumed samples: 8529920 | consumed tokens: 17469276160 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.589965E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.850 | TFLOPs: 11.84 | 7: iteration 33330/ 173500 | consumed samples: 8532480 | consumed tokens: 17474519040 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.595434E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.032 | TFLOPs: 11.84 | 7: iteration 33340/ 173500 | consumed samples: 8535040 | consumed tokens: 17479761920 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.604509E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3185.746 | TFLOPs: 11.85 | 7: iteration 33350/ 173500 | consumed samples: 8537600 | consumed tokens: 17485004800 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.603279E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.227 | TFLOPs: 11.87 | 7: iteration 33360/ 173500 | consumed samples: 8540160 | consumed tokens: 17490247680 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.605226E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.165 | TFLOPs: 11.86 | 7: iteration 33370/ 173500 | consumed samples: 8542720 | consumed tokens: 17495490560 | elapsed time per iteration (s): 0.08 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 4.575093E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.956 | TFLOPs: 11.80 | 7: iteration 33380/ 173500 | consumed samples: 8545280 | consumed tokens: 17500733440 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.584999E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.173 | TFLOPs: 11.56 | 7: iteration 33390/ 173500 | consumed samples: 8547840 | consumed tokens: 17505976320 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.598698E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.352 | TFLOPs: 11.77 | 7: iteration 33400/ 173500 | consumed samples: 8550400 | consumed tokens: 17511219200 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.592340E+00 | grad norm: 
0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.476 | TFLOPs: 11.74 | 7: iteration 33410/ 173500 | consumed samples: 8552960 | consumed tokens: 17516462080 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.593710E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.541 | TFLOPs: 11.72 | 7: iteration 33420/ 173500 | consumed samples: 8555520 | consumed tokens: 17521704960 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.606400E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.802 | TFLOPs: 11.81 | 7: iteration 33430/ 173500 | consumed samples: 8558080 | consumed tokens: 17526947840 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.598617E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.307 | TFLOPs: 11.77 | 7: iteration 33440/ 173500 | consumed samples: 8560640 | consumed tokens: 17532190720 | elapsed time per iteration (s): 0.29 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.591098E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 871.458 | TFLOPs: 3.24 | 7: iteration 33450/ 173500 | consumed samples: 8563200 | consumed tokens: 17537433600 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.590604E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.127 | TFLOPs: 11.43 | 7: iteration 33460/ 173500 | consumed samples: 8565760 | consumed tokens: 17542676480 | elapsed time per iteration 
(s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.587048E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.606 | TFLOPs: 11.76 | 7: iteration 33470/ 173500 | consumed samples: 8568320 | consumed tokens: 17547919360 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.592307E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.127 | TFLOPs: 11.74 | 7: iteration 33480/ 173500 | consumed samples: 8570880 | consumed tokens: 17553162240 | elapsed time per iteration (s): 0.08 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 4.590388E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.062 | TFLOPs: 11.83 | 7: iteration 33490/ 173500 | consumed samples: 8573440 | consumed tokens: 17558405120 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.578489E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.755 | TFLOPs: 11.75 | 7: iteration 33500/ 173500 | consumed samples: 8576000 | consumed tokens: 17563648000 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.603351E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.586 | TFLOPs: 11.76 | 7: iteration 33510/ 173500 | consumed samples: 8578560 | consumed tokens: 17568890880 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.573396E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.422 | TFLOPs: 11.77 | 7: iteration 
33520/ 173500 | consumed samples: 8581120 | consumed tokens: 17574133760 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.584449E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.866 | TFLOPs: 11.80 | 7: iteration 33530/ 173500 | consumed samples: 8583680 | consumed tokens: 17579376640 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.583405E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.346 | TFLOPs: 11.76 | 7: iteration 33540/ 173500 | consumed samples: 8586240 | consumed tokens: 17584619520 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.599165E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.182 | TFLOPs: 11.54 | 7: iteration 33550/ 173500 | consumed samples: 8588800 | consumed tokens: 17589862400 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.594799E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.334 | TFLOPs: 11.54 | 7: iteration 33560/ 173500 | consumed samples: 8591360 | consumed tokens: 17595105280 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.599768E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.057 | TFLOPs: 11.75 | 7: iteration 33570/ 173500 | consumed samples: 8593920 | consumed tokens: 17600348160 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.600093E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3155.103 | TFLOPs: 11.74 | 7: iteration 33580/ 173500 | consumed samples: 8596480 | consumed tokens: 17605591040 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.595850E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.853 | TFLOPs: 11.76 | 7: iteration 33590/ 173500 | consumed samples: 8599040 | consumed tokens: 17610833920 | elapsed time per iteration (s): 0.08 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 4.582927E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.856 | TFLOPs: 11.79 | 7: iteration 33600/ 173500 | consumed samples: 8601600 | consumed tokens: 17616076800 | elapsed time per iteration (s): 0.08 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.598885E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.444 | TFLOPs: 11.80 | 7: iteration 33610/ 173500 | consumed samples: 8604160 | consumed tokens: 17621319680 | elapsed time per iteration (s): 0.09 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.587271E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.075 | TFLOPs: 10.86 | 7: iteration 33620/ 173500 | consumed samples: 8606720 | consumed tokens: 17626562560 | elapsed time per iteration (s): 0.08 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.601226E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.632 | TFLOPs: 11.85 | 7: iteration 33630/ 173500 | consumed samples: 8609280 | consumed tokens: 17631805440 | elapsed time per iteration (s): 0.11 | learning rate: 1.851E-04 | 
global batch size: 256 | lm loss: 4.605755E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2403.747 | TFLOPs: 8.94 | 7: iteration 33640/ 173500 | consumed samples: 8611840 | consumed tokens: 17637048320 | elapsed time per iteration (s): 0.11 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.597234E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.746 | TFLOPs: 8.66 | 7: iteration 33650/ 173500 | consumed samples: 8614400 | consumed tokens: 17642291200 | elapsed time per iteration (s): 0.10 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.594293E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2469.644 | TFLOPs: 9.19 | 7: iteration 33660/ 173500 | consumed samples: 8616960 | consumed tokens: 17647534080 | elapsed time per iteration (s): 0.09 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.590301E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2925.201 | TFLOPs: 10.88 | 7: iteration 33670/ 173500 | consumed samples: 8619520 | consumed tokens: 17652776960 | elapsed time per iteration (s): 0.08 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.590216E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.114 | TFLOPs: 11.96 | 7: iteration 33680/ 173500 | consumed samples: 8622080 | consumed tokens: 17658019840 | elapsed time per iteration (s): 0.08 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.594617E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.232 | TFLOPs: 11.81 | 7: iteration 33690/ 173500 | consumed samples: 8624640 | 
consumed tokens: 17663262720 | elapsed time per iteration (s): 0.08 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.596983E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.930 | TFLOPs: 11.73 | 7: iteration 33700/ 173500 | consumed samples: 8627200 | consumed tokens: 17668505600 | elapsed time per iteration (s): 0.08 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 4.590819E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.194 | TFLOPs: 11.94 | 7: iteration 33710/ 173500 | consumed samples: 8629760 | consumed tokens: 17673748480 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.592672E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.291 | TFLOPs: 11.97 | 7: iteration 33720/ 173500 | consumed samples: 8632320 | consumed tokens: 17678991360 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.596144E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.010 | TFLOPs: 11.94 | 7: iteration 33730/ 173500 | consumed samples: 8634880 | consumed tokens: 17684234240 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.591280E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.540 | TFLOPs: 11.94 | 7: iteration 33740/ 173500 | consumed samples: 8637440 | consumed tokens: 17689477120 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.595581E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3219.938 | TFLOPs: 11.98 | 7: iteration 33750/ 173500 | consumed samples: 8640000 | consumed tokens: 17694720000 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.588746E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.178 | TFLOPs: 11.93 | 7: iteration 33760/ 173500 | consumed samples: 8642560 | consumed tokens: 17699962880 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.593901E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.066 | TFLOPs: 11.89 | 7: iteration 33770/ 173500 | consumed samples: 8645120 | consumed tokens: 17705205760 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.589254E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.709 | TFLOPs: 11.97 | 7: iteration 33780/ 173500 | consumed samples: 8647680 | consumed tokens: 17710448640 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.601377E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.832 | TFLOPs: 11.96 | 7: iteration 33790/ 173500 | consumed samples: 8650240 | consumed tokens: 17715691520 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.601323E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.733 | TFLOPs: 11.98 | 7: iteration 33800/ 173500 | consumed samples: 8652800 | consumed tokens: 17720934400 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.588433E+00 | 
grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.507 | TFLOPs: 11.90 | 7: iteration 33810/ 173500 | consumed samples: 8655360 | consumed tokens: 17726177280 | elapsed time per iteration (s): 0.08 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 4.594066E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.015 | TFLOPs: 11.98 | 7: iteration 33820/ 173500 | consumed samples: 8657920 | consumed tokens: 17731420160 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.593538E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.749 | TFLOPs: 11.96 | 7: iteration 33830/ 173500 | consumed samples: 8660480 | consumed tokens: 17736663040 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.594795E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.332 | TFLOPs: 11.96 | 7: iteration 33840/ 173500 | consumed samples: 8663040 | consumed tokens: 17741905920 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.596788E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.833 | TFLOPs: 11.94 | 7: iteration 33850/ 173500 | consumed samples: 8665600 | consumed tokens: 17747148800 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.584486E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.834 | TFLOPs: 11.72 | 7: iteration 33860/ 173500 | consumed samples: 8668160 | consumed tokens: 17752391680 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.584266E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.849 | TFLOPs: 11.88 | 7: iteration 33870/ 173500 | consumed samples: 8670720 | consumed tokens: 17757634560 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.587782E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.983 | TFLOPs: 11.80 | 7: iteration 33880/ 173500 | consumed samples: 8673280 | consumed tokens: 17762877440 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.594524E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.377 | TFLOPs: 11.74 | 7: iteration 33890/ 173500 | consumed samples: 8675840 | consumed tokens: 17768120320 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.595901E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.711 | TFLOPs: 11.84 | 7: iteration 33900/ 173500 | consumed samples: 8678400 | consumed tokens: 17773363200 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.584265E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.582 | TFLOPs: 11.82 | 7: iteration 33910/ 173500 | consumed samples: 8680960 | consumed tokens: 17778606080 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.604356E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.206 | TFLOPs: 11.79 | 
7: iteration 33920/ 173500 | consumed samples: 8683520 | consumed tokens: 17783848960 | elapsed time per iteration (s): 0.08 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 4.610542E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.820 | TFLOPs: 11.84 | 7: iteration 33930/ 173500 | consumed samples: 8686080 | consumed tokens: 17789091840 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.585963E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.417 | TFLOPs: 11.80 | 7: iteration 33940/ 173500 | consumed samples: 8688640 | consumed tokens: 17794334720 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.596479E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.805 | TFLOPs: 11.76 | 7: iteration 33950/ 173500 | consumed samples: 8691200 | consumed tokens: 17799577600 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.592720E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.204 | TFLOPs: 11.82 | 7: iteration 33960/ 173500 | consumed samples: 8693760 | consumed tokens: 17804820480 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.602406E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.411 | TFLOPs: 11.79 | 7: iteration 33970/ 173500 | consumed samples: 8696320 | consumed tokens: 17810063360 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.587422E+00 | grad norm: 0.408 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.496 | TFLOPs: 11.82 | 7: iteration 33980/ 173500 | consumed samples: 8698880 | consumed tokens: 17815306240 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.601795E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.370 | TFLOPs: 11.79 | 7: iteration 33990/ 173500 | consumed samples: 8701440 | consumed tokens: 17820549120 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.591124E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.304 | TFLOPs: 11.79 | 0: [2023-03-17 01:05:55,385] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=0, lr=[0.00018477830620634072, 0.00018477830620634072, 0.00018477830620634072], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 34000/ 173500 | consumed samples: 8704000 | consumed tokens: 17825792000 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.601678E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.790 | TFLOPs: 11.88 | 0: steps: 34000 loss: 4.6021 iter time (s): 0.082 samples/sec: 3131.439 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 34000 | lm loss value: 4.485698E+00 | lm loss PPL: 8.873889E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 34000 to checkpoints_14m91b100m 0: [2023-03-17 01:05:55,443] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step34000 is begin to save! 
0: [2023-03-17 01:05:55,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:05:55,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:05:55,469] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:05:55,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:05:55,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:05:55,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:05:55,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:05:55,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:05:55,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:05:55,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:05:55,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:05:55,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:05:55,485] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step34000/mp_rank_00_model_states.pt 0: [2023-03-17 01:05:55,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:05:55,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:05:55,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:05:55,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:05:55,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:05:55,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:05:55,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2023-03-17 01:05:55,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:05:55,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:05:55,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:05:55,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 5: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:05:55,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 6: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:05:55,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:05:55,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:05:55,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:05:55,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2023-03-17 01:05:55,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:05:55,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:05:55,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2023-03-17 01:05:55,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:05:55,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:05:55,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 5: [2023-03-17 01:05:55,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 01:05:55,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2023-03-17 01:05:55,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2023-03-17 01:05:55,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:05:55,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:05:55,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2023-03-17 01:05:55,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:05:55,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:05:55,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2023-03-17 01:05:55,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:05:55,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:05:55,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2023-03-17 01:05:55,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:05:55,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:05:55,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2023-03-17 01:05:55,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2023-03-17 01:05:55,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:05:55,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2023-03-17 01:05:55,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 5: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2023-03-17 01:05:55,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:05:55,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:05:55,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:05:55,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
7: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:05:55,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:05:55,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2023-03-17 01:05:55,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:05:55,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
5: [2023-03-17 01:05:55,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:05:55,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 01:05:55,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:05:55,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:05:55,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
1: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:05:55,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2023-03-17 01:05:55,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2023-03-17 01:05:55,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:05:55,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:05:55,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2023-03-17 01:05:55,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:05:55,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:05:55,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2023-03-17 01:05:55,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:05:55,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2023-03-17 01:05:55,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:05:55,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 01:05:55,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
4: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:05:55,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:05:55,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:05:55,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:05:55,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:05:55,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:05:55,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:05:55,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:05:55,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
2: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
6: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 01:05:55,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 1: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
4: [2023-03-17 01:05:55,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2023-03-17 01:05:55,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:05:55,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:05:55,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2023-03-17 01:05:55,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:05:55,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step34000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:05:55,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
0: successfully saved checkpoint at iteration 34000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.19 7: iteration 34010/ 173500 | consumed samples: 8706560 | consumed tokens: 17831034880 | elapsed time per iteration (s): 0.09 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.605365E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.641 | TFLOPs: 10.17 | 7: iteration 34020/ 173500 | consumed samples: 8709120 | consumed tokens: 17836277760 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.594416E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.134 | TFLOPs: 11.88 | 7: iteration 34030/ 173500 | consumed samples: 8711680 | consumed tokens: 17841520640 | elapsed time per iteration (s): 0.08 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 4.595858E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.538 | TFLOPs: 11.86 | 7: iteration 34040/ 173500 | consumed samples: 8714240 | consumed tokens: 17846763520 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.581388E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.102 | TFLOPs: 11.87 | 7: iteration 34050/ 173500 | consumed samples: 8716800 | consumed tokens: 17852006400 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.592200E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.436 | TFLOPs: 11.88 | 7: iteration 34060/ 173500 | consumed samples: 8719360 | consumed tokens: 17857249280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.596375E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.167 | TFLOPs: 11.88 | 7: iteration 34070/ 173500 | consumed samples: 8721920 | consumed tokens: 17862492160 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.583450E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.418 | TFLOPs: 11.87 | 7: iteration 34080/ 173500 | consumed samples: 8724480 | consumed tokens: 17867735040 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.602675E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.807 | TFLOPs: 11.87 | 7: iteration 34090/ 173500 | consumed samples: 8727040 | consumed tokens: 17872977920 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.594182E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.700 | TFLOPs: 11.83 | 7: iteration 34100/ 173500 | consumed samples: 8729600 | consumed tokens: 17878220800 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.586358E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.254 | TFLOPs: 11.78 | 7: iteration 34110/ 173500 | consumed samples: 8732160 | consumed tokens: 17883463680 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.611889E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.513 | TFLOPs: 11.66 | 7: iteration 34120/ 173500 
| consumed samples: 8734720 | consumed tokens: 17888706560 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.584630E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.339 | TFLOPs: 11.83 | 7: iteration 34130/ 173500 | consumed samples: 8737280 | consumed tokens: 17893949440 | elapsed time per iteration (s): 0.08 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 4.579504E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.087 | TFLOPs: 11.85 | 7: iteration 34140/ 173500 | consumed samples: 8739840 | consumed tokens: 17899192320 | elapsed time per iteration (s): 0.09 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.585546E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.365 | TFLOPs: 10.42 | 7: iteration 34150/ 173500 | consumed samples: 8742400 | consumed tokens: 17904435200 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.582851E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.681 | TFLOPs: 11.74 | 7: iteration 34160/ 173500 | consumed samples: 8744960 | consumed tokens: 17909678080 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.603006E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.787 | TFLOPs: 11.77 | 7: iteration 34170/ 173500 | consumed samples: 8747520 | consumed tokens: 17914920960 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.594395E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3150.275 | TFLOPs: 11.72 | 7: iteration 34180/ 173500 | consumed samples: 8750080 | consumed tokens: 17920163840 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.587276E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.681 | TFLOPs: 11.77 | 7: iteration 34190/ 173500 | consumed samples: 8752640 | consumed tokens: 17925406720 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.587669E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.405 | TFLOPs: 11.76 | 7: iteration 34200/ 173500 | consumed samples: 8755200 | consumed tokens: 17930649600 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.597060E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.568 | TFLOPs: 11.79 | 7: iteration 34210/ 173500 | consumed samples: 8757760 | consumed tokens: 17935892480 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.598688E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.807 | TFLOPs: 11.72 | 7: iteration 34220/ 173500 | consumed samples: 8760320 | consumed tokens: 17941135360 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.594380E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.919 | TFLOPs: 11.78 | 7: iteration 34230/ 173500 | consumed samples: 8762880 | consumed tokens: 17946378240 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 
256 | lm loss: 4.587897E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.759 | TFLOPs: 11.75 | 7: iteration 34240/ 173500 | consumed samples: 8765440 | consumed tokens: 17951621120 | elapsed time per iteration (s): 0.08 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 4.604051E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.326 | TFLOPs: 11.70 | 7: iteration 34250/ 173500 | consumed samples: 8768000 | consumed tokens: 17956864000 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.595811E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.083 | TFLOPs: 11.77 | 7: iteration 34260/ 173500 | consumed samples: 8770560 | consumed tokens: 17962106880 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.594291E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.289 | TFLOPs: 11.79 | 7: iteration 34270/ 173500 | consumed samples: 8773120 | consumed tokens: 17967349760 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.602115E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.186 | TFLOPs: 11.66 | 7: iteration 34280/ 173500 | consumed samples: 8775680 | consumed tokens: 17972592640 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.596359E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.092 | TFLOPs: 11.78 | 7: iteration 34290/ 173500 | consumed samples: 8778240 | consumed 
tokens: 17977835520 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.588534E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.055 | TFLOPs: 11.74 | 7: iteration 34300/ 173500 | consumed samples: 8780800 | consumed tokens: 17983078400 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.606922E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.023 | TFLOPs: 11.79 | 7: iteration 34310/ 173500 | consumed samples: 8783360 | consumed tokens: 17988321280 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.592248E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.066 | TFLOPs: 11.79 | 7: iteration 34320/ 173500 | consumed samples: 8785920 | consumed tokens: 17993564160 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.587936E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.337 | TFLOPs: 11.80 | 7: iteration 34330/ 173500 | consumed samples: 8788480 | consumed tokens: 17998807040 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.592113E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.812 | TFLOPs: 11.75 | 7: iteration 34340/ 173500 | consumed samples: 8791040 | consumed tokens: 18004049920 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.593879E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3169.071 | TFLOPs: 11.79 | 7: iteration 34350/ 173500 | consumed samples: 8793600 | consumed tokens: 18009292800 | elapsed time per iteration (s): 0.08 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 4.577835E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.583 | TFLOPs: 11.76 | 7: iteration 34360/ 173500 | consumed samples: 8796160 | consumed tokens: 18014535680 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.596488E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.458 | TFLOPs: 11.76 | 7: iteration 34370/ 173500 | consumed samples: 8798720 | consumed tokens: 18019778560 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.597465E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.510 | TFLOPs: 11.80 | 7: iteration 34380/ 173500 | consumed samples: 8801280 | consumed tokens: 18025021440 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.584769E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.907 | TFLOPs: 11.79 | 7: iteration 34390/ 173500 | consumed samples: 8803840 | consumed tokens: 18030264320 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.609646E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.754 | TFLOPs: 11.72 | 7: iteration 34400/ 173500 | consumed samples: 8806400 | consumed tokens: 18035507200 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.588807E+00 | grad norm: 
0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.671 | TFLOPs: 11.77 | 7: iteration 34410/ 173500 | consumed samples: 8808960 | consumed tokens: 18040750080 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.587387E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.276 | TFLOPs: 11.78 | 7: iteration 34420/ 173500 | consumed samples: 8811520 | consumed tokens: 18045992960 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.589009E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.769 | TFLOPs: 11.49 | 7: iteration 34430/ 173500 | consumed samples: 8814080 | consumed tokens: 18051235840 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.599192E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.037 | TFLOPs: 11.82 | 7: iteration 34440/ 173500 | consumed samples: 8816640 | consumed tokens: 18056478720 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.580003E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.115 | TFLOPs: 11.81 | 7: iteration 34450/ 173500 | consumed samples: 8819200 | consumed tokens: 18061721600 | elapsed time per iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.587356E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.342 | TFLOPs: 11.81 | 7: iteration 34460/ 173500 | consumed samples: 8821760 | consumed tokens: 18066964480 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 4.581126E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.905 | TFLOPs: 11.81 | 7: iteration 34470/ 173500 | consumed samples: 8824320 | consumed tokens: 18072207360 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.576359E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.312 | TFLOPs: 11.82 | 7: iteration 34480/ 173500 | consumed samples: 8826880 | consumed tokens: 18077450240 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.578466E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.295 | TFLOPs: 11.81 | 7: iteration 34490/ 173500 | consumed samples: 8829440 | consumed tokens: 18082693120 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.593031E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.359 | TFLOPs: 11.78 | 7: iteration 34500/ 173500 | consumed samples: 8832000 | consumed tokens: 18087936000 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.600720E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.125 | TFLOPs: 11.82 | 7: iteration 34510/ 173500 | consumed samples: 8834560 | consumed tokens: 18093178880 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.591615E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.395 | TFLOPs: 11.79 | 7: 
iteration 34520/ 173500 | consumed samples: 8837120 | consumed tokens: 18098421760 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.580624E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.691 | TFLOPs: 11.74 | 7: iteration 34530/ 173500 | consumed samples: 8839680 | consumed tokens: 18103664640 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.612190E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.146 | TFLOPs: 11.82 | 7: iteration 34540/ 173500 | consumed samples: 8842240 | consumed tokens: 18108907520 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.581271E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.165 | TFLOPs: 11.75 | 7: iteration 34550/ 173500 | consumed samples: 8844800 | consumed tokens: 18114150400 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.595500E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.244 | TFLOPs: 11.74 | 7: iteration 34560/ 173500 | consumed samples: 8847360 | consumed tokens: 18119393280 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.584869E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.518 | TFLOPs: 11.76 | 7: iteration 34570/ 173500 | consumed samples: 8849920 | consumed tokens: 18124636160 | elapsed time per iteration (s): 0.08 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 4.597042E+00 | grad norm: 0.355 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.387 | TFLOPs: 11.74 | 7: iteration 34580/ 173500 | consumed samples: 8852480 | consumed tokens: 18129879040 | elapsed time per iteration (s): 0.08 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.586372E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.382 | TFLOPs: 11.77 | 7: iteration 34590/ 173500 | consumed samples: 8855040 | consumed tokens: 18135121920 | elapsed time per iteration (s): 0.08 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.578684E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.483 | TFLOPs: 11.76 | 7: iteration 34600/ 173500 | consumed samples: 8857600 | consumed tokens: 18140364800 | elapsed time per iteration (s): 0.08 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.585666E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.010 | TFLOPs: 11.75 | 7: iteration 34610/ 173500 | consumed samples: 8860160 | consumed tokens: 18145607680 | elapsed time per iteration (s): 0.10 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.585147E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.495 | TFLOPs: 9.80 | 7: iteration 34620/ 173500 | consumed samples: 8862720 | consumed tokens: 18150850560 | elapsed time per iteration (s): 0.09 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.592294E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2950.551 | TFLOPs: 10.97 | 7: iteration 34630/ 173500 | consumed samples: 8865280 | consumed tokens: 18156093440 | elapsed time per iteration (s): 0.09 | learning rate: 
1.842E-04 | global batch size: 256 | lm loss: 4.582986E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2818.393 | TFLOPs: 10.48 | 7: iteration 34640/ 173500 | consumed samples: 8867840 | consumed tokens: 18161336320 | elapsed time per iteration (s): 0.09 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.592481E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2842.330 | TFLOPs: 10.57 | 7: iteration 34650/ 173500 | consumed samples: 8870400 | consumed tokens: 18166579200 | elapsed time per iteration (s): 0.08 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.582364E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.444 | TFLOPs: 11.74 | 7: iteration 34660/ 173500 | consumed samples: 8872960 | consumed tokens: 18171822080 | elapsed time per iteration (s): 0.08 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.582358E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.508 | TFLOPs: 11.74 | 7: iteration 34670/ 173500 | consumed samples: 8875520 | consumed tokens: 18177064960 | elapsed time per iteration (s): 0.08 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 4.586776E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.275 | TFLOPs: 11.74 | 7: iteration 34680/ 173500 | consumed samples: 8878080 | consumed tokens: 18182307840 | elapsed time per iteration (s): 0.08 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.587250E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.654 | TFLOPs: 11.75 | 7: iteration 34690/ 173500 | consumed 
samples: 8880640 | consumed tokens: 18187550720 | elapsed time per iteration (s): 0.08 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.588322E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.724 | TFLOPs: 11.80 | 7: iteration 34700/ 173500 | consumed samples: 8883200 | consumed tokens: 18192793600 | elapsed time per iteration (s): 0.08 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.596519E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.414 | TFLOPs: 11.81 | 7: iteration 34710/ 173500 | consumed samples: 8885760 | consumed tokens: 18198036480 | elapsed time per iteration (s): 0.08 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.585989E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.343 | TFLOPs: 11.80 | 7: iteration 34720/ 173500 | consumed samples: 8888320 | consumed tokens: 18203279360 | elapsed time per iteration (s): 0.09 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.568663E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2952.097 | TFLOPs: 10.98 | 7: iteration 34730/ 173500 | consumed samples: 8890880 | consumed tokens: 18208522240 | elapsed time per iteration (s): 0.09 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.584498E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2845.220 | TFLOPs: 10.58 | 7: iteration 34740/ 173500 | consumed samples: 8893440 | consumed tokens: 18213765120 | elapsed time per iteration (s): 0.09 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.592422E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2791.446 | TFLOPs: 10.38 | 7: iteration 34750/ 173500 | consumed samples: 8896000 | consumed tokens: 18219008000 | elapsed time per iteration (s): 0.11 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.587500E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2273.830 | TFLOPs: 8.46 | 7: iteration 34760/ 173500 | consumed samples: 8898560 | consumed tokens: 18224250880 | elapsed time per iteration (s): 0.11 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.590975E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2423.007 | TFLOPs: 9.01 | 7: iteration 34770/ 173500 | consumed samples: 8901120 | consumed tokens: 18229493760 | elapsed time per iteration (s): 0.08 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.588192E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.826 | TFLOPs: 11.75 | 7: iteration 34780/ 173500 | consumed samples: 8903680 | consumed tokens: 18234736640 | elapsed time per iteration (s): 0.08 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 4.581959E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.551 | TFLOPs: 11.79 | 7: iteration 34790/ 173500 | consumed samples: 8906240 | consumed tokens: 18239979520 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.585981E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.155 | TFLOPs: 11.73 | 7: iteration 34800/ 173500 | consumed samples: 8908800 | consumed tokens: 18245222400 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 
4.601122E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.911 | TFLOPs: 11.78 | 7: iteration 34810/ 173500 | consumed samples: 8911360 | consumed tokens: 18250465280 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.585146E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.254 | TFLOPs: 11.73 | 7: iteration 34820/ 173500 | consumed samples: 8913920 | consumed tokens: 18255708160 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.596888E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.145 | TFLOPs: 11.85 | 7: iteration 34830/ 173500 | consumed samples: 8916480 | consumed tokens: 18260951040 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.597565E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.446 | TFLOPs: 11.86 | 7: iteration 34840/ 173500 | consumed samples: 8919040 | consumed tokens: 18266193920 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.607737E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.701 | TFLOPs: 11.86 | 7: iteration 34850/ 173500 | consumed samples: 8921600 | consumed tokens: 18271436800 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.589384E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.155 | TFLOPs: 11.86 | 7: iteration 34860/ 173500 | consumed samples: 8924160 | consumed tokens: 18276679680 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.586571E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.219 | TFLOPs: 11.86 | 7: iteration 34870/ 173500 | consumed samples: 8926720 | consumed tokens: 18281922560 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.581548E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.342 | TFLOPs: 11.89 | 7: iteration 34880/ 173500 | consumed samples: 8929280 | consumed tokens: 18287165440 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.595320E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.539 | TFLOPs: 11.87 | 7: iteration 34890/ 173500 | consumed samples: 8931840 | consumed tokens: 18292408320 | elapsed time per iteration (s): 0.08 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 4.600696E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.352 | TFLOPs: 11.83 | 7: iteration 34900/ 173500 | consumed samples: 8934400 | consumed tokens: 18297651200 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.589590E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.163 | TFLOPs: 11.87 | 7: iteration 34910/ 173500 | consumed samples: 8936960 | consumed tokens: 18302894080 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.578446E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.570 | 
TFLOPs: 11.85 | 7: iteration 34920/ 173500 | consumed samples: 8939520 | consumed tokens: 18308136960 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.592707E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.235 | TFLOPs: 11.87 | 7: iteration 34930/ 173500 | consumed samples: 8942080 | consumed tokens: 18313379840 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.596250E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.909 | TFLOPs: 11.85 | 7: iteration 34940/ 173500 | consumed samples: 8944640 | consumed tokens: 18318622720 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.586963E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.717 | TFLOPs: 11.82 | 7: iteration 34950/ 173500 | consumed samples: 8947200 | consumed tokens: 18323865600 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.578570E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.908 | TFLOPs: 11.84 | 7: iteration 34960/ 173500 | consumed samples: 8949760 | consumed tokens: 18329108480 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.567966E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.189 | TFLOPs: 11.80 | 7: iteration 34970/ 173500 | consumed samples: 8952320 | consumed tokens: 18334351360 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.581552E+00 | grad norm: 0.321 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.234 | TFLOPs: 11.84 | 7: iteration 34980/ 173500 | consumed samples: 8954880 | consumed tokens: 18339594240 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.597701E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.373 | TFLOPs: 11.78 | 7: iteration 34990/ 173500 | consumed samples: 8957440 | consumed tokens: 18344837120 | elapsed time per iteration (s): 0.08 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 4.578045E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.944 | TFLOPs: 11.87 | 7: iteration 35000/ 173500 | consumed samples: 8960000 | consumed tokens: 18350080000 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.570020E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.814 | TFLOPs: 11.82 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 35000 | lm loss value: 4.451998E+00 | lm loss PPL: 8.579818E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 35000 to checkpoints_14m91b100m 0: [2023-03-17 01:07:17,664] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step35000 is begin to save! 0: [2023-03-17 01:07:17,667] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:07:17,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 01:07:17,690] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:07:17,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:07:17,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:07:17,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:07:17,698] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:07:17,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:07:17,702] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:07:17,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:07:17,705] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:07:17,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:07:17,706] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step35000/mp_rank_00_model_states.pt 0: [2023-03-17 01:07:17,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/mp_rank_00_model_states.pt... 
0: [2023-03-17 01:07:17,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:07:17,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:07:17,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:07:17,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:07:17,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 6: [2023-03-17 01:07:17,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 01:07:17,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2023-03-17 01:07:17,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:07:17,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 01:07:17,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:07:17,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
4: [2023-03-17 01:07:17,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:07:17,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2023-03-17 01:07:17,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:07:17,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
1: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:07:17,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
4: [2023-03-17 01:07:17,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:07:17,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:07:17,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2023-03-17 01:07:17,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:07:17,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:07:17,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:07:17,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
5: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
7: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2023-03-17 01:07:17,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:07:17,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
0: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:07:17,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 01:07:17,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:07:17,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2023-03-17 01:07:17,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:07:17,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:07:17,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:07:17,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:07:17,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:07:17,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:07:17,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:07:17,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:07:17,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 3: [2023-03-17 01:07:17,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2023-03-17 01:07:17,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
1: [2023-03-17 01:07:17,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2023-03-17 01:07:17,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:07:17,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:07:17,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2023-03-17 01:07:17,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:07:17,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2023-03-17 01:07:17,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:07:17,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:07:17,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:07:17,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:07:17,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:07:17,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 1: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 01:07:17,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step35000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
6: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2023-03-17 01:07:17,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: successfully saved checkpoint at iteration 35000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 77.19 7: iteration 35010/ 173500 | consumed samples: 8962560 | consumed tokens: 18355322880 | elapsed time per iteration (s): 0.09 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.588472E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2785.585 | TFLOPs: 10.36 | 7: iteration 35020/ 173500 | consumed samples: 8965120 | consumed tokens: 18360565760 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.603184E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.856 | TFLOPs: 11.83 | 7: iteration 35030/ 173500 | consumed samples: 8967680 | consumed tokens: 18365808640 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.599161E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.433 | TFLOPs: 11.83 | 7: iteration 35040/ 173500 | consumed samples: 8970240 | consumed tokens: 18371051520 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.581134E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.255 | TFLOPs: 11.84 | 7: iteration 35050/ 173500 | consumed samples: 8972800 | consumed tokens: 18376294400 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.588805E+00 | grad norm: 0.330 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.070 | TFLOPs: 11.87 | 7: iteration 35060/ 173500 | consumed samples: 8975360 | consumed tokens: 18381537280 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.583849E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.606 | TFLOPs: 11.83 | 7: iteration 35070/ 173500 | consumed samples: 8977920 | consumed tokens: 18386780160 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.593790E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.446 | TFLOPs: 11.75 | 7: iteration 35080/ 173500 | consumed samples: 8980480 | consumed tokens: 18392023040 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.596883E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.568 | TFLOPs: 11.88 | 7: iteration 35090/ 173500 | consumed samples: 8983040 | consumed tokens: 18397265920 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.598898E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.754 | TFLOPs: 11.88 | 7: iteration 35100/ 173500 | consumed samples: 8985600 | consumed tokens: 18402508800 | elapsed time per iteration (s): 0.08 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 4.580899E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.695 | TFLOPs: 11.87 | 7: iteration 35110/ 173500 | consumed samples: 8988160 | consumed tokens: 18407751680 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.577896E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.251 | TFLOPs: 11.86 | 7: iteration 35120/ 173500 | consumed samples: 8990720 | consumed tokens: 18412994560 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.599057E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.146 | TFLOPs: 11.83 | 7: iteration 35130/ 173500 | consumed samples: 8993280 | consumed tokens: 18418237440 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.578279E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.714 | TFLOPs: 11.80 | 7: iteration 35140/ 173500 | consumed samples: 8995840 | consumed tokens: 18423480320 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.592627E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.018 | TFLOPs: 11.74 | 7: iteration 35150/ 173500 | consumed samples: 8998400 | consumed tokens: 18428723200 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.585635E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.288 | TFLOPs: 11.80 | 7: iteration 35160/ 173500 | consumed samples: 9000960 | consumed tokens: 18433966080 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.584819E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.936 | TFLOPs: 11.82 | 7: iteration 35170/ 173500 
| consumed samples: 9003520 | consumed tokens: 18439208960 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.596802E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.939 | TFLOPs: 11.84 | 7: iteration 35180/ 173500 | consumed samples: 9006080 | consumed tokens: 18444451840 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.586044E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.773 | TFLOPs: 11.82 | 7: iteration 35190/ 173500 | consumed samples: 9008640 | consumed tokens: 18449694720 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.584164E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.345 | TFLOPs: 11.81 | 7: iteration 35200/ 173500 | consumed samples: 9011200 | consumed tokens: 18454937600 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.583574E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.886 | TFLOPs: 11.80 | 7: iteration 35210/ 173500 | consumed samples: 9013760 | consumed tokens: 18460180480 | elapsed time per iteration (s): 0.08 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 4.588401E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.796 | TFLOPs: 11.82 | 7: iteration 35220/ 173500 | consumed samples: 9016320 | consumed tokens: 18465423360 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.596667E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3178.667 | TFLOPs: 11.82 | 7: iteration 35230/ 173500 | consumed samples: 9018880 | consumed tokens: 18470666240 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.599743E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.480 | TFLOPs: 11.84 | 7: iteration 35240/ 173500 | consumed samples: 9021440 | consumed tokens: 18475909120 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.582460E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.591 | TFLOPs: 11.85 | 7: iteration 35250/ 173500 | consumed samples: 9024000 | consumed tokens: 18481152000 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.582964E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.433 | TFLOPs: 11.83 | 7: iteration 35260/ 173500 | consumed samples: 9026560 | consumed tokens: 18486394880 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.588864E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.929 | TFLOPs: 11.81 | 7: iteration 35270/ 173500 | consumed samples: 9029120 | consumed tokens: 18491637760 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.575631E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.711 | TFLOPs: 11.85 | 7: iteration 35280/ 173500 | consumed samples: 9031680 | consumed tokens: 18496880640 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 
256 | lm loss: 4.577179E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.220 | TFLOPs: 11.75 | 7: iteration 35290/ 173500 | consumed samples: 9034240 | consumed tokens: 18502123520 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.574939E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.794 | TFLOPs: 11.82 | 7: iteration 35300/ 173500 | consumed samples: 9036800 | consumed tokens: 18507366400 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.591795E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.843 | TFLOPs: 11.86 | 7: iteration 35310/ 173500 | consumed samples: 9039360 | consumed tokens: 18512609280 | elapsed time per iteration (s): 0.08 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 4.582381E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.891 | TFLOPs: 11.85 | 7: iteration 35320/ 173500 | consumed samples: 9041920 | consumed tokens: 18517852160 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.590551E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.466 | TFLOPs: 11.79 | 7: iteration 35330/ 173500 | consumed samples: 9044480 | consumed tokens: 18523095040 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.594496E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.342 | TFLOPs: 11.79 | 7: iteration 35340/ 173500 | consumed samples: 9047040 | consumed 
tokens: 18528337920 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.574128E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.749 | TFLOPs: 11.80 | 7: iteration 35350/ 173500 | consumed samples: 9049600 | consumed tokens: 18533580800 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.586340E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.120 | TFLOPs: 11.80 | 7: iteration 35360/ 173500 | consumed samples: 9052160 | consumed tokens: 18538823680 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.585840E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.201 | TFLOPs: 11.76 | 7: iteration 35370/ 173500 | consumed samples: 9054720 | consumed tokens: 18544066560 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.579451E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.436 | TFLOPs: 11.84 | 7: iteration 35380/ 173500 | consumed samples: 9057280 | consumed tokens: 18549309440 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.595411E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.353 | TFLOPs: 11.84 | 7: iteration 35390/ 173500 | consumed samples: 9059840 | consumed tokens: 18554552320 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.583138E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3164.603 | TFLOPs: 11.77 | 7: iteration 35400/ 173500 | consumed samples: 9062400 | consumed tokens: 18559795200 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.583073E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.413 | TFLOPs: 11.79 | 7: iteration 35410/ 173500 | consumed samples: 9064960 | consumed tokens: 18565038080 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.576067E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.368 | TFLOPs: 11.86 | 7: iteration 35420/ 173500 | consumed samples: 9067520 | consumed tokens: 18570280960 | elapsed time per iteration (s): 0.08 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 4.579609E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.828 | TFLOPs: 11.82 | 7: iteration 35430/ 173500 | consumed samples: 9070080 | consumed tokens: 18575523840 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.583310E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.196 | TFLOPs: 11.85 | 7: iteration 35440/ 173500 | consumed samples: 9072640 | consumed tokens: 18580766720 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.579447E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.846 | TFLOPs: 11.79 | 7: iteration 35450/ 173500 | consumed samples: 9075200 | consumed tokens: 18586009600 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.583341E+00 | grad norm: 
0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.959 | TFLOPs: 11.82 | 7: iteration 35460/ 173500 | consumed samples: 9077760 | consumed tokens: 18591252480 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.595332E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.671 | TFLOPs: 11.80 | 7: iteration 35470/ 173500 | consumed samples: 9080320 | consumed tokens: 18596495360 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.578094E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.383 | TFLOPs: 11.60 | 7: iteration 35480/ 173500 | consumed samples: 9082880 | consumed tokens: 18601738240 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.565889E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.556 | TFLOPs: 11.78 | 7: iteration 35490/ 173500 | consumed samples: 9085440 | consumed tokens: 18606981120 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.568700E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.679 | TFLOPs: 11.79 | 7: iteration 35500/ 173500 | consumed samples: 9088000 | consumed tokens: 18612224000 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.582896E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.036 | TFLOPs: 11.87 | 7: iteration 35510/ 173500 | consumed samples: 9090560 | consumed tokens: 18617466880 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.595527E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.197 | TFLOPs: 11.82 | 7: iteration 35520/ 173500 | consumed samples: 9093120 | consumed tokens: 18622709760 | elapsed time per iteration (s): 0.08 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 4.589730E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.824 | TFLOPs: 11.52 | 7: iteration 35530/ 173500 | consumed samples: 9095680 | consumed tokens: 18627952640 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.586653E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.850 | TFLOPs: 11.78 | 7: iteration 35540/ 173500 | consumed samples: 9098240 | consumed tokens: 18633195520 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.585538E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.330 | TFLOPs: 11.82 | 7: iteration 35550/ 173500 | consumed samples: 9100800 | consumed tokens: 18638438400 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.587736E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.455 | TFLOPs: 11.84 | 7: iteration 35560/ 173500 | consumed samples: 9103360 | consumed tokens: 18643681280 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.585623E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.171 | TFLOPs: 11.87 | 7: 
iteration 35570/ 173500 | consumed samples: 9105920 | consumed tokens: 18648924160 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.580585E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.696 | TFLOPs: 11.77 | 7: iteration 35580/ 173500 | consumed samples: 9108480 | consumed tokens: 18654167040 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.600444E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.139 | TFLOPs: 11.88 | 7: iteration 35590/ 173500 | consumed samples: 9111040 | consumed tokens: 18659409920 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.581044E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.676 | TFLOPs: 11.86 | 7: iteration 35600/ 173500 | consumed samples: 9113600 | consumed tokens: 18664652800 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.590309E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.725 | TFLOPs: 11.86 | 7: iteration 35610/ 173500 | consumed samples: 9116160 | consumed tokens: 18669895680 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.590841E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.269 | TFLOPs: 11.82 | 7: iteration 35620/ 173500 | consumed samples: 9118720 | consumed tokens: 18675138560 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.590671E+00 | grad norm: 0.331 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.447 | TFLOPs: 11.83 | 7: iteration 35630/ 173500 | consumed samples: 9121280 | consumed tokens: 18680381440 | elapsed time per iteration (s): 0.08 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 4.594730E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.387 | TFLOPs: 11.84 | 7: iteration 35640/ 173500 | consumed samples: 9123840 | consumed tokens: 18685624320 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.583990E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.455 | TFLOPs: 11.88 | 7: iteration 35650/ 173500 | consumed samples: 9126400 | consumed tokens: 18690867200 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.580036E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.100 | TFLOPs: 11.82 | 7: iteration 35660/ 173500 | consumed samples: 9128960 | consumed tokens: 18696110080 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.591798E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.558 | TFLOPs: 11.92 | 7: iteration 35670/ 173500 | consumed samples: 9131520 | consumed tokens: 18701352960 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.582400E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.827 | TFLOPs: 11.88 | 7: iteration 35680/ 173500 | consumed samples: 9134080 | consumed tokens: 18706595840 | elapsed time per iteration (s): 0.08 | learning rate: 
1.832E-04 | global batch size: 256 | lm loss: 4.586066E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.337 | TFLOPs: 11.81 | 7: iteration 35690/ 173500 | consumed samples: 9136640 | consumed tokens: 18711838720 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.590191E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.223 | TFLOPs: 11.86 | 7: iteration 35700/ 173500 | consumed samples: 9139200 | consumed tokens: 18717081600 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.591985E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.599 | TFLOPs: 11.90 | 7: iteration 35710/ 173500 | consumed samples: 9141760 | consumed tokens: 18722324480 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.597533E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.202 | TFLOPs: 11.92 | 7: iteration 35720/ 173500 | consumed samples: 9144320 | consumed tokens: 18727567360 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.583861E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.934 | TFLOPs: 11.98 | 7: iteration 35730/ 173500 | consumed samples: 9146880 | consumed tokens: 18732810240 | elapsed time per iteration (s): 0.08 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 4.578274E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.058 | TFLOPs: 11.97 | 7: iteration 35740/ 173500 | consumed 
samples: 9149440 | consumed tokens: 18738053120 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.593520E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.998 | TFLOPs: 11.88 | 7: iteration 35750/ 173500 | consumed samples: 9152000 | consumed tokens: 18743296000 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.589149E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.526 | TFLOPs: 11.83 | 7: iteration 35760/ 173500 | consumed samples: 9154560 | consumed tokens: 18748538880 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.580230E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.084 | TFLOPs: 11.90 | 7: iteration 35770/ 173500 | consumed samples: 9157120 | consumed tokens: 18753781760 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.576440E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.234 | TFLOPs: 11.91 | 7: iteration 35780/ 173500 | consumed samples: 9159680 | consumed tokens: 18759024640 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.570787E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.506 | TFLOPs: 11.88 | 7: iteration 35790/ 173500 | consumed samples: 9162240 | consumed tokens: 18764267520 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.577060E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3200.364 | TFLOPs: 11.90 | 7: iteration 35800/ 173500 | consumed samples: 9164800 | consumed tokens: 18769510400 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.589461E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.975 | TFLOPs: 11.84 | 7: iteration 35810/ 173500 | consumed samples: 9167360 | consumed tokens: 18774753280 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.584369E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.123 | TFLOPs: 11.90 | 7: iteration 35820/ 173500 | consumed samples: 9169920 | consumed tokens: 18779996160 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.590625E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.826 | TFLOPs: 11.82 | 7: iteration 35830/ 173500 | consumed samples: 9172480 | consumed tokens: 18785239040 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.586649E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.626 | TFLOPs: 11.81 | 7: iteration 35840/ 173500 | consumed samples: 9175040 | consumed tokens: 18790481920 | elapsed time per iteration (s): 0.08 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 4.590735E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.128 | TFLOPs: 11.84 | 7: iteration 35850/ 173500 | consumed samples: 9177600 | consumed tokens: 18795724800 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm 
loss: 4.578354E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.609 | TFLOPs: 11.82 | 7: iteration 35860/ 173500 | consumed samples: 9180160 | consumed tokens: 18800967680 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.577851E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.388 | TFLOPs: 11.86 | 7: iteration 35870/ 173500 | consumed samples: 9182720 | consumed tokens: 18806210560 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.578857E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.819 | TFLOPs: 11.83 | 7: iteration 35880/ 173500 | consumed samples: 9185280 | consumed tokens: 18811453440 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.598132E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.127 | TFLOPs: 11.85 | 7: iteration 35890/ 173500 | consumed samples: 9187840 | consumed tokens: 18816696320 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.569526E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.787 | TFLOPs: 11.81 | 7: iteration 35900/ 173500 | consumed samples: 9190400 | consumed tokens: 18821939200 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.576522E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.146 | TFLOPs: 11.84 | 7: iteration 35910/ 173500 | consumed samples: 9192960 | consumed tokens: 
18827182080 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.580852E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.307 | TFLOPs: 11.84 | 7: iteration 35920/ 173500 | consumed samples: 9195520 | consumed tokens: 18832424960 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.580721E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.923 | TFLOPs: 11.86 | 7: iteration 35930/ 173500 | consumed samples: 9198080 | consumed tokens: 18837667840 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.580898E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.890 | TFLOPs: 11.81 | 7: iteration 35940/ 173500 | consumed samples: 9200640 | consumed tokens: 18842910720 | elapsed time per iteration (s): 0.08 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 4.572654E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.871 | TFLOPs: 11.85 | 7: iteration 35950/ 173500 | consumed samples: 9203200 | consumed tokens: 18848153600 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.574329E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.031 | TFLOPs: 11.84 | 7: iteration 35960/ 173500 | consumed samples: 9205760 | consumed tokens: 18853396480 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.583920E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3179.957 | TFLOPs: 11.83 | 7: iteration 35970/ 173500 | consumed samples: 9208320 | consumed tokens: 18858639360 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.593999E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.915 | TFLOPs: 11.84 | 7: iteration 35980/ 173500 | consumed samples: 9210880 | consumed tokens: 18863882240 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.585417E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.333 | TFLOPs: 11.75 | 7: iteration 35990/ 173500 | consumed samples: 9213440 | consumed tokens: 18869125120 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.588984E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.144 | TFLOPs: 11.83 | 0: [2023-03-17 01:08:38,253] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=0, lr=[0.00018289669072542715, 0.00018289669072542715, 0.00018289669072542715], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 36000/ 173500 | consumed samples: 9216000 | consumed tokens: 18874368000 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.590218E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.549 | TFLOPs: 11.53 | 0: steps: 36000 loss: 4.5701 iter time (s): 0.080 samples/sec: 3192.329 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 36000 | lm loss value: 4.504884E+00 | lm loss PPL: 9.045783E+01 | 7: 
------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 36000 to checkpoints_14m91b100m 0: [2023-03-17 01:08:38,310] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step36000 is begin to save! 0: [2023-03-17 01:08:38,314] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:08:38,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:08:38,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:08:38,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:08:38,343] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:08:38,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:08:38,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:08:38,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:08:38,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:08:38,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 01:08:38,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:08:38,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:08:38,353] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step36000/mp_rank_00_model_states.pt 0: [2023-03-17 01:08:38,353] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:08:38,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:08:38,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:08:38,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:08:38,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:08:38,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:08:38,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:08:38,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 01:08:38,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2023-03-17 01:08:38,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:08:38,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:08:38,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2023-03-17 01:08:38,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:08:38,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2023-03-17 01:08:38,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 
3: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:08:38,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:08:38,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:08:38,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:08:38,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2023-03-17 01:08:38,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2023-03-17 01:08:38,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:08:38,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:08:38,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:08:38,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:08:38,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:08:38,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:08:38,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:08:38,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2023-03-17 01:08:38,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:08:38,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:08:38,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:08:38,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:08:38,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:08:38,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2023-03-17 01:08:38,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:08:38,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:08:38,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:08:38,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:08:38,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:08:38,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:08:38,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 01:08:38,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2023-03-17 01:08:38,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2023-03-17 01:08:38,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:08:38,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:08:38,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2023-03-17 01:08:38,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:08:38,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2023-03-17 01:08:38,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:08:38,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2023-03-17 01:08:38,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:08:38,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:08:38,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:08:38,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:08:38,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:08:38,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2023-03-17 01:08:38,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:08:38,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:08:38,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2023-03-17 01:08:38,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2023-03-17 01:08:38,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:08:38,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2023-03-17 01:08:38,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:08:38,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:08:38,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:08:38,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 
2: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:08:38,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:08:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:08:38,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 
1: [2023-03-17 01:08:38,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step36000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:08:38,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: successfully saved checkpoint at iteration 36000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.65 7: iteration 36010/ 173500 | consumed samples: 9218560 | consumed tokens: 18879610880 | elapsed time per iteration (s): 0.09 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.592443E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2709.696 | TFLOPs: 10.08 | 7: iteration 36020/ 173500 | consumed samples: 9221120 | consumed tokens: 18884853760 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.583294E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.052 | TFLOPs: 11.63 | 7: iteration 36030/ 173500 | consumed samples: 9223680 | consumed tokens: 18890096640 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.578920E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.463 | TFLOPs: 11.52 | 7: iteration 36040/ 173500 | consumed samples: 9226240 | consumed tokens: 18895339520 | elapsed time per iteration (s): 0.08 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 4.582945E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.580 | TFLOPs: 11.87 | 7: iteration 36050/ 173500 | consumed samples: 9228800 | consumed tokens: 18900582400 | elapsed time per iteration (s): 0.08 | learning rate: 1.828E-04 | global batch 
size: 256 | lm loss: 4.588756E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.321 | TFLOPs: 11.81 | 7: iteration 36060/ 173500 | consumed samples: 9231360 | consumed tokens: 18905825280 | elapsed time per iteration (s): 0.08 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.588867E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.441 | TFLOPs: 11.72 | 7: iteration 36070/ 173500 | consumed samples: 9233920 | consumed tokens: 18911068160 | elapsed time per iteration (s): 0.08 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.582536E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.886 | TFLOPs: 11.79 | 7: iteration 36080/ 173500 | consumed samples: 9236480 | consumed tokens: 18916311040 | elapsed time per iteration (s): 0.08 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.566326E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.145 | TFLOPs: 11.78 | 7: iteration 36090/ 173500 | consumed samples: 9239040 | consumed tokens: 18921553920 | elapsed time per iteration (s): 0.08 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.590958E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.603 | TFLOPs: 11.58 | 7: iteration 36100/ 173500 | consumed samples: 9241600 | consumed tokens: 18926796800 | elapsed time per iteration (s): 0.08 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.583347E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.733 | TFLOPs: 11.52 | 7: iteration 36110/ 173500 | consumed samples: 9244160 | consumed 
tokens: 18932039680 | elapsed time per iteration (s): 0.09 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.582616E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.704 | TFLOPs: 11.15 | 7: iteration 36120/ 173500 | consumed samples: 9246720 | consumed tokens: 18937282560 | elapsed time per iteration (s): 0.11 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.580615E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2397.192 | TFLOPs: 8.92 | 7: iteration 36130/ 173500 | consumed samples: 9249280 | consumed tokens: 18942525440 | elapsed time per iteration (s): 0.11 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.581882E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.634 | TFLOPs: 9.05 | 7: iteration 36140/ 173500 | consumed samples: 9251840 | consumed tokens: 18947768320 | elapsed time per iteration (s): 0.10 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.576825E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2513.676 | TFLOPs: 9.35 | 7: iteration 36150/ 173500 | consumed samples: 9254400 | consumed tokens: 18953011200 | elapsed time per iteration (s): 0.08 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 4.583765E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.137 | TFLOPs: 11.81 | 7: iteration 36160/ 173500 | consumed samples: 9256960 | consumed tokens: 18958254080 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.559962E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3190.978 | TFLOPs: 11.87 | 7: iteration 36170/ 173500 | consumed samples: 9259520 | consumed tokens: 18963496960 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.576294E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.713 | TFLOPs: 11.83 | 7: iteration 36180/ 173500 | consumed samples: 9262080 | consumed tokens: 18968739840 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.583791E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.315 | TFLOPs: 11.88 | 7: iteration 36190/ 173500 | consumed samples: 9264640 | consumed tokens: 18973982720 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.588703E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.401 | TFLOPs: 11.85 | 7: iteration 36200/ 173500 | consumed samples: 9267200 | consumed tokens: 18979225600 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.574123E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.503 | TFLOPs: 11.86 | 7: iteration 36210/ 173500 | consumed samples: 9269760 | consumed tokens: 18984468480 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.570632E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.917 | TFLOPs: 11.88 | 7: iteration 36220/ 173500 | consumed samples: 9272320 | consumed tokens: 18989711360 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.581018E+00 | grad norm: 
0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.130 | TFLOPs: 11.88 | 7: iteration 36230/ 173500 | consumed samples: 9274880 | consumed tokens: 18994954240 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.584911E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.184 | TFLOPs: 11.86 | 7: iteration 36240/ 173500 | consumed samples: 9277440 | consumed tokens: 19000197120 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.577716E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.503 | TFLOPs: 11.87 | 7: iteration 36250/ 173500 | consumed samples: 9280000 | consumed tokens: 19005440000 | elapsed time per iteration (s): 0.08 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 4.572754E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.349 | TFLOPs: 11.89 | 7: iteration 36260/ 173500 | consumed samples: 9282560 | consumed tokens: 19010682880 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.596471E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.204 | TFLOPs: 11.78 | 7: iteration 36270/ 173500 | consumed samples: 9285120 | consumed tokens: 19015925760 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.578888E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.765 | TFLOPs: 11.88 | 7: iteration 36280/ 173500 | consumed samples: 9287680 | consumed tokens: 19021168640 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.582523E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.469 | TFLOPs: 11.85 | 7: iteration 36290/ 173500 | consumed samples: 9290240 | consumed tokens: 19026411520 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.577198E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.954 | TFLOPs: 11.85 | 7: iteration 36300/ 173500 | consumed samples: 9292800 | consumed tokens: 19031654400 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.579405E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.749 | TFLOPs: 11.84 | 7: iteration 36310/ 173500 | consumed samples: 9295360 | consumed tokens: 19036897280 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.582418E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.982 | TFLOPs: 11.86 | 7: iteration 36320/ 173500 | consumed samples: 9297920 | consumed tokens: 19042140160 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.583539E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.613 | TFLOPs: 11.85 | 7: iteration 36330/ 173500 | consumed samples: 9300480 | consumed tokens: 19047383040 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.584493E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.000 | TFLOPs: 11.86 | 7: 
iteration 36340/ 173500 | consumed samples: 9303040 | consumed tokens: 19052625920 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.579221E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.940 | TFLOPs: 11.81 | 7: iteration 36350/ 173500 | consumed samples: 9305600 | consumed tokens: 19057868800 | elapsed time per iteration (s): 0.08 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 4.579617E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.279 | TFLOPs: 11.84 | 7: iteration 36360/ 173500 | consumed samples: 9308160 | consumed tokens: 19063111680 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.580201E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.292 | TFLOPs: 11.55 | 7: iteration 36370/ 173500 | consumed samples: 9310720 | consumed tokens: 19068354560 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.592130E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.899 | TFLOPs: 11.79 | 7: iteration 36380/ 173500 | consumed samples: 9313280 | consumed tokens: 19073597440 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.576331E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.655 | TFLOPs: 11.82 | 7: iteration 36390/ 173500 | consumed samples: 9315840 | consumed tokens: 19078840320 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.575435E+00 | grad norm: 0.394 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.496 | TFLOPs: 11.80 | 7: iteration 36400/ 173500 | consumed samples: 9318400 | consumed tokens: 19084083200 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.587112E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.387 | TFLOPs: 11.83 | 7: iteration 36410/ 173500 | consumed samples: 9320960 | consumed tokens: 19089326080 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.591919E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.255 | TFLOPs: 11.81 | 7: iteration 36420/ 173500 | consumed samples: 9323520 | consumed tokens: 19094568960 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.580497E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.784 | TFLOPs: 11.67 | 7: iteration 36430/ 173500 | consumed samples: 9326080 | consumed tokens: 19099811840 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.586872E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.196 | TFLOPs: 11.84 | 7: iteration 36440/ 173500 | consumed samples: 9328640 | consumed tokens: 19105054720 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.575399E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.173 | TFLOPs: 11.73 | 7: iteration 36450/ 173500 | consumed samples: 9331200 | consumed tokens: 19110297600 | elapsed time per iteration (s): 0.27 | learning rate: 
1.825E-04 | global batch size: 256 | lm loss: 4.582553E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 945.913 | TFLOPs: 3.52 | 7: iteration 36460/ 173500 | consumed samples: 9333760 | consumed tokens: 19115540480 | elapsed time per iteration (s): 0.08 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 4.585625E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.311 | TFLOPs: 11.65 | 7: iteration 36470/ 173500 | consumed samples: 9336320 | consumed tokens: 19120783360 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.579321E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.824 | TFLOPs: 11.91 | 7: iteration 36480/ 173500 | consumed samples: 9338880 | consumed tokens: 19126026240 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.597116E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.654 | TFLOPs: 11.96 | 7: iteration 36490/ 173500 | consumed samples: 9341440 | consumed tokens: 19131269120 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.579067E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.444 | TFLOPs: 11.94 | 7: iteration 36500/ 173500 | consumed samples: 9344000 | consumed tokens: 19136512000 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.577249E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.322 | TFLOPs: 12.00 | 7: iteration 36510/ 173500 | consumed 
samples: 9346560 | consumed tokens: 19141754880 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.586958E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.656 | TFLOPs: 12.02 | 7: iteration 36520/ 173500 | consumed samples: 9349120 | consumed tokens: 19146997760 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.562714E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.077 | TFLOPs: 11.99 | 7: iteration 36530/ 173500 | consumed samples: 9351680 | consumed tokens: 19152240640 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.578170E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.850 | TFLOPs: 11.85 | 7: iteration 36540/ 173500 | consumed samples: 9354240 | consumed tokens: 19157483520 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.584074E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.232 | TFLOPs: 11.88 | 7: iteration 36550/ 173500 | consumed samples: 9356800 | consumed tokens: 19162726400 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.585831E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.027 | TFLOPs: 11.85 | 7: iteration 36560/ 173500 | consumed samples: 9359360 | consumed tokens: 19167969280 | elapsed time per iteration (s): 0.08 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 4.584156E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3182.130 | TFLOPs: 11.84 | 7: iteration 36570/ 173500 | consumed samples: 9361920 | consumed tokens: 19173212160 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.579979E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.108 | TFLOPs: 11.74 | 7: iteration 36580/ 173500 | consumed samples: 9364480 | consumed tokens: 19178455040 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.579428E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.731 | TFLOPs: 11.81 | 7: iteration 36590/ 173500 | consumed samples: 9367040 | consumed tokens: 19183697920 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.573844E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.751 | TFLOPs: 11.88 | 7: iteration 36600/ 173500 | consumed samples: 9369600 | consumed tokens: 19188940800 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.580860E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.651 | TFLOPs: 11.88 | 7: iteration 36610/ 173500 | consumed samples: 9372160 | consumed tokens: 19194183680 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.567897E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.857 | TFLOPs: 11.59 | 7: iteration 36620/ 173500 | consumed samples: 9374720 | consumed tokens: 19199426560 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm 
loss: 4.586140E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.867 | TFLOPs: 11.89 | 7: iteration 36630/ 173500 | consumed samples: 9377280 | consumed tokens: 19204669440 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.589840E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.101 | TFLOPs: 12.03 | 7: iteration 36640/ 173500 | consumed samples: 9379840 | consumed tokens: 19209912320 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.598596E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.355 | TFLOPs: 11.72 | 7: iteration 36650/ 173500 | consumed samples: 9382400 | consumed tokens: 19215155200 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.568825E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3051.167 | TFLOPs: 11.35 | 7: iteration 36660/ 173500 | consumed samples: 9384960 | consumed tokens: 19220398080 | elapsed time per iteration (s): 0.08 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 4.579654E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.835 | TFLOPs: 11.66 | 7: iteration 36670/ 173500 | consumed samples: 9387520 | consumed tokens: 19225640960 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.585910E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.018 | TFLOPs: 11.78 | 7: iteration 36680/ 173500 | consumed samples: 9390080 | consumed tokens: 
19230883840 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.585363E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.526 | TFLOPs: 11.81 | 7: iteration 36690/ 173500 | consumed samples: 9392640 | consumed tokens: 19236126720 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.595501E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.694 | TFLOPs: 12.03 | 7: iteration 36700/ 173500 | consumed samples: 9395200 | consumed tokens: 19241369600 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.583840E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.818 | TFLOPs: 12.03 | 7: iteration 36710/ 173500 | consumed samples: 9397760 | consumed tokens: 19246612480 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.587332E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.893 | TFLOPs: 11.93 | 7: iteration 36720/ 173500 | consumed samples: 9400320 | consumed tokens: 19251855360 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.576738E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.601 | TFLOPs: 11.44 | 7: iteration 36730/ 173500 | consumed samples: 9402880 | consumed tokens: 19257098240 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.583389E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3228.712 | TFLOPs: 12.01 | 7: iteration 36740/ 173500 | consumed samples: 9405440 | consumed tokens: 19262341120 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.588887E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.851 | TFLOPs: 12.00 | 7: iteration 36750/ 173500 | consumed samples: 9408000 | consumed tokens: 19267584000 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.591194E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.873 | TFLOPs: 12.01 | 7: iteration 36760/ 173500 | consumed samples: 9410560 | consumed tokens: 19272826880 | elapsed time per iteration (s): 0.08 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 4.588213E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.021 | TFLOPs: 11.95 | 7: iteration 36770/ 173500 | consumed samples: 9413120 | consumed tokens: 19278069760 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.591215E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.777 | TFLOPs: 11.70 | 7: iteration 36780/ 173500 | consumed samples: 9415680 | consumed tokens: 19283312640 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.576620E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.989 | TFLOPs: 12.04 | 7: iteration 36790/ 173500 | consumed samples: 9418240 | consumed tokens: 19288555520 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.573746E+00 | grad norm: 0.320 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.785 | TFLOPs: 11.83 | 7: iteration 36800/ 173500 | consumed samples: 9420800 | consumed tokens: 19293798400 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.586720E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.905 | TFLOPs: 11.95 | 7: iteration 36810/ 173500 | consumed samples: 9423360 | consumed tokens: 19299041280 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.572835E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.266 | TFLOPs: 11.99 | 7: iteration 36820/ 173500 | consumed samples: 9425920 | consumed tokens: 19304284160 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.571476E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.217 | TFLOPs: 11.44 | 7: iteration 36830/ 173500 | consumed samples: 9428480 | consumed tokens: 19309527040 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.586354E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.076 | TFLOPs: 11.66 | 7: iteration 36840/ 173500 | consumed samples: 9431040 | consumed tokens: 19314769920 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.586612E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.701 | TFLOPs: 11.98 | 7: iteration 36850/ 173500 | consumed samples: 9433600 | consumed tokens: 19320012800 | elapsed time per iteration (s): 
0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.587094E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.543 | TFLOPs: 12.00 | 7: iteration 36860/ 173500 | consumed samples: 9436160 | consumed tokens: 19325255680 | elapsed time per iteration (s): 0.08 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 4.576405E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.473 | TFLOPs: 11.88 | 7: iteration 36870/ 173500 | consumed samples: 9438720 | consumed tokens: 19330498560 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.581165E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.940 | TFLOPs: 11.40 | 7: iteration 36880/ 173500 | consumed samples: 9441280 | consumed tokens: 19335741440 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.583119E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.187 | TFLOPs: 11.65 | 7: iteration 36890/ 173500 | consumed samples: 9443840 | consumed tokens: 19340984320 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.570798E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.137 | TFLOPs: 11.94 | 7: iteration 36900/ 173500 | consumed samples: 9446400 | consumed tokens: 19346227200 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.588069E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.125 | TFLOPs: 11.95 | 7: iteration 36910/ 
173500 | consumed samples: 9448960 | consumed tokens: 19351470080 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.575520E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.110 | TFLOPs: 11.96 | 7: iteration 36920/ 173500 | consumed samples: 9451520 | consumed tokens: 19356712960 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.575492E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.040 | TFLOPs: 11.71 | 7: iteration 36930/ 173500 | consumed samples: 9454080 | consumed tokens: 19361955840 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.576783E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.652 | TFLOPs: 11.60 | 7: iteration 36940/ 173500 | consumed samples: 9456640 | consumed tokens: 19367198720 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.580972E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.481 | TFLOPs: 11.95 | 7: iteration 36950/ 173500 | consumed samples: 9459200 | consumed tokens: 19372441600 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.568674E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.398 | TFLOPs: 12.00 | 7: iteration 36960/ 173500 | consumed samples: 9461760 | consumed tokens: 19377684480 | elapsed time per iteration (s): 0.08 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 4.577838E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3158.221 | TFLOPs: 11.75 | 7: iteration 36970/ 173500 | consumed samples: 9464320 | consumed tokens: 19382927360 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.578643E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.825 | TFLOPs: 12.01 | 7: iteration 36980/ 173500 | consumed samples: 9466880 | consumed tokens: 19388170240 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.576380E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.826 | TFLOPs: 12.00 | 7: iteration 36990/ 173500 | consumed samples: 9469440 | consumed tokens: 19393413120 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.573483E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.461 | TFLOPs: 11.97 | 7: iteration 37000/ 173500 | consumed samples: 9472000 | consumed tokens: 19398656000 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.590746E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.433 | TFLOPs: 11.94 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 37000 | lm loss value: 4.471077E+00 | lm loss PPL: 8.745085E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 37000 to checkpoints_14m91b100m 0: [2023-03-17 01:10:01,633] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step37000 is begin to save! 
0: [2023-03-17 01:10:01,637] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:10:01,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:10:01,662] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:10:01,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:10:01,665] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:10:01,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:10:01,669] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:10:01,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:10:01,671] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:10:01,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:10:01,674] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:10:01,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:10:01,675] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step37000/mp_rank_00_model_states.pt 0: [2023-03-17 01:10:01,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:10:01,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:10:01,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:10:01,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:10:01,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:10:01,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:10:01,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2023-03-17 01:10:01,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:10:01,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:10:01,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2023-03-17 01:10:01,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2023-03-17 01:10:01,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:10:01,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:10:01,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:10:01,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:10:01,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:10:01,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2023-03-17 01:10:01,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:10:01,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2023-03-17 01:10:01,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:10:01,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:10:01,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2023-03-17 01:10:01,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:10:01,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2023-03-17 01:10:01,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:10:01,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:10:01,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
0: [2023-03-17 01:10:01,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:10:01,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:10:01,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2023-03-17 01:10:01,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:10:01,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
7: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:10:01,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2023-03-17 01:10:01,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2023-03-17 01:10:01,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:10:01,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
0: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:10:01,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:10:01,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:10:01,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:10:01,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2023-03-17 01:10:01,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:10:01,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
0: [2023-03-17 01:10:01,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:10:01,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2023-03-17 01:10:01,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:10:01,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:10:01,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:10:01,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:10:01,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:10:01,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:10:01,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:10:01,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 2: [2023-03-17 01:10:01,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:10:01,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:10:01,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 01:10:01,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2023-03-17 01:10:01,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:10:01,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2023-03-17 01:10:01,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:10:01,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:10:01,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2023-03-17 01:10:01,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:10:01,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2023-03-17 01:10:01,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:10:01,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:10:01,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2023-03-17 01:10:01,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 7: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
6: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
1: [2023-03-17 01:10:01,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:10:01,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2023-03-17 01:10:01,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:10:01,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:10:01,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2023-03-17 01:10:01,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:10:01,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:10:01,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2023-03-17 01:10:01,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:10:01,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:10:01,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
0: successfully saved checkpoint at iteration 37000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 80.92
7: iteration 37010/ 173500 | consumed samples: 9474560 | consumed tokens: 19403898880 | elapsed time per iteration (s): 0.09 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.573774E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2820.724 | TFLOPs: 10.49 |
7: iteration 37020/ 173500 | consumed samples: 9477120 | consumed tokens: 19409141760 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.566997E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.870 | TFLOPs: 11.98 |
7: iteration 37030/ 173500 | consumed samples: 9479680 | consumed tokens: 19414384640 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.584648E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.062 | TFLOPs: 12.05 |
7: iteration 37040/ 173500 | consumed samples: 9482240 | consumed tokens: 19419627520 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.570123E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.537 | TFLOPs: 11.92 |
7: iteration 37050/ 173500 | consumed samples: 9484800 | consumed tokens: 19424870400 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.562685E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.626 | TFLOPs: 11.88 |
7: iteration 37060/ 173500 | consumed samples: 9487360 | consumed tokens: 19430113280 | elapsed time per iteration (s): 0.08 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 4.580008E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.241 | TFLOPs: 11.86 |
7: iteration 37070/ 173500 | consumed samples: 9489920 | consumed tokens: 19435356160 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.580074E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.901 | TFLOPs: 11.88 |
7: iteration 37080/ 173500 | consumed samples: 9492480 | consumed tokens: 19440599040 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.569913E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.133 | TFLOPs: 11.85 |
7: iteration 37090/ 173500 | consumed samples: 9495040 | consumed tokens: 19445841920 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.577065E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.547 | TFLOPs: 11.80 |
7: iteration 37100/ 173500 | consumed samples: 9497600 | consumed tokens: 19451084800 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.576417E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.024 | TFLOPs: 11.81 |
7: iteration 37110/ 173500 | consumed samples: 9500160 | consumed tokens: 19456327680 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.578651E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.291 | TFLOPs: 11.78 |
7: iteration 37120/ 173500 | consumed samples: 9502720 | consumed tokens: 19461570560 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.578431E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.848 | TFLOPs: 11.86 |
7: iteration 37130/ 173500 | consumed samples: 9505280 | consumed tokens: 19466813440 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.578206E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.310 | TFLOPs: 11.85 |
7: iteration 37140/ 173500 | consumed samples: 9507840 | consumed tokens: 19472056320 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.588262E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.058 | TFLOPs: 11.85 |
7: iteration 37150/ 173500 | consumed samples: 9510400 | consumed tokens: 19477299200 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.586344E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.022 | TFLOPs: 11.71 |
7: iteration 37160/ 173500 | consumed samples: 9512960 | consumed tokens: 19482542080 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.570474E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.653 | TFLOPs: 11.52 |
7: iteration 37170/ 173500 | consumed samples: 9515520 | consumed tokens: 19487784960 | elapsed time per iteration (s): 0.08 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 4.582606E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.432 | TFLOPs: 11.76 |
7: iteration 37180/ 173500 | consumed samples: 9518080 | consumed tokens: 19493027840 | elapsed time per iteration (s): 0.08 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.564063E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.915 | TFLOPs: 11.78 |
7: iteration 37190/ 173500 | consumed samples: 9520640 | consumed tokens: 19498270720 | elapsed time per iteration (s): 0.12 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.587162E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2158.983 | TFLOPs: 8.03 |
7: iteration 37200/ 173500 | consumed samples: 9523200 | consumed tokens: 19503513600 | elapsed time per iteration (s): 0.08 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.576578E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.942 | TFLOPs: 11.81 |
7: iteration 37210/ 173500 | consumed samples: 9525760 | consumed tokens: 19508756480 | elapsed time per iteration (s): 0.08 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.574413E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.189 | TFLOPs: 11.75 |
7: iteration 37220/ 173500 | consumed samples: 9528320 | consumed tokens: 19513999360 | elapsed time per iteration (s): 0.08 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.586844E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.810 | TFLOPs: 11.82 |
7: iteration 37230/ 173500 | consumed samples: 9530880 | consumed tokens: 19519242240 | elapsed time per iteration (s): 0.11 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.576962E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2291.082 | TFLOPs: 8.52 |
7: iteration 37240/ 173500 | consumed samples: 9533440 | consumed tokens: 19524485120 | elapsed time per iteration (s): 0.11 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.570937E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.261 | TFLOPs: 9.05 |
7: iteration 37250/ 173500 | consumed samples: 9536000 | consumed tokens: 19529728000 | elapsed time per iteration (s): 0.08 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.595544E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.005 | TFLOPs: 11.78 |
7: iteration 37260/ 173500 | consumed samples: 9538560 | consumed tokens: 19534970880 | elapsed time per iteration (s): 0.08 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.578744E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.811 | TFLOPs: 11.77 |
7: iteration 37270/ 173500 | consumed samples: 9541120 | consumed tokens: 19540213760 | elapsed time per iteration (s): 0.08 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 4.576207E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.227 | TFLOPs: 11.84 |
7: iteration 37280/ 173500 | consumed samples: 9543680 | consumed tokens: 19545456640 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.585632E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.823 | TFLOPs: 11.81 |
7: iteration 37290/ 173500 | consumed samples: 9546240 | consumed tokens: 19550699520 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.568734E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.324 | TFLOPs: 11.88 |
7: iteration 37300/ 173500 | consumed samples: 9548800 | consumed tokens: 19555942400 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.572705E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.225 | TFLOPs: 11.89 |
7: iteration 37310/ 173500 | consumed samples: 9551360 | consumed tokens: 19561185280 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.581258E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.781 | TFLOPs: 11.86 |
7: iteration 37320/ 173500 | consumed samples: 9553920 | consumed tokens: 19566428160 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.582046E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.800 | TFLOPs: 11.89 |
7: iteration 37330/ 173500 | consumed samples: 9556480 | consumed tokens: 19571671040 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.582991E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.498 | TFLOPs: 11.87 |
7: iteration 37340/ 173500 | consumed samples: 9559040 | consumed tokens: 19576913920 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.579644E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3180.170 | TFLOPs: 11.83 | 7: iteration 37350/ 173500 | consumed samples: 9561600 | consumed tokens: 19582156800 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.572614E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.153 | TFLOPs: 11.81 | 7: iteration 37360/ 173500 | consumed samples: 9564160 | consumed tokens: 19587399680 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.581886E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.917 | TFLOPs: 11.87 | 7: iteration 37370/ 173500 | consumed samples: 9566720 | consumed tokens: 19592642560 | elapsed time per iteration (s): 0.08 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 4.564421E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.783 | TFLOPs: 11.88 | 7: iteration 37380/ 173500 | consumed samples: 9569280 | consumed tokens: 19597885440 | elapsed time per iteration (s): 0.08 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.581725E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.174 | TFLOPs: 11.87 | 7: iteration 37390/ 173500 | consumed samples: 9571840 | consumed tokens: 19603128320 | elapsed time per iteration (s): 0.08 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.594094E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.092 | TFLOPs: 11.87 | 7: iteration 37400/ 173500 | consumed samples: 9574400 | consumed tokens: 19608371200 | elapsed time per iteration (s): 0.08 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.582393E+00 | grad norm: 0.307 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.292 | TFLOPs: 11.87 | 7: iteration 37410/ 173500 | consumed samples: 9576960 | consumed tokens: 19613614080 | elapsed time per iteration (s): 0.08 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.579387E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.153 | TFLOPs: 11.89 | 7: iteration 37420/ 173500 | consumed samples: 9579520 | consumed tokens: 19618856960 | elapsed time per iteration (s): 0.08 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.580943E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.064 | TFLOPs: 11.82 | 7: iteration 37430/ 173500 | consumed samples: 9582080 | consumed tokens: 19624099840 | elapsed time per iteration (s): 0.08 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.577403E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.173 | TFLOPs: 11.87 | 7: iteration 37440/ 173500 | consumed samples: 9584640 | consumed tokens: 19629342720 | elapsed time per iteration (s): 0.08 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.583091E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.971 | TFLOPs: 11.87 | 7: iteration 37450/ 173500 | consumed samples: 9587200 | consumed tokens: 19634585600 | elapsed time per iteration (s): 0.08 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.589244E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.681 | TFLOPs: 11.88 | 7: iteration 37460/ 173500 | consumed samples: 9589760 | consumed tokens: 19639828480 | elapsed time per iteration (s): 
0.09 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.579385E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.323 | TFLOPs: 10.61 | 7: iteration 37470/ 173500 | consumed samples: 9592320 | consumed tokens: 19645071360 | elapsed time per iteration (s): 0.09 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 4.583348E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2770.000 | TFLOPs: 10.30 | 7: iteration 37480/ 173500 | consumed samples: 9594880 | consumed tokens: 19650314240 | elapsed time per iteration (s): 0.08 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.576262E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.500 | TFLOPs: 11.34 | 7: iteration 37490/ 173500 | consumed samples: 9597440 | consumed tokens: 19655557120 | elapsed time per iteration (s): 0.08 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.563616E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.366 | TFLOPs: 11.86 | 7: iteration 37500/ 173500 | consumed samples: 9600000 | consumed tokens: 19660800000 | elapsed time per iteration (s): 0.08 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.574251E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.047 | TFLOPs: 11.89 | 7: iteration 37510/ 173500 | consumed samples: 9602560 | consumed tokens: 19666042880 | elapsed time per iteration (s): 0.08 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.596388E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.899 | TFLOPs: 11.82 | 7: iteration 37520/ 
173500 | consumed samples: 9605120 | consumed tokens: 19671285760 | elapsed time per iteration (s): 0.08 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.572320E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.529 | TFLOPs: 11.85 | 7: iteration 37530/ 173500 | consumed samples: 9607680 | consumed tokens: 19676528640 | elapsed time per iteration (s): 0.11 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.582285E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2420.633 | TFLOPs: 9.00 | 7: iteration 37540/ 173500 | consumed samples: 9610240 | consumed tokens: 19681771520 | elapsed time per iteration (s): 0.11 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.575554E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2335.776 | TFLOPs: 8.69 | 7: iteration 37550/ 173500 | consumed samples: 9612800 | consumed tokens: 19687014400 | elapsed time per iteration (s): 0.11 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.583873E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.630 | TFLOPs: 8.50 | 7: iteration 37560/ 173500 | consumed samples: 9615360 | consumed tokens: 19692257280 | elapsed time per iteration (s): 0.11 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.574257E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2330.015 | TFLOPs: 8.67 | 7: iteration 37570/ 173500 | consumed samples: 9617920 | consumed tokens: 19697500160 | elapsed time per iteration (s): 0.11 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 4.574084E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2283.260 | TFLOPs: 8.49 | 7: iteration 37580/ 173500 | consumed samples: 9620480 | consumed tokens: 19702743040 | elapsed time per iteration (s): 0.09 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.583435E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2752.502 | TFLOPs: 10.24 | 7: iteration 37590/ 173500 | consumed samples: 9623040 | consumed tokens: 19707985920 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.573425E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.318 | TFLOPs: 11.84 | 7: iteration 37600/ 173500 | consumed samples: 9625600 | consumed tokens: 19713228800 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.580577E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.060 | TFLOPs: 11.79 | 7: iteration 37610/ 173500 | consumed samples: 9628160 | consumed tokens: 19718471680 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.576017E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.827 | TFLOPs: 11.85 | 7: iteration 37620/ 173500 | consumed samples: 9630720 | consumed tokens: 19723714560 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.577182E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.451 | TFLOPs: 11.85 | 7: iteration 37630/ 173500 | consumed samples: 9633280 | consumed tokens: 19728957440 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 
256 | lm loss: 4.580619E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.033 | TFLOPs: 11.83 | 7: iteration 37640/ 173500 | consumed samples: 9635840 | consumed tokens: 19734200320 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.577353E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.835 | TFLOPs: 11.79 | 7: iteration 37650/ 173500 | consumed samples: 9638400 | consumed tokens: 19739443200 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.577776E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.144 | TFLOPs: 11.73 | 7: iteration 37660/ 173500 | consumed samples: 9640960 | consumed tokens: 19744686080 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.579489E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.114 | TFLOPs: 11.80 | 7: iteration 37670/ 173500 | consumed samples: 9643520 | consumed tokens: 19749928960 | elapsed time per iteration (s): 0.08 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 4.582876E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.778 | TFLOPs: 11.78 | 7: iteration 37680/ 173500 | consumed samples: 9646080 | consumed tokens: 19755171840 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.573435E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.134 | TFLOPs: 11.79 | 7: iteration 37690/ 173500 | consumed samples: 9648640 | consumed 
tokens: 19760414720 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.584002E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.207 | TFLOPs: 11.78 | 7: iteration 37700/ 173500 | consumed samples: 9651200 | consumed tokens: 19765657600 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.579437E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.921 | TFLOPs: 11.74 | 7: iteration 37710/ 173500 | consumed samples: 9653760 | consumed tokens: 19770900480 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.578988E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.637 | TFLOPs: 11.80 | 7: iteration 37720/ 173500 | consumed samples: 9656320 | consumed tokens: 19776143360 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.576215E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.481 | TFLOPs: 11.78 | 7: iteration 37730/ 173500 | consumed samples: 9658880 | consumed tokens: 19781386240 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.584203E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.571 | TFLOPs: 11.83 | 7: iteration 37740/ 173500 | consumed samples: 9661440 | consumed tokens: 19786629120 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.569844E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3155.201 | TFLOPs: 11.74 | 7: iteration 37750/ 173500 | consumed samples: 9664000 | consumed tokens: 19791872000 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.580189E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.757 | TFLOPs: 11.85 | 7: iteration 37760/ 173500 | consumed samples: 9666560 | consumed tokens: 19797114880 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.584832E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.741 | TFLOPs: 11.79 | 7: iteration 37770/ 173500 | consumed samples: 9669120 | consumed tokens: 19802357760 | elapsed time per iteration (s): 0.08 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 4.569086E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.631 | TFLOPs: 11.85 | 7: iteration 37780/ 173500 | consumed samples: 9671680 | consumed tokens: 19807600640 | elapsed time per iteration (s): 0.08 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.578764E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.472 | TFLOPs: 11.84 | 7: iteration 37790/ 173500 | consumed samples: 9674240 | consumed tokens: 19812843520 | elapsed time per iteration (s): 0.10 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.573963E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.494 | TFLOPs: 9.70 | 7: iteration 37800/ 173500 | consumed samples: 9676800 | consumed tokens: 19818086400 | elapsed time per iteration (s): 0.12 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.582295E+00 | grad norm: 
0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.387 | TFLOPs: 8.18 | 7: iteration 37810/ 173500 | consumed samples: 9679360 | consumed tokens: 19823329280 | elapsed time per iteration (s): 0.12 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.570429E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2176.862 | TFLOPs: 8.10 | 7: iteration 37820/ 173500 | consumed samples: 9681920 | consumed tokens: 19828572160 | elapsed time per iteration (s): 0.10 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.574768E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.927 | TFLOPs: 9.30 | 7: iteration 37830/ 173500 | consumed samples: 9684480 | consumed tokens: 19833815040 | elapsed time per iteration (s): 0.09 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.587439E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2776.719 | TFLOPs: 10.33 | 7: iteration 37840/ 173500 | consumed samples: 9687040 | consumed tokens: 19839057920 | elapsed time per iteration (s): 0.08 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.572419E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.917 | TFLOPs: 11.86 | 7: iteration 37850/ 173500 | consumed samples: 9689600 | consumed tokens: 19844300800 | elapsed time per iteration (s): 0.08 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.578844E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.747 | TFLOPs: 11.86 | 7: iteration 37860/ 173500 | consumed samples: 9692160 | consumed tokens: 19849543680 | elapsed time per iteration 
(s): 0.08 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 4.570065E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.534 | TFLOPs: 11.86 | 7: iteration 37870/ 173500 | consumed samples: 9694720 | consumed tokens: 19854786560 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.567229E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.387 | TFLOPs: 11.88 | 7: iteration 37880/ 173500 | consumed samples: 9697280 | consumed tokens: 19860029440 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.577818E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.111 | TFLOPs: 11.84 | 7: iteration 37890/ 173500 | consumed samples: 9699840 | consumed tokens: 19865272320 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.568999E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.164 | TFLOPs: 11.91 | 7: iteration 37900/ 173500 | consumed samples: 9702400 | consumed tokens: 19870515200 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.586579E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.117 | TFLOPs: 11.89 | 7: iteration 37910/ 173500 | consumed samples: 9704960 | consumed tokens: 19875758080 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.571313E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.956 | TFLOPs: 11.84 | 7: iteration 
37920/ 173500 | consumed samples: 9707520 | consumed tokens: 19881000960 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.568424E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.478 | TFLOPs: 11.85 | 7: iteration 37930/ 173500 | consumed samples: 9710080 | consumed tokens: 19886243840 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.574886E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.117 | TFLOPs: 11.87 | 7: iteration 37940/ 173500 | consumed samples: 9712640 | consumed tokens: 19891486720 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.594349E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.099 | TFLOPs: 11.87 | 7: iteration 37950/ 173500 | consumed samples: 9715200 | consumed tokens: 19896729600 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.576772E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.109 | TFLOPs: 11.87 | 7: iteration 37960/ 173500 | consumed samples: 9717760 | consumed tokens: 19901972480 | elapsed time per iteration (s): 0.08 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 4.578909E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.485 | TFLOPs: 11.87 | 7: iteration 37970/ 173500 | consumed samples: 9720320 | consumed tokens: 19907215360 | elapsed time per iteration (s): 0.08 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.579125E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3188.168 | TFLOPs: 11.86 | 7: iteration 37980/ 173500 | consumed samples: 9722880 | consumed tokens: 19912458240 | elapsed time per iteration (s): 0.08 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.579340E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.321 | TFLOPs: 11.87 | 7: iteration 37990/ 173500 | consumed samples: 9725440 | consumed tokens: 19917701120 | elapsed time per iteration (s): 0.08 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.551828E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.258 | TFLOPs: 11.88 | 0: [2023-03-17 01:11:26,179] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=0, lr=[0.00018091754328052937, 0.00018091754328052937, 0.00018091754328052937], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 38000/ 173500 | consumed samples: 9728000 | consumed tokens: 19922944000 | elapsed time per iteration (s): 0.08 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.578477E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.690 | TFLOPs: 11.87 | 0: steps: 38000 loss: 4.5743 iter time (s): 0.083 samples/sec: 3092.718 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 38000 | lm loss value: 4.431454E+00 | lm loss PPL: 8.405352E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 38000 to checkpoints_14m91b100m 0: [2023-03-17 01:11:26,236] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step38000 is begin to save! 
0: [2023-03-17 01:11:26,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:11:26,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:11:26,265] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:11:26,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:11:26,268] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:11:26,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:11:26,271] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:11:26,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:11:26,274] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:11:26,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:11:26,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:11:26,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:11:26,278] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step38000/mp_rank_00_model_states.pt 0: [2023-03-17 01:11:26,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:11:26,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:11:26,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:11:26,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,302] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:11:26,302] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2023-03-17 01:11:26,302] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:11:26,302] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:11:26,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:11:26,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 01:11:26,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:11:26,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:11:26,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:11:26,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2023-03-17 01:11:26,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:11:26,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
4: [2023-03-17 01:11:26,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 6: [2023-03-17 01:11:26,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:11:26,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:11:26,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:11:26,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:11:26,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:11:26,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:11:26,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:11:26,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:11:26,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:11:26,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:11:26,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:11:26,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:11:26,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:11:26,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:11:26,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:11:26,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:11:26,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:11:26,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:11:26,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:11:26,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2023-03-17 01:11:26,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:11:26,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2023-03-17 01:11:26,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:11:26,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2023-03-17 01:11:26,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:11:26,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:11:26,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2023-03-17 01:11:26,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2023-03-17 01:11:26,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:11:26,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2023-03-17 01:11:26,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 4: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 0: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
6: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 01:11:26,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
6: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2023-03-17 01:11:26,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:11:26,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2023-03-17 01:11:26,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2023-03-17 01:11:26,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
2: [2023-03-17 01:11:26,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:11:26,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2023-03-17 01:11:26,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2023-03-17 01:11:26,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:11:26,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step38000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:11:26,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
0: successfully saved checkpoint at iteration 38000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.83 7: iteration 38010/ 173500 | consumed samples: 9730560 | consumed tokens: 19928186880 | elapsed time per iteration (s): 0.09 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.573049E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2738.325 | TFLOPs: 10.19 | 7: iteration 38020/ 173500 | consumed samples: 9733120 | consumed tokens: 19933429760 | elapsed time per iteration (s): 0.08 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.571453E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.825 | TFLOPs: 11.83 | 7: iteration 38030/ 173500 | consumed samples: 9735680 | consumed tokens: 19938672640 | elapsed time per iteration (s): 0.08 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.567685E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.167 | TFLOPs: 11.85 | 7: iteration 38040/ 173500 | consumed samples: 9738240 | consumed tokens: 19943915520 | elapsed time per iteration (s): 0.08 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.576828E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.383 | TFLOPs: 11.79 | 7: iteration 38050/ 173500 | consumed samples: 9740800 | consumed tokens: 19949158400 | elapsed time per iteration (s): 0.08 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.572971E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.881 | TFLOPs: 11.81 | 7: iteration 38060/ 173500 | consumed samples: 9743360 | consumed tokens: 19954401280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.809E-04 | global batch size: 256 | lm loss: 4.569149E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.812 | TFLOPs: 11.78 | 7: iteration 38070/ 173500 | consumed samples: 9745920 | consumed tokens: 19959644160 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.583304E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.172 | TFLOPs: 11.82 | 7: iteration 38080/ 173500 | consumed samples: 9748480 | consumed tokens: 19964887040 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.566578E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.076 | TFLOPs: 11.84 | 7: iteration 38090/ 173500 | consumed samples: 9751040 | consumed tokens: 19970129920 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.574023E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.583 | TFLOPs: 11.85 | 7: iteration 38100/ 173500 | consumed samples: 9753600 | consumed tokens: 19975372800 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.568615E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.290 | TFLOPs: 11.86 | 7: iteration 38110/ 173500 | consumed samples: 9756160 | consumed tokens: 19980615680 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.577365E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.931 | TFLOPs: 11.70 | 7: iteration 38120/ 173500 
| consumed samples: 9758720 | consumed tokens: 19985858560 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.580815E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.329 | TFLOPs: 11.64 | 7: iteration 38130/ 173500 | consumed samples: 9761280 | consumed tokens: 19991101440 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.573315E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.495 | TFLOPs: 11.83 | 7: iteration 38140/ 173500 | consumed samples: 9763840 | consumed tokens: 19996344320 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.575807E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.260 | TFLOPs: 11.83 | 7: iteration 38150/ 173500 | consumed samples: 9766400 | consumed tokens: 20001587200 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.583332E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.878 | TFLOPs: 11.84 | 7: iteration 38160/ 173500 | consumed samples: 9768960 | consumed tokens: 20006830080 | elapsed time per iteration (s): 0.08 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 4.590375E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.456 | TFLOPs: 11.83 | 7: iteration 38170/ 173500 | consumed samples: 9771520 | consumed tokens: 20012072960 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.583603E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3179.773 | TFLOPs: 11.83 | 7: iteration 38180/ 173500 | consumed samples: 9774080 | consumed tokens: 20017315840 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.567253E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.694 | TFLOPs: 11.83 | 7: iteration 38190/ 173500 | consumed samples: 9776640 | consumed tokens: 20022558720 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.566220E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.762 | TFLOPs: 11.70 | 7: iteration 38200/ 173500 | consumed samples: 9779200 | consumed tokens: 20027801600 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.573859E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.040 | TFLOPs: 11.85 | 7: iteration 38210/ 173500 | consumed samples: 9781760 | consumed tokens: 20033044480 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.561077E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.987 | TFLOPs: 11.85 | 7: iteration 38220/ 173500 | consumed samples: 9784320 | consumed tokens: 20038287360 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.580154E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.470 | TFLOPs: 11.85 | 7: iteration 38230/ 173500 | consumed samples: 9786880 | consumed tokens: 20043530240 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 
256 | lm loss: 4.573365E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.481 | TFLOPs: 11.70 | 7: iteration 38240/ 173500 | consumed samples: 9789440 | consumed tokens: 20048773120 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.585132E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.319 | TFLOPs: 11.79 | 7: iteration 38250/ 173500 | consumed samples: 9792000 | consumed tokens: 20054016000 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.581091E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.385 | TFLOPs: 11.80 | 7: iteration 38260/ 173500 | consumed samples: 9794560 | consumed tokens: 20059258880 | elapsed time per iteration (s): 0.08 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 4.563492E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.718 | TFLOPs: 11.79 | 7: iteration 38270/ 173500 | consumed samples: 9797120 | consumed tokens: 20064501760 | elapsed time per iteration (s): 0.09 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.572664E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.118 | TFLOPs: 10.07 | 7: iteration 38280/ 173500 | consumed samples: 9799680 | consumed tokens: 20069744640 | elapsed time per iteration (s): 0.11 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.586925E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.536 | TFLOPs: 8.59 | 7: iteration 38290/ 173500 | consumed samples: 9802240 | consumed tokens: 
20074987520 | elapsed time per iteration (s): 0.11 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.568018E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.804 | TFLOPs: 8.62 | 7: iteration 38300/ 173500 | consumed samples: 9804800 | consumed tokens: 20080230400 | elapsed time per iteration (s): 0.11 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.575746E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.571 | TFLOPs: 8.50 | 7: iteration 38310/ 173500 | consumed samples: 9807360 | consumed tokens: 20085473280 | elapsed time per iteration (s): 0.11 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.575460E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2336.051 | TFLOPs: 8.69 | 7: iteration 38320/ 173500 | consumed samples: 9809920 | consumed tokens: 20090716160 | elapsed time per iteration (s): 0.11 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.575949E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.881 | TFLOPs: 8.50 | 7: iteration 38330/ 173500 | consumed samples: 9812480 | consumed tokens: 20095959040 | elapsed time per iteration (s): 0.09 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.587455E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2746.447 | TFLOPs: 10.22 | 7: iteration 38340/ 173500 | consumed samples: 9815040 | consumed tokens: 20101201920 | elapsed time per iteration (s): 0.10 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.567123E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
2672.262 | TFLOPs: 9.94 | 7: iteration 38350/ 173500 | consumed samples: 9817600 | consumed tokens: 20106444800 | elapsed time per iteration (s): 0.11 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.570436E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2245.922 | TFLOPs: 8.35 | 7: iteration 38360/ 173500 | consumed samples: 9820160 | consumed tokens: 20111687680 | elapsed time per iteration (s): 0.11 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 4.572792E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2359.024 | TFLOPs: 8.77 | 7: iteration 38370/ 173500 | consumed samples: 9822720 | consumed tokens: 20116930560 | elapsed time per iteration (s): 0.08 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.580538E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.481 | TFLOPs: 11.57 | 7: iteration 38380/ 173500 | consumed samples: 9825280 | consumed tokens: 20122173440 | elapsed time per iteration (s): 0.08 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.579768E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.044 | TFLOPs: 11.30 | 7: iteration 38390/ 173500 | consumed samples: 9827840 | consumed tokens: 20127416320 | elapsed time per iteration (s): 0.09 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.573479E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2891.883 | TFLOPs: 10.76 | 7: iteration 38400/ 173500 | consumed samples: 9830400 | consumed tokens: 20132659200 | elapsed time per iteration (s): 0.09 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.572030E+00 | grad norm: 0.335 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2829.652 | TFLOPs: 10.53 | 7: iteration 38410/ 173500 | consumed samples: 9832960 | consumed tokens: 20137902080 | elapsed time per iteration (s): 0.08 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.565349E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.532 | TFLOPs: 11.78 | 7: iteration 38420/ 173500 | consumed samples: 9835520 | consumed tokens: 20143144960 | elapsed time per iteration (s): 0.08 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.572047E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.922 | TFLOPs: 11.79 | 7: iteration 38430/ 173500 | consumed samples: 9838080 | consumed tokens: 20148387840 | elapsed time per iteration (s): 0.08 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.575074E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.598 | TFLOPs: 11.79 | 7: iteration 38440/ 173500 | consumed samples: 9840640 | consumed tokens: 20153630720 | elapsed time per iteration (s): 0.10 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.566440E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2542.285 | TFLOPs: 9.46 | 7: iteration 38450/ 173500 | consumed samples: 9843200 | consumed tokens: 20158873600 | elapsed time per iteration (s): 0.10 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.587084E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2643.767 | TFLOPs: 9.83 | 7: iteration 38460/ 173500 | consumed samples: 9845760 | consumed tokens: 20164116480 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.585450E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.333 | TFLOPs: 11.76 | 7: iteration 38470/ 173500 | consumed samples: 9848320 | consumed tokens: 20169359360 | elapsed time per iteration (s): 0.09 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.573994E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2717.750 | TFLOPs: 10.11 | 7: iteration 38480/ 173500 | consumed samples: 9850880 | consumed tokens: 20174602240 | elapsed time per iteration (s): 0.10 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.588672E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2535.711 | TFLOPs: 9.43 | 7: iteration 38490/ 173500 | consumed samples: 9853440 | consumed tokens: 20179845120 | elapsed time per iteration (s): 0.08 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.587169E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.875 | TFLOPs: 11.79 | 7: iteration 38500/ 173500 | consumed samples: 9856000 | consumed tokens: 20185088000 | elapsed time per iteration (s): 0.08 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.561810E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.096 | TFLOPs: 11.72 | 7: iteration 38510/ 173500 | consumed samples: 9858560 | consumed tokens: 20190330880 | elapsed time per iteration (s): 0.08 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.584668E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.988 | TFLOPs: 11.74 | 7: iteration 38520/ 173500 
| consumed samples: 9861120 | consumed tokens: 20195573760 | elapsed time per iteration (s): 0.08 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.579897E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.724 | TFLOPs: 11.68 | 7: iteration 38530/ 173500 | consumed samples: 9863680 | consumed tokens: 20200816640 | elapsed time per iteration (s): 0.08 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.582677E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.540 | TFLOPs: 11.82 | 7: iteration 38540/ 173500 | consumed samples: 9866240 | consumed tokens: 20206059520 | elapsed time per iteration (s): 0.08 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.569740E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.384 | TFLOPs: 11.76 | 7: iteration 38550/ 173500 | consumed samples: 9868800 | consumed tokens: 20211302400 | elapsed time per iteration (s): 0.08 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 4.550993E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.023 | TFLOPs: 11.81 | 7: iteration 38560/ 173500 | consumed samples: 9871360 | consumed tokens: 20216545280 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.570420E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.409 | TFLOPs: 11.77 | 7: iteration 38570/ 173500 | consumed samples: 9873920 | consumed tokens: 20221788160 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.575095E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3174.629 | TFLOPs: 11.81 | 7: iteration 38580/ 173500 | consumed samples: 9876480 | consumed tokens: 20227031040 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.569590E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.479 | TFLOPs: 11.57 | 7: iteration 38590/ 173500 | consumed samples: 9879040 | consumed tokens: 20232273920 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.579887E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.859 | TFLOPs: 11.82 | 7: iteration 38600/ 173500 | consumed samples: 9881600 | consumed tokens: 20237516800 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.582932E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.809 | TFLOPs: 11.78 | 7: iteration 38610/ 173500 | consumed samples: 9884160 | consumed tokens: 20242759680 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.576030E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.507 | TFLOPs: 11.79 | 7: iteration 38620/ 173500 | consumed samples: 9886720 | consumed tokens: 20248002560 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.581778E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.115 | TFLOPs: 11.80 | 7: iteration 38630/ 173500 | consumed samples: 9889280 | consumed tokens: 20253245440 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 
256 | lm loss: 4.574066E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.400 | TFLOPs: 11.78 | 7: iteration 38640/ 173500 | consumed samples: 9891840 | consumed tokens: 20258488320 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.557443E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.289 | TFLOPs: 11.78 | 7: iteration 38650/ 173500 | consumed samples: 9894400 | consumed tokens: 20263731200 | elapsed time per iteration (s): 0.08 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 4.581050E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.929 | TFLOPs: 11.34 | 7: iteration 38660/ 173500 | consumed samples: 9896960 | consumed tokens: 20268974080 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.576716E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.296 | TFLOPs: 11.79 | 7: iteration 38670/ 173500 | consumed samples: 9899520 | consumed tokens: 20274216960 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.586806E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.759 | TFLOPs: 11.75 | 7: iteration 38680/ 173500 | consumed samples: 9902080 | consumed tokens: 20279459840 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.570332E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.401 | TFLOPs: 11.80 | 7: iteration 38690/ 173500 | consumed samples: 9904640 | consumed 
tokens: 20284702720 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.562835E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.364 | TFLOPs: 11.60 | 7: iteration 38700/ 173500 | consumed samples: 9907200 | consumed tokens: 20289945600 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.563247E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.152 | TFLOPs: 11.80 | 7: iteration 38710/ 173500 | consumed samples: 9909760 | consumed tokens: 20295188480 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.566884E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.673 | TFLOPs: 11.80 | 7: iteration 38720/ 173500 | consumed samples: 9912320 | consumed tokens: 20300431360 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.571332E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.669 | TFLOPs: 11.79 | 7: iteration 38730/ 173500 | consumed samples: 9914880 | consumed tokens: 20305674240 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.579977E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.731 | TFLOPs: 11.80 | 7: iteration 38740/ 173500 | consumed samples: 9917440 | consumed tokens: 20310917120 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.574595E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3176.516 | TFLOPs: 11.82 | 7: iteration 38750/ 173500 | consumed samples: 9920000 | consumed tokens: 20316160000 | elapsed time per iteration (s): 0.08 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 4.582572E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.858 | TFLOPs: 11.69 | 7: iteration 38760/ 173500 | consumed samples: 9922560 | consumed tokens: 20321402880 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.579266E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.252 | TFLOPs: 11.78 | 7: iteration 38770/ 173500 | consumed samples: 9925120 | consumed tokens: 20326645760 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.586096E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.575 | TFLOPs: 11.79 | 7: iteration 38780/ 173500 | consumed samples: 9927680 | consumed tokens: 20331888640 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.579934E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.982 | TFLOPs: 11.77 | 7: iteration 38790/ 173500 | consumed samples: 9930240 | consumed tokens: 20337131520 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.582284E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.837 | TFLOPs: 11.79 | 7: iteration 38800/ 173500 | consumed samples: 9932800 | consumed tokens: 20342374400 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.571298E+00 | grad norm: 
0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.379 | TFLOPs: 11.78 | 7: iteration 38810/ 173500 | consumed samples: 9935360 | consumed tokens: 20347617280 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.582664E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.870 | TFLOPs: 11.80 | 7: iteration 38820/ 173500 | consumed samples: 9937920 | consumed tokens: 20352860160 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.572200E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.418 | TFLOPs: 11.73 | 7: iteration 38830/ 173500 | consumed samples: 9940480 | consumed tokens: 20358103040 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.581326E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.143 | TFLOPs: 11.81 | 7: iteration 38840/ 173500 | consumed samples: 9943040 | consumed tokens: 20363345920 | elapsed time per iteration (s): 0.08 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 4.574751E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.297 | TFLOPs: 11.81 | 7: iteration 38850/ 173500 | consumed samples: 9945600 | consumed tokens: 20368588800 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.580449E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.518 | TFLOPs: 11.81 | 7: iteration 38860/ 173500 | consumed samples: 9948160 | consumed tokens: 20373831680 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.580899E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.203 | TFLOPs: 11.81 | 7: iteration 38870/ 173500 | consumed samples: 9950720 | consumed tokens: 20379074560 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.584867E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.469 | TFLOPs: 11.79 | 7: iteration 38880/ 173500 | consumed samples: 9953280 | consumed tokens: 20384317440 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.581965E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.353 | TFLOPs: 11.84 | 7: iteration 38890/ 173500 | consumed samples: 9955840 | consumed tokens: 20389560320 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.563310E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.509 | TFLOPs: 11.86 | 7: iteration 38900/ 173500 | consumed samples: 9958400 | consumed tokens: 20394803200 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.561249E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.939 | TFLOPs: 11.86 | 7: iteration 38910/ 173500 | consumed samples: 9960960 | consumed tokens: 20400046080 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.560760E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.826 | TFLOPs: 11.82 | 7: 
iteration 38920/ 173500 | consumed samples: 9963520 | consumed tokens: 20405288960 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.574355E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.140 | TFLOPs: 11.75 | 7: iteration 38930/ 173500 | consumed samples: 9966080 | consumed tokens: 20410531840 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.563745E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.216 | TFLOPs: 11.77 | 7: iteration 38940/ 173500 | consumed samples: 9968640 | consumed tokens: 20415774720 | elapsed time per iteration (s): 0.08 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 4.572639E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.748 | TFLOPs: 11.81 | 7: iteration 38950/ 173500 | consumed samples: 9971200 | consumed tokens: 20421017600 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.568977E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.143 | TFLOPs: 11.70 | 7: iteration 38960/ 173500 | consumed samples: 9973760 | consumed tokens: 20426260480 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.572060E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.686 | TFLOPs: 11.82 | 7: iteration 38970/ 173500 | consumed samples: 9976320 | consumed tokens: 20431503360 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.561655E+00 | grad norm: 0.390 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.004 | TFLOPs: 11.84 |
7: iteration 38980/ 173500 | consumed samples: 9978880 | consumed tokens: 20436746240 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.564951E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.559 | TFLOPs: 11.84 |
7: iteration 38990/ 173500 | consumed samples: 9981440 | consumed tokens: 20441989120 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.591200E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.304 | TFLOPs: 11.82 |
7: iteration 39000/ 173500 | consumed samples: 9984000 | consumed tokens: 20447232000 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.576700E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.419 | TFLOPs: 11.85 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 39000 | lm loss value: 4.468528E+00 | lm loss PPL: 8.722821E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 39000 to checkpoints_14m91b100m
0: [2023-03-17 01:12:50,601] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step39000 is begin to save!
0: [2023-03-17 01:12:50,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:12:50,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:12:50,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:12:50,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:12:50,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:12:50,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:12:50,637] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:12:50,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:12:50,640] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:12:50,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:12:50,643] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:12:50,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:12:50,644] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step39000/mp_rank_00_model_states.pt 0: [2023-03-17 01:12:50,644] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/mp_rank_00_model_states.pt... 
0: [2023-03-17 01:12:50,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:12:50,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:12:50,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:12:50,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:12:50,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:12:50,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:12:50,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:12:50,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2023-03-17 01:12:50,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
1: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
3: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 4: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2023-03-17 01:12:50,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:12:50,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:12:50,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:12:50,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 01:12:50,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:12:50,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2023-03-17 01:12:50,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:12:50,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:12:50,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:12:50,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:12:50,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:12:50,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:12:50,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 5: [2023-03-17 01:12:50,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:12:50,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2023-03-17 01:12:50,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:12:50,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:12:50,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:12:50,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:12:50,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2023-03-17 01:12:50,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:12:50,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2023-03-17 01:12:50,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:12:50,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:12:50,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2023-03-17 01:12:50,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2023-03-17 01:12:50,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:12:50,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:12:50,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:12:50,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:12:50,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:12:50,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:12:50,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
3: [2023-03-17 01:12:50,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:12:50,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:12:50,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:12:50,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:12:50,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:12:50,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 01:12:50,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2023-03-17 01:12:50,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2023-03-17 01:12:50,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:12:50,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:12:50,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:12:50,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2023-03-17 01:12:50,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:12:50,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
1: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2023-03-17 01:12:50,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:12:50,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
4: [2023-03-17 01:12:50,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
4: [2023-03-17 01:12:50,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
4: [2023-03-17 01:12:50,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now!
4: [2023-03-17 01:12:50,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt.
4: [2023-03-17 01:12:50,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step39000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt
4: [2023-03-17 01:12:50,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now!
0: successfully saved checkpoint at iteration 39000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 97.29
7: iteration 39010/ 173500 | consumed samples: 9986560 | consumed tokens: 20452474880 | elapsed time per iteration (s): 0.09 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.560307E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.788 | TFLOPs: 10.17 |
7: iteration 39020/ 173500 | consumed samples: 9989120 | consumed tokens: 20457717760 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.567909E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.655 | TFLOPs: 11.67 |
7: iteration 39030/ 173500 | consumed samples: 9991680 | consumed tokens: 20462960640 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.565569E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.057 | TFLOPs: 11.86 |
7: iteration 39040/ 173500 | consumed samples: 9994240 | consumed tokens: 20468203520 | elapsed time per iteration (s): 0.08 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 4.577228E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.338 | TFLOPs: 11.81 |
7: iteration 39050/ 173500 | consumed samples: 9996800 | consumed tokens: 20473446400 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.570785E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.807 | TFLOPs: 11.85 |
7: iteration 39060/ 173500 | consumed samples: 9999360 | consumed tokens: 20478689280 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.576108E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.258 | TFLOPs: 11.88 |
7: iteration 39070/ 173500 | consumed samples: 10001920 | consumed tokens: 20483932160 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.578226E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.875 | TFLOPs: 11.88 |
7: iteration 39080/ 173500 | consumed samples: 10004480 | consumed tokens: 20489175040 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.573277E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.383 | TFLOPs: 11.88 |
7: iteration 39090/ 173500 | consumed samples: 10007040 | consumed tokens: 20494417920 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.571898E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.413 | TFLOPs: 11.82 |
7: iteration 39100/ 173500 | consumed samples: 10009600 | consumed tokens: 20499660800 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.566236E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.513 | TFLOPs: 11.86 |
7: iteration 39110/ 173500 | consumed samples: 10012160 | consumed tokens: 20504903680 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.558520E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.381 | TFLOPs: 11.88 |
7: iteration 39120/ 173500 | consumed samples: 10014720 | consumed tokens: 20510146560 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.575090E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.427 | TFLOPs: 11.87 |
7: iteration 39130/ 173500 | consumed samples: 10017280 | consumed tokens: 20515389440 | elapsed time per iteration (s): 0.08 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 4.582700E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.473 | TFLOPs: 11.83 |
7: iteration 39140/ 173500 | consumed samples: 10019840 | consumed tokens: 20520632320 | elapsed time per iteration (s): 0.08 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.570082E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.570 | TFLOPs: 11.84 |
7: iteration 39150/ 173500 | consumed samples: 10022400 | consumed tokens: 20525875200 | elapsed time per iteration (s): 0.08 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.577352E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.780 | TFLOPs: 11.84 |
7: iteration 39160/ 173500 | consumed samples: 10024960 | consumed tokens: 20531118080 | elapsed time per iteration (s): 0.11 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.572789E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2235.754 | TFLOPs: 8.32 |
7: iteration 39170/ 173500 | consumed samples: 10027520 | consumed tokens: 20536360960 | elapsed time per iteration (s): 0.12 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.574016E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.211 | TFLOPs: 7.75 |
7: iteration 39180/ 173500 | consumed samples: 10030080 | consumed tokens: 20541603840 | elapsed time per iteration (s): 0.13 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.579428E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.477 | TFLOPs: 7.46 |
7: iteration 39190/ 173500 | consumed samples: 10032640 | consumed tokens: 20546846720 | elapsed time per iteration (s): 0.12 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.568803E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2105.272 | TFLOPs: 7.83 |
7: iteration 39200/ 173500 | consumed samples: 10035200 | consumed tokens: 20552089600 | elapsed time per iteration (s): 0.14 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.573997E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1871.800 | TFLOPs: 6.96 |
7: iteration 39210/ 173500 | consumed samples: 10037760 | consumed tokens: 20557332480 | elapsed time per iteration (s): 0.13 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.580402E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1932.731 | TFLOPs: 7.19 |
7: iteration 39220/ 173500 | consumed samples: 10040320 | consumed tokens: 20562575360 | elapsed time per iteration (s): 0.13 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.582175E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.276 | TFLOPs: 7.37 |
7: iteration 39230/ 173500 | consumed samples: 10042880 | consumed tokens: 20567818240 | elapsed time per iteration (s): 0.13 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 4.576332E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.318 | TFLOPs: 7.26 |
7: iteration 39240/ 173500 | consumed samples: 10045440 | consumed tokens: 20573061120 | elapsed time per iteration (s): 0.13 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.583422E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.238 | TFLOPs: 7.26 |
7: iteration 39250/ 173500 | consumed samples: 10048000 | consumed tokens: 20578304000 | elapsed time per iteration (s): 0.10 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.569258E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2605.181 | TFLOPs: 9.69 |
7: iteration 39260/ 173500 | consumed samples: 10050560 | consumed tokens: 20583546880 | elapsed time per iteration (s): 0.08 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.576546E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.966 | TFLOPs: 11.82 |
7: iteration 39270/ 173500 | consumed samples: 10053120 | consumed tokens: 20588789760 | elapsed time per iteration (s): 0.08 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.574435E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.276 | TFLOPs: 11.84 |
7: iteration 39280/ 173500 | consumed samples: 10055680 | consumed tokens: 20594032640 | elapsed time per iteration (s): 0.08 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.581441E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.643 | TFLOPs: 11.84 |
7: iteration 39290/ 173500 | consumed samples: 10058240 | consumed tokens: 20599275520 | elapsed time per iteration (s): 0.08 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.570298E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.550 | TFLOPs: 11.57 |
7: iteration 39300/ 173500 | consumed samples: 10060800 | consumed tokens: 20604518400 | elapsed time per iteration (s): 0.25 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.579670E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1013.043 | TFLOPs: 3.77 |
7: iteration 39310/ 173500 | consumed samples: 10063360 | consumed tokens: 20609761280 | elapsed time per iteration (s): 0.08 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.568076E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.044 | TFLOPs: 11.55 |
7: iteration 39320/ 173500 | consumed samples: 10065920 | consumed tokens: 20615004160 | elapsed time per iteration (s): 0.08 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 4.570706E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.886 | TFLOPs: 11.85 |
7: iteration 39330/ 173500 | consumed samples: 10068480 | consumed tokens: 20620247040 | elapsed time per iteration (s): 0.08 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.559571E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.838 | TFLOPs: 11.89 |
7: iteration 39340/ 173500 | consumed samples: 10071040 | consumed tokens: 20625489920 | elapsed time per iteration (s): 0.08 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.579033E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.981 | TFLOPs: 11.50 |
7: iteration 39350/ 173500 | consumed samples: 10073600 | consumed tokens: 20630732800 | elapsed time per iteration (s): 0.08 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.563638E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.885 | TFLOPs: 11.88 |
7: iteration 39360/ 173500 | consumed samples: 10076160 | consumed tokens: 20635975680 | elapsed time per iteration (s): 0.09 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.572520E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2950.226 | TFLOPs: 10.97 |
7: iteration 39370/ 173500 | consumed samples: 10078720 | consumed tokens: 20641218560 | elapsed time per iteration (s): 0.08 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.572203E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.518 | TFLOPs: 11.58 |
7: iteration 39380/ 173500 | consumed samples: 10081280 | consumed tokens: 20646461440 | elapsed time per iteration (s): 0.08 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.558564E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.621 | TFLOPs: 11.52 |
7: iteration 39390/ 173500 | consumed samples: 10083840 | consumed tokens: 20651704320 | elapsed time per iteration (s): 0.09 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.569482E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.983 | TFLOPs: 10.99 |
7: iteration 39400/ 173500 | consumed samples: 10086400 | consumed tokens: 20656947200 | elapsed time per iteration (s): 0.08 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.570375E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.813 | TFLOPs: 11.79 |
7: iteration 39410/ 173500 | consumed samples: 10088960 | consumed tokens: 20662190080 | elapsed time per iteration (s): 0.08 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.576842E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.800 | TFLOPs: 11.81 |
7: iteration 39420/ 173500 | consumed samples: 10091520 | consumed tokens: 20667432960 | elapsed time per iteration (s): 0.08 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 4.580490E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.206 | TFLOPs: 11.61 |
7: iteration 39430/ 173500 | consumed samples: 10094080 | consumed tokens: 20672675840 | elapsed time per iteration (s): 0.08 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.560672E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.634 | TFLOPs: 11.86 |
7: iteration 39440/ 173500 | consumed samples: 10096640 | consumed tokens: 20677918720 | elapsed time per iteration (s): 0.08 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.580136E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.079 | TFLOPs: 11.80 |
7: iteration 39450/ 173500 | consumed samples: 10099200 | consumed tokens: 20683161600 | elapsed time per iteration (s): 0.08 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.569362E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.072 | TFLOPs: 11.58 |
7: iteration 39460/ 173500 | consumed samples: 10101760 | consumed tokens: 20688404480 | elapsed time per iteration (s): 0.08 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.575473E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.375 | TFLOPs: 11.54 |
7: iteration 39470/ 173500 | consumed samples: 10104320 | consumed tokens: 20693647360 | elapsed time per iteration (s): 0.09 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.564054E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2942.289 | TFLOPs: 10.94 |
7: iteration 39480/ 173500 | consumed samples: 10106880 | consumed tokens: 20698890240 | elapsed time per iteration (s): 0.12 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.562573E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2222.514 | TFLOPs: 8.27 |
7: iteration 39490/ 173500 | consumed
samples: 10109440 | consumed tokens: 20704133120 | elapsed time per iteration (s): 0.13 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.577857E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.100 | TFLOPs: 7.26 | 7: iteration 39500/ 173500 | consumed samples: 10112000 | consumed tokens: 20709376000 | elapsed time per iteration (s): 0.11 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.562459E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2239.595 | TFLOPs: 8.33 | 7: iteration 39510/ 173500 | consumed samples: 10114560 | consumed tokens: 20714618880 | elapsed time per iteration (s): 0.10 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 4.568963E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2498.510 | TFLOPs: 9.29 | 7: iteration 39520/ 173500 | consumed samples: 10117120 | consumed tokens: 20719861760 | elapsed time per iteration (s): 0.12 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.594748E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2184.297 | TFLOPs: 8.12 | 7: iteration 39530/ 173500 | consumed samples: 10119680 | consumed tokens: 20725104640 | elapsed time per iteration (s): 0.13 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.577196E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.162 | TFLOPs: 7.30 | 7: iteration 39540/ 173500 | consumed samples: 10122240 | consumed tokens: 20730347520 | elapsed time per iteration (s): 0.13 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.555149E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 1969.068 | TFLOPs: 7.32 | 7: iteration 39550/ 173500 | consumed samples: 10124800 | consumed tokens: 20735590400 | elapsed time per iteration (s): 0.13 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.571800E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.306 | TFLOPs: 7.26 | 7: iteration 39560/ 173500 | consumed samples: 10127360 | consumed tokens: 20740833280 | elapsed time per iteration (s): 0.13 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.572810E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.371 | TFLOPs: 7.21 | 7: iteration 39570/ 173500 | consumed samples: 10129920 | consumed tokens: 20746076160 | elapsed time per iteration (s): 0.13 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.583610E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.487 | TFLOPs: 7.35 | 7: iteration 39580/ 173500 | consumed samples: 10132480 | consumed tokens: 20751319040 | elapsed time per iteration (s): 0.13 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.588008E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.148 | TFLOPs: 7.28 | 7: iteration 39590/ 173500 | consumed samples: 10135040 | consumed tokens: 20756561920 | elapsed time per iteration (s): 0.11 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.563155E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2319.873 | TFLOPs: 8.63 | 7: iteration 39600/ 173500 | consumed samples: 10137600 | consumed tokens: 20761804800 | elapsed time per iteration (s): 0.08 | learning rate: 1.793E-04 | global batch size: 256 | lm 
loss: 4.557655E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.955 | TFLOPs: 11.78 | 7: iteration 39610/ 173500 | consumed samples: 10140160 | consumed tokens: 20767047680 | elapsed time per iteration (s): 0.08 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 4.562200E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.171 | TFLOPs: 11.80 | 7: iteration 39620/ 173500 | consumed samples: 10142720 | consumed tokens: 20772290560 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.571822E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.615 | TFLOPs: 11.82 | 7: iteration 39630/ 173500 | consumed samples: 10145280 | consumed tokens: 20777533440 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.573876E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.739 | TFLOPs: 11.80 | 7: iteration 39640/ 173500 | consumed samples: 10147840 | consumed tokens: 20782776320 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.572806E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.220 | TFLOPs: 11.86 | 7: iteration 39650/ 173500 | consumed samples: 10150400 | consumed tokens: 20788019200 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.588534E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.302 | TFLOPs: 11.91 | 7: iteration 39660/ 173500 | consumed samples: 10152960 | consumed tokens: 
20793262080 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.579087E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.855 | TFLOPs: 11.83 | 7: iteration 39670/ 173500 | consumed samples: 10155520 | consumed tokens: 20798504960 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.577076E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.040 | TFLOPs: 11.80 | 7: iteration 39680/ 173500 | consumed samples: 10158080 | consumed tokens: 20803747840 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.560479E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.900 | TFLOPs: 11.88 | 7: iteration 39690/ 173500 | consumed samples: 10160640 | consumed tokens: 20808990720 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.582571E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.335 | TFLOPs: 11.77 | 7: iteration 39700/ 173500 | consumed samples: 10163200 | consumed tokens: 20814233600 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.576499E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.100 | TFLOPs: 11.81 | 7: iteration 39710/ 173500 | consumed samples: 10165760 | consumed tokens: 20819476480 | elapsed time per iteration (s): 0.08 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 4.583336E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3137.326 | TFLOPs: 11.67 | 7: iteration 39720/ 173500 | consumed samples: 10168320 | consumed tokens: 20824719360 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.572997E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.324 | TFLOPs: 11.80 | 7: iteration 39730/ 173500 | consumed samples: 10170880 | consumed tokens: 20829962240 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.567614E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.834 | TFLOPs: 11.81 | 7: iteration 39740/ 173500 | consumed samples: 10173440 | consumed tokens: 20835205120 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.582552E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.974 | TFLOPs: 11.76 | 7: iteration 39750/ 173500 | consumed samples: 10176000 | consumed tokens: 20840448000 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.583472E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.714 | TFLOPs: 11.79 | 7: iteration 39760/ 173500 | consumed samples: 10178560 | consumed tokens: 20845690880 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.580009E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.387 | TFLOPs: 11.79 | 7: iteration 39770/ 173500 | consumed samples: 10181120 | consumed tokens: 20850933760 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.564748E+00 | grad 
norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.719 | TFLOPs: 11.80 | 7: iteration 39780/ 173500 | consumed samples: 10183680 | consumed tokens: 20856176640 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.572495E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.527 | TFLOPs: 11.82 | 7: iteration 39790/ 173500 | consumed samples: 10186240 | consumed tokens: 20861419520 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.574318E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.928 | TFLOPs: 11.76 | 7: iteration 39800/ 173500 | consumed samples: 10188800 | consumed tokens: 20866662400 | elapsed time per iteration (s): 0.08 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 4.562470E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.173 | TFLOPs: 11.76 | 7: iteration 39810/ 173500 | consumed samples: 10191360 | consumed tokens: 20871905280 | elapsed time per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.568715E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.647 | TFLOPs: 11.79 | 7: iteration 39820/ 173500 | consumed samples: 10193920 | consumed tokens: 20877148160 | elapsed time per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.576530E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.201 | TFLOPs: 11.79 | 7: iteration 39830/ 173500 | consumed samples: 10196480 | consumed tokens: 20882391040 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.566146E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.723 | TFLOPs: 11.78 | 7: iteration 39840/ 173500 | consumed samples: 10199040 | consumed tokens: 20887633920 | elapsed time per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.564395E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.247 | TFLOPs: 11.77 | 7: iteration 39850/ 173500 | consumed samples: 10201600 | consumed tokens: 20892876800 | elapsed time per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.579778E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.503 | TFLOPs: 11.76 | 7: iteration 39860/ 173500 | consumed samples: 10204160 | consumed tokens: 20898119680 | elapsed time per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.577324E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.434 | TFLOPs: 11.79 | 7: iteration 39870/ 173500 | consumed samples: 10206720 | consumed tokens: 20903362560 | elapsed time per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.566631E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.910 | TFLOPs: 11.79 | 7: iteration 39880/ 173500 | consumed samples: 10209280 | consumed tokens: 20908605440 | elapsed time per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.583242E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.795 | TFLOPs: 
11.78 | 7: iteration 39890/ 173500 | consumed samples: 10211840 | consumed tokens: 20913848320 | elapsed time per iteration (s): 0.08 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 4.569049E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.380 | TFLOPs: 11.80 | 7: iteration 39900/ 173500 | consumed samples: 10214400 | consumed tokens: 20919091200 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.564524E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.705 | TFLOPs: 11.77 | 7: iteration 39910/ 173500 | consumed samples: 10216960 | consumed tokens: 20924334080 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.573863E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.657 | TFLOPs: 11.79 | 7: iteration 39920/ 173500 | consumed samples: 10219520 | consumed tokens: 20929576960 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.570584E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.824 | TFLOPs: 11.82 | 7: iteration 39930/ 173500 | consumed samples: 10222080 | consumed tokens: 20934819840 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.585148E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.072 | TFLOPs: 11.77 | 7: iteration 39940/ 173500 | consumed samples: 10224640 | consumed tokens: 20940062720 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.566301E+00 | grad norm: 0.329 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.662 | TFLOPs: 11.71 | 7: iteration 39950/ 173500 | consumed samples: 10227200 | consumed tokens: 20945305600 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.568869E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.194 | TFLOPs: 11.79 | 7: iteration 39960/ 173500 | consumed samples: 10229760 | consumed tokens: 20950548480 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.578044E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.249 | TFLOPs: 11.81 | 7: iteration 39970/ 173500 | consumed samples: 10232320 | consumed tokens: 20955791360 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.578884E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.140 | TFLOPs: 11.82 | 7: iteration 39980/ 173500 | consumed samples: 10234880 | consumed tokens: 20961034240 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.580440E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.558 | TFLOPs: 11.65 | 7: iteration 39990/ 173500 | consumed samples: 10237440 | consumed tokens: 20966277120 | elapsed time per iteration (s): 0.08 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 4.566743E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.896 | TFLOPs: 11.79 | 0: [2023-03-17 01:14:22,844] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=0, lr=[0.0001788435118675357, 
0.0001788435118675357, 0.0001788435118675357], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 40000/ 173500 | consumed samples: 10240000 | consumed tokens: 20971520000 | elapsed time per iteration (s): 0.08 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.580972E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.218 | TFLOPs: 11.77 | 0: steps: 40000 loss: 4.6135 iter time (s): 0.087 samples/sec: 2936.262 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 40000 | lm loss value: 4.426629E+00 | lm loss PPL: 8.364893E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 40000 to checkpoints_14m91b100m 0: [2023-03-17 01:14:22,902] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step40000 is begin to save! 0: [2023-03-17 01:14:22,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:14:22,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:14:22,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:14:22,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:14:22,934] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:14:22,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 01:14:22,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:14:22,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:14:22,940] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:14:22,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:14:22,943] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:14:22,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:14:22,944] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step40000/mp_rank_00_model_states.pt 0: [2023-03-17 01:14:22,944] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:14:22,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:14:22,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:14:22,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:14:22,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:14:22,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:14:22,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 01:14:22,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:14:22,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:14:22,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 01:14:22,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:14:22,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:14:22,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:14:22,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
0: [2023-03-17 01:14:22,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 01:14:22,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:14:22,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:14:22,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:14:22,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 01:14:22,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:14:22,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:14:22,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:14:22,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 01:14:22,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:14:22,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 01:14:22,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:14:22,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:14:22,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 01:14:22,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:14:22,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:14:22,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 01:14:22,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 01:14:22,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:14:22,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:14:22,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 01:14:22,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 01:14:22,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:14:22,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 01:14:22,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:14:22,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:14:22,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:14:22,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:14:22,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:14:22,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:14:22,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:14:22,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:14:22,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
0: [2023-03-17 01:14:22,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 01:14:22,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:14:22,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:14:22,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:14:22,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 01:14:22,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:14:22,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 4: [2023-03-17 01:14:22,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 6: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
4: [2023-03-17 01:14:22,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:14:22,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:14:22,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 01:14:22,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:14:22,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 01:14:22,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:14:22,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:14:22,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:14:22,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:14:22,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:14:22,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:14:22,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:14:22,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 01:14:22,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2023-03-17 01:14:22,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 01:14:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 2: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
3: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 4: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2023-03-17 01:14:22,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:14:22,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: successfully saved checkpoint at iteration 40000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.32 7: iteration 40010/ 173500 | consumed samples: 10242560 | consumed tokens: 20976762880 | elapsed time per iteration (s): 0.09 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.571649E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.027 | TFLOPs: 10.16 | 7: iteration 40020/ 173500 | consumed samples: 10245120 | consumed tokens: 20982005760 | elapsed time per iteration (s): 0.08 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.579821E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.471 | TFLOPs: 11.79 | 7: iteration 40030/ 173500 | consumed samples: 10247680 | consumed tokens: 20987248640 | elapsed time per iteration (s): 0.08 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.571459E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.322 | TFLOPs: 11.70 | 7: iteration 40040/ 
173500 | consumed samples: 10250240 | consumed tokens: 20992491520 | elapsed time per iteration (s): 0.08 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.564886E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.362 | TFLOPs: 11.77 | 7: iteration 40050/ 173500 | consumed samples: 10252800 | consumed tokens: 20997734400 | elapsed time per iteration (s): 0.08 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.563647E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.833 | TFLOPs: 11.73 | 7: iteration 40060/ 173500 | consumed samples: 10255360 | consumed tokens: 21002977280 | elapsed time per iteration (s): 0.08 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.561710E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.015 | TFLOPs: 11.80 | 7: iteration 40070/ 173500 | consumed samples: 10257920 | consumed tokens: 21008220160 | elapsed time per iteration (s): 0.08 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.575895E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.305 | TFLOPs: 11.78 | 7: iteration 40080/ 173500 | consumed samples: 10260480 | consumed tokens: 21013463040 | elapsed time per iteration (s): 0.08 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.568336E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.613 | TFLOPs: 11.74 | 7: iteration 40090/ 173500 | consumed samples: 10263040 | consumed tokens: 21018705920 | elapsed time per iteration (s): 0.08 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.577617E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3143.702 | TFLOPs: 11.69 | 7: iteration 40100/ 173500 | consumed samples: 10265600 | consumed tokens: 21023948800 | elapsed time per iteration (s): 0.09 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.559096E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2965.958 | TFLOPs: 11.03 | 7: iteration 40110/ 173500 | consumed samples: 10268160 | consumed tokens: 21029191680 | elapsed time per iteration (s): 0.12 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.578127E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2211.221 | TFLOPs: 8.22 | 7: iteration 40120/ 173500 | consumed samples: 10270720 | consumed tokens: 21034434560 | elapsed time per iteration (s): 0.10 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.570301E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2577.143 | TFLOPs: 9.59 | 7: iteration 40130/ 173500 | consumed samples: 10273280 | consumed tokens: 21039677440 | elapsed time per iteration (s): 0.08 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.574533E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.816 | TFLOPs: 11.68 | 7: iteration 40140/ 173500 | consumed samples: 10275840 | consumed tokens: 21044920320 | elapsed time per iteration (s): 0.08 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.577098E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.776 | TFLOPs: 11.75 | 7: iteration 40150/ 173500 | consumed samples: 10278400 | consumed tokens: 21050163200 | elapsed time per iteration (s): 0.08 | learning rate: 1.787E-04 
| global batch size: 256 | lm loss: 4.572003E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.099 | TFLOPs: 11.78 | 7: iteration 40160/ 173500 | consumed samples: 10280960 | consumed tokens: 21055406080 | elapsed time per iteration (s): 0.08 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.567348E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.428 | TFLOPs: 11.77 | 7: iteration 40170/ 173500 | consumed samples: 10283520 | consumed tokens: 21060648960 | elapsed time per iteration (s): 0.08 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.579971E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.556 | TFLOPs: 11.74 | 7: iteration 40180/ 173500 | consumed samples: 10286080 | consumed tokens: 21065891840 | elapsed time per iteration (s): 0.08 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 4.579907E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.077 | TFLOPs: 11.76 | 7: iteration 40190/ 173500 | consumed samples: 10288640 | consumed tokens: 21071134720 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.582423E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.941 | TFLOPs: 11.82 | 7: iteration 40200/ 173500 | consumed samples: 10291200 | consumed tokens: 21076377600 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.573830E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.393 | TFLOPs: 11.67 | 7: iteration 40210/ 173500 | consumed samples: 
10293760 | consumed tokens: 21081620480 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.583496E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.536 | TFLOPs: 11.80 | 7: iteration 40220/ 173500 | consumed samples: 10296320 | consumed tokens: 21086863360 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.569289E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.778 | TFLOPs: 11.81 | 7: iteration 40230/ 173500 | consumed samples: 10298880 | consumed tokens: 21092106240 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.571515E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.741 | TFLOPs: 11.78 | 7: iteration 40240/ 173500 | consumed samples: 10301440 | consumed tokens: 21097349120 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.573680E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.939 | TFLOPs: 11.70 | 7: iteration 40250/ 173500 | consumed samples: 10304000 | consumed tokens: 21102592000 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.583538E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.319 | TFLOPs: 11.74 | 7: iteration 40260/ 173500 | consumed samples: 10306560 | consumed tokens: 21107834880 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.569759E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3129.521 | TFLOPs: 11.64 | 7: iteration 40270/ 173500 | consumed samples: 10309120 | consumed tokens: 21113077760 | elapsed time per iteration (s): 0.08 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 4.570039E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.505 | TFLOPs: 11.74 | 7: iteration 40280/ 173500 | consumed samples: 10311680 | consumed tokens: 21118320640 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 4.587669E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.981 | TFLOPs: 11.75 | 7: iteration 40290/ 173500 | consumed samples: 10314240 | consumed tokens: 21123563520 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 4.579513E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.015 | TFLOPs: 11.78 | 7: iteration 40300/ 173500 | consumed samples: 10316800 | consumed tokens: 21128806400 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 4.575845E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.076 | TFLOPs: 11.70 | 7: iteration 40310/ 173500 | consumed samples: 10319360 | consumed tokens: 21134049280 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 4.565241E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.719 | TFLOPs: 11.70 | 7: iteration 40320/ 173500 | consumed samples: 10321920 | consumed tokens: 21139292160 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | 
lm loss: 4.577752E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.307 | TFLOPs: 11.78 | 7: iteration 40330/ 173500 | consumed samples: 10324480 | consumed tokens: 21144535040 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 4.567970E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.922 | TFLOPs: 11.78 | 7: iteration 40340/ 173500 | consumed samples: 10327040 | consumed tokens: 21149777920 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 4.570779E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.708 | TFLOPs: 11.75 | 7: iteration 40350/ 173500 | consumed samples: 10329600 | consumed tokens: 21155020800 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 4.575644E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.195 | TFLOPs: 11.75 | 7: iteration 40360/ 173500 | consumed samples: 10332160 | consumed tokens: 21160263680 | elapsed time per iteration (s): 0.08 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 4.578300E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.554 | TFLOPs: 11.77 | 7: iteration 40370/ 173500 | consumed samples: 10334720 | consumed tokens: 21165506560 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.570118E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.959 | TFLOPs: 11.77 | 7: iteration 40380/ 173500 | consumed samples: 10337280 | consumed 
tokens: 21170749440 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.569725E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.559 | TFLOPs: 11.77 | 7: iteration 40390/ 173500 | consumed samples: 10339840 | consumed tokens: 21175992320 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.560814E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.687 | TFLOPs: 11.44 | 7: iteration 40400/ 173500 | consumed samples: 10342400 | consumed tokens: 21181235200 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.578600E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.481 | TFLOPs: 11.69 | 7: iteration 40410/ 173500 | consumed samples: 10344960 | consumed tokens: 21186478080 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.566942E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.613 | TFLOPs: 11.78 | 7: iteration 40420/ 173500 | consumed samples: 10347520 | consumed tokens: 21191720960 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.588531E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.884 | TFLOPs: 11.78 | 7: iteration 40430/ 173500 | consumed samples: 10350080 | consumed tokens: 21196963840 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.576704E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3185.498 | TFLOPs: 11.85 | 7: iteration 40440/ 173500 | consumed samples: 10352640 | consumed tokens: 21202206720 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.566758E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.827 | TFLOPs: 11.90 | 7: iteration 40450/ 173500 | consumed samples: 10355200 | consumed tokens: 21207449600 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.568266E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.011 | TFLOPs: 11.92 | 7: iteration 40460/ 173500 | consumed samples: 10357760 | consumed tokens: 21212692480 | elapsed time per iteration (s): 0.08 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 4.577145E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.611 | TFLOPs: 11.93 | 7: iteration 40470/ 173500 | consumed samples: 10360320 | consumed tokens: 21217935360 | elapsed time per iteration (s): 0.08 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.567878E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.560 | TFLOPs: 11.86 | 7: iteration 40480/ 173500 | consumed samples: 10362880 | consumed tokens: 21223178240 | elapsed time per iteration (s): 0.08 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.574977E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.373 | TFLOPs: 11.89 | 7: iteration 40490/ 173500 | consumed samples: 10365440 | consumed tokens: 21228421120 | elapsed time per iteration (s): 0.08 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.574087E+00 | 
grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.294 | TFLOPs: 11.89 | 7: iteration 40500/ 173500 | consumed samples: 10368000 | consumed tokens: 21233664000 | elapsed time per iteration (s): 0.08 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.568641E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3042.070 | TFLOPs: 11.32 | 7: iteration 40510/ 173500 | consumed samples: 10370560 | consumed tokens: 21238906880 | elapsed time per iteration (s): 0.08 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.566763E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.248 | TFLOPs: 11.88 | 7: iteration 40520/ 173500 | consumed samples: 10373120 | consumed tokens: 21244149760 | elapsed time per iteration (s): 0.11 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.575065E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2347.756 | TFLOPs: 8.73 | 7: iteration 40530/ 173500 | consumed samples: 10375680 | consumed tokens: 21249392640 | elapsed time per iteration (s): 0.11 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.566131E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2309.282 | TFLOPs: 8.59 | 7: iteration 40540/ 173500 | consumed samples: 10378240 | consumed tokens: 21254635520 | elapsed time per iteration (s): 0.08 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.574872E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.955 | TFLOPs: 11.76 | 7: iteration 40550/ 173500 | consumed samples: 10380800 | consumed tokens: 21259878400 | elapsed 
time per iteration (s): 0.08 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 4.559196E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.910 | TFLOPs: 11.87 | 7: iteration 40560/ 173500 | consumed samples: 10383360 | consumed tokens: 21265121280 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.575392E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.527 | TFLOPs: 11.86 | 7: iteration 40570/ 173500 | consumed samples: 10385920 | consumed tokens: 21270364160 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.563602E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.309 | TFLOPs: 11.78 | 7: iteration 40580/ 173500 | consumed samples: 10388480 | consumed tokens: 21275607040 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.562049E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.057 | TFLOPs: 11.86 | 7: iteration 40590/ 173500 | consumed samples: 10391040 | consumed tokens: 21280849920 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.586725E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.204 | TFLOPs: 11.87 | 7: iteration 40600/ 173500 | consumed samples: 10393600 | consumed tokens: 21286092800 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.571720E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.908 | 
TFLOPs: 11.82 | 7: iteration 40610/ 173500 | consumed samples: 10396160 | consumed tokens: 21291335680 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.566449E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.832 | TFLOPs: 11.85 | 7: iteration 40620/ 173500 | consumed samples: 10398720 | consumed tokens: 21296578560 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.563372E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.894 | TFLOPs: 11.82 | 7: iteration 40630/ 173500 | consumed samples: 10401280 | consumed tokens: 21301821440 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.568543E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.804 | TFLOPs: 11.86 | 7: iteration 40640/ 173500 | consumed samples: 10403840 | consumed tokens: 21307064320 | elapsed time per iteration (s): 0.08 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 4.567673E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.963 | TFLOPs: 11.78 | 7: iteration 40650/ 173500 | consumed samples: 10406400 | consumed tokens: 21312307200 | elapsed time per iteration (s): 0.08 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.568359E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.430 | TFLOPs: 11.84 | 7: iteration 40660/ 173500 | consumed samples: 10408960 | consumed tokens: 21317550080 | elapsed time per iteration (s): 0.08 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.567274E+00 | grad norm: 0.314 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.514 | TFLOPs: 11.81 | 7: iteration 40670/ 173500 | consumed samples: 10411520 | consumed tokens: 21322792960 | elapsed time per iteration (s): 0.08 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.578656E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.175 | TFLOPs: 11.42 | 7: iteration 40680/ 173500 | consumed samples: 10414080 | consumed tokens: 21328035840 | elapsed time per iteration (s): 0.08 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.558549E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.011 | TFLOPs: 11.85 | 7: iteration 40690/ 173500 | consumed samples: 10416640 | consumed tokens: 21333278720 | elapsed time per iteration (s): 0.08 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.567487E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.023 | TFLOPs: 11.79 | 7: iteration 40700/ 173500 | consumed samples: 10419200 | consumed tokens: 21338521600 | elapsed time per iteration (s): 0.08 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.579766E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.308 | TFLOPs: 11.87 | 7: iteration 40710/ 173500 | consumed samples: 10421760 | consumed tokens: 21343764480 | elapsed time per iteration (s): 0.10 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.566274E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2572.842 | TFLOPs: 9.57 | 7: iteration 40720/ 173500 | consumed samples: 10424320 | consumed tokens: 21349007360 | elapsed time per iteration (s): 
0.11 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.559628E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2378.181 | TFLOPs: 8.85 | 7: iteration 40730/ 173500 | consumed samples: 10426880 | consumed tokens: 21354250240 | elapsed time per iteration (s): 0.08 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.563963E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.190 | TFLOPs: 11.81 | 7: iteration 40740/ 173500 | consumed samples: 10429440 | consumed tokens: 21359493120 | elapsed time per iteration (s): 0.08 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 4.580947E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.160 | TFLOPs: 11.78 | 7: iteration 40750/ 173500 | consumed samples: 10432000 | consumed tokens: 21364736000 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.576595E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.633 | TFLOPs: 11.84 | 7: iteration 40760/ 173500 | consumed samples: 10434560 | consumed tokens: 21369978880 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.571701E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.669 | TFLOPs: 11.82 | 7: iteration 40770/ 173500 | consumed samples: 10437120 | consumed tokens: 21375221760 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.567792E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.224 | TFLOPs: 11.85 | 7: iteration 
40780/ 173500 | consumed samples: 10439680 | consumed tokens: 21380464640 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.565459E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.637 | TFLOPs: 11.85 | 7: iteration 40790/ 173500 | consumed samples: 10442240 | consumed tokens: 21385707520 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.576274E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.465 | TFLOPs: 11.76 | 7: iteration 40800/ 173500 | consumed samples: 10444800 | consumed tokens: 21390950400 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.575595E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.043 | TFLOPs: 11.87 | 7: iteration 40810/ 173500 | consumed samples: 10447360 | consumed tokens: 21396193280 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.569746E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.067 | TFLOPs: 11.85 | 7: iteration 40820/ 173500 | consumed samples: 10449920 | consumed tokens: 21401436160 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.585397E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.327 | TFLOPs: 11.88 | 7: iteration 40830/ 173500 | consumed samples: 10452480 | consumed tokens: 21406679040 | elapsed time per iteration (s): 0.08 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 4.566190E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3192.384 | TFLOPs: 11.87 | 7: iteration 40840/ 173500 | consumed samples: 10455040 | consumed tokens: 21411921920 | elapsed time per iteration (s): 0.08 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 4.581808E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.314 | TFLOPs: 11.85 | 7: iteration 40850/ 173500 | consumed samples: 10457600 | consumed tokens: 21417164800 | elapsed time per iteration (s): 0.08 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 4.570240E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.034 | TFLOPs: 11.88 | 7: iteration 40860/ 173500 | consumed samples: 10460160 | consumed tokens: 21422407680 | elapsed time per iteration (s): 0.08 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 4.575606E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.781 | TFLOPs: 11.87 | 7: iteration 40870/ 173500 | consumed samples: 10462720 | consumed tokens: 21427650560 | elapsed time per iteration (s): 0.08 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 4.571767E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.331 | TFLOPs: 11.89 | 7: iteration 40880/ 173500 | consumed samples: 10465280 | consumed tokens: 21432893440 | elapsed time per iteration (s): 0.08 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 4.573777E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.836 | TFLOPs: 11.88 | 7: iteration 40890/ 173500 | consumed samples: 10467840 | consumed tokens: 21438136320 | elapsed time per iteration (s): 0.08 | learning rate: 
1.779E-04 | global batch size: 256 | lm loss: 4.586818E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.345 | TFLOPs: 11.87 | 7: iteration 40900/ 173500 | consumed samples: 10470400 | consumed tokens: 21443379200 | elapsed time per iteration (s): 0.08 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 4.567990E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.142 | TFLOPs: 11.84 | 7: iteration 40910/ 173500 | consumed samples: 10472960 | consumed tokens: 21448622080 | elapsed time per iteration (s): 0.08 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 4.557000E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.799 | TFLOPs: 11.85 | 7: iteration 40920/ 173500 | consumed samples: 10475520 | consumed tokens: 21453864960 | elapsed time per iteration (s): 0.08 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 4.565733E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.232 | TFLOPs: 11.81 | 7: iteration 40930/ 173500 | consumed samples: 10478080 | consumed tokens: 21459107840 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.569198E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.562 | TFLOPs: 11.85 | 7: iteration 40940/ 173500 | consumed samples: 10480640 | consumed tokens: 21464350720 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.571737E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.794 | TFLOPs: 11.89 | 7: iteration 40950/ 173500 | 
consumed samples: 10483200 | consumed tokens: 21469593600 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.577145E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.890 | TFLOPs: 11.81 | 7: iteration 40960/ 173500 | consumed samples: 10485760 | consumed tokens: 21474836480 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.583790E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.422 | TFLOPs: 11.87 | 7: iteration 40970/ 173500 | consumed samples: 10488320 | consumed tokens: 21480079360 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.570778E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.925 | TFLOPs: 11.86 | 7: iteration 40980/ 173500 | consumed samples: 10490880 | consumed tokens: 21485322240 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.570209E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.601 | TFLOPs: 11.74 | 7: iteration 40990/ 173500 | consumed samples: 10493440 | consumed tokens: 21490565120 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.567913E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.356 | TFLOPs: 11.78 | 7: iteration 41000/ 173500 | consumed samples: 10496000 | consumed tokens: 21495808000 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.563453E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3172.039 | TFLOPs: 11.80 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 41000 | lm loss value: 4.425120E+00 | lm loss PPL: 8.352282E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 41000 to checkpoints_14m91b100m 0: [2023-03-17 01:15:45,402] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step41000 is begin to save! 0: [2023-03-17 01:15:45,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:15:45,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:15:45,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:15:45,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:15:45,434] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:15:45,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:15:45,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:15:45,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 01:15:45,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:15:45,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:15:45,442] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:15:45,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:15:45,444] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step41000/mp_rank_00_model_states.pt 0: [2023-03-17 01:15:45,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:15:45,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:15:45,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:15:45,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:15:45,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:15:45,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:15:45,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2023-03-17 01:15:45,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:15:45,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:15:45,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 1: [2023-03-17 01:15:45,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:15:45,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:15:45,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2023-03-17 01:15:45,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:15:45,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:15:45,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2023-03-17 01:15:45,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:15:45,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 01:15:45,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2023-03-17 01:15:45,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:15:45,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:15:45,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2023-03-17 01:15:45,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:15:45,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:15:45,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2023-03-17 01:15:45,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:15:45,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:15:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 01:15:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:15:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:15:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:15:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2023-03-17 01:15:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 
1: [2023-03-17 01:15:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2023-03-17 01:15:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:15:45,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:15:45,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2023-03-17 01:15:45,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:15:45,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:15:45,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2023-03-17 01:15:45,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:15:45,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2023-03-17 01:15:45,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:15:45,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 
1: [2023-03-17 01:15:45,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:15:45,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:15:45,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:15:45,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:15:45,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 01:15:45,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:15:45,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2023-03-17 01:15:45,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:15:45,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2023-03-17 01:15:45,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:15:45,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt
7: [2023-03-17 01:15:45,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
1: [2023-03-17 01:15:45,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
1: [2023-03-17 01:15:45,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
4: [2023-03-17 01:15:45,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt.
1: [2023-03-17 01:15:45,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
4: [2023-03-17 01:15:45,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt
4: [2023-03-17 01:15:45,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
6: [2023-03-17 01:15:45,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt.
6: [2023-03-17 01:15:45,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt
6: [2023-03-17 01:15:45,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
2: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt.
2: [2023-03-17 01:15:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
2: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
4: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt.
5: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt.
4: [2023-03-17 01:15:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt
5: [2023-03-17 01:15:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt
4: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
5: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
3: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
3: [2023-03-17 01:15:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
3: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
1: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
1: [2023-03-17 01:15:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
0: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt.
0: [2023-03-17 01:15:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt
0: [2023-03-17 01:15:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
7: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt.
7: [2023-03-17 01:15:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt
7: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
6: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
6: [2023-03-17 01:15:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
6: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
2: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt.
2: [2023-03-17 01:15:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt
2: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
5: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt.
3: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt.
5: [2023-03-17 01:15:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt
0: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt.
3: [2023-03-17 01:15:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt
5: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
0: [2023-03-17 01:15:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt
3: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
0: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
6: [2023-03-17 01:15:45,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt.
6: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
6: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
1: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt.
1: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
1: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
4: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt.
4: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt
4: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
7: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
7: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
7: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
7: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt.
7: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt
6: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt.
7: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
3: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
6: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
1: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
5: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt.
6: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
3: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
5: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt
1: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
3: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
5: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
1: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
0: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt.
7: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt.
4: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
7: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt
0: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
6: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt.
4: [2023-03-17 01:15:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
0: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
7: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
6: [2023-03-17 01:15:45,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt
5: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
4: [2023-03-17 01:15:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt.
4: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
6: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
5: [2023-03-17 01:15:45,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
4: [2023-03-17 01:15:45,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt
4: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
5: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
2: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt.
2: [2023-03-17 01:15:45,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step41000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt
2: [2023-03-17 01:15:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now!
0: successfully saved checkpoint at iteration 41000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 81.28
7: iteration 41010/ 173500 | consumed samples: 10498560 | consumed tokens: 21501050880 | elapsed time per iteration (s): 0.09 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.567889E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2771.140 | TFLOPs: 10.31 |
7: iteration 41020/ 173500 | consumed samples: 10501120 | consumed tokens: 21506293760 | elapsed time per iteration (s): 0.08 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 4.557847E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.209 | TFLOPs: 11.82 |
7: iteration 41030/ 173500 | consumed samples: 10503680 | consumed tokens: 21511536640 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.585795E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.043 | TFLOPs: 11.83 |
7: iteration 41040/ 173500 | consumed samples: 10506240 | consumed tokens: 21516779520 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.567142E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.854 | TFLOPs: 11.84 |
7: iteration 41050/ 173500 | consumed samples: 10508800 | consumed tokens: 21522022400 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.577631E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.306 | TFLOPs: 11.76 |
7: iteration 41060/ 173500 | consumed samples: 10511360 | consumed tokens: 21527265280 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.569773E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.062 | TFLOPs: 11.87 |
7: iteration 41070/ 173500 | consumed samples: 10513920 | consumed tokens: 21532508160 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.574247E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.592 | TFLOPs: 11.91 |
7: iteration 41080/ 173500 | consumed samples: 10516480 | consumed tokens: 21537751040 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.582388E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.782 | TFLOPs: 11.88 |
7: iteration 41090/ 173500 | consumed samples: 10519040 | consumed tokens: 21542993920 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.570676E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.123 | TFLOPs: 11.88 |
7: iteration 41100/ 173500 | consumed samples: 10521600 | consumed tokens: 21548236800 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.548020E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.259 | TFLOPs: 11.89 |
7: iteration 41110/ 173500 | consumed samples: 10524160 | consumed tokens: 21553479680 | elapsed time per iteration (s): 0.08 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 4.572789E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.017 | TFLOPs: 11.87 |
7: iteration 41120/ 173500 | consumed samples: 10526720 | consumed tokens: 21558722560 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.570268E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.299 | TFLOPs: 11.84 |
7: iteration 41130/ 173500 | consumed samples: 10529280 | consumed tokens: 21563965440 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.564602E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.373 | TFLOPs: 11.80 |
7: iteration 41140/ 173500 | consumed samples: 10531840 | consumed tokens: 21569208320 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.573969E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.072 | TFLOPs: 11.88 |
7: iteration 41150/ 173500 | consumed samples: 10534400 | consumed tokens: 21574451200 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.564417E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.831 | TFLOPs: 11.85 |
7: iteration 41160/ 173500 | consumed samples: 10536960 | consumed tokens: 21579694080 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.590532E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.662 | TFLOPs: 11.90 |
7: iteration 41170/ 173500 | consumed samples: 10539520 | consumed tokens: 21584936960 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.573340E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.443 | TFLOPs: 11.88 |
7: iteration 41180/ 173500 | consumed samples: 10542080 | consumed tokens: 21590179840 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.578421E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.108 | TFLOPs: 11.90 |
7: iteration 41190/ 173500 | consumed samples: 10544640 | consumed tokens: 21595422720 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.592295E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.889 | TFLOPs: 11.91 |
7: iteration 41200/ 173500 | consumed samples: 10547200 | consumed tokens: 21600665600 | elapsed time per iteration (s): 0.08 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 4.571625E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.918 | TFLOPs: 11.63 |
7: iteration 41210/ 173500 | consumed samples: 10549760 | consumed tokens: 21605908480 | elapsed time per iteration (s): 0.08 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.566183E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.693 | TFLOPs: 11.78 |
7: iteration 41220/ 173500 | consumed samples: 10552320 | consumed tokens: 21611151360 | elapsed time per iteration (s): 0.08 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.578504E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.705 | TFLOPs: 11.83 |
7: iteration 41230/ 173500 | consumed samples: 10554880 | consumed tokens: 21616394240 | elapsed time per iteration (s): 0.08 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.580561E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.518 | TFLOPs: 11.79 |
7: iteration 41240/ 173500 | consumed samples: 10557440 | consumed tokens: 21621637120 | elapsed time per iteration (s): 0.08 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.570884E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.237 | TFLOPs: 11.81 |
7: iteration 41250/ 173500 | consumed samples: 10560000 | consumed tokens: 21626880000 | elapsed time per iteration (s): 0.08 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.571016E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.936 | TFLOPs: 11.85 |
7: iteration 41260/ 173500 | consumed samples: 10562560 | consumed tokens: 21632122880 | elapsed time per iteration (s): 0.08 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.575563E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.726 | TFLOPs: 11.81 |
7: iteration 41270/ 173500 | consumed samples: 10565120 | consumed tokens: 21637365760 | elapsed time per iteration (s): 0.08 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.581714E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.503 | TFLOPs: 11.85 |
7: iteration 41280/ 173500 | consumed samples: 10567680 | consumed tokens: 21642608640 | elapsed time per iteration (s): 0.09 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.586593E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.371 | TFLOPs: 10.39 |
7: iteration 41290/ 173500 | consumed samples: 10570240 | consumed tokens: 21647851520 | elapsed time per iteration (s): 0.10 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 4.571095E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2682.844 | TFLOPs: 9.98 |
7: iteration 41300/ 173500 | consumed samples: 10572800 | consumed tokens: 21653094400 | elapsed time per iteration (s): 0.09 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.566950E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2726.027 | TFLOPs: 10.14 |
7: iteration 41310/ 173500 | consumed samples: 10575360 | consumed tokens: 21658337280 | elapsed time per iteration (s): 0.08 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.572931E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.285 | TFLOPs: 11.67 |
7: iteration 41320/ 173500 | consumed samples: 10577920 | consumed tokens: 21663580160 | elapsed time per iteration (s): 0.08 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.572549E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.796 | TFLOPs: 11.79 |
7: iteration 41330/ 173500 | consumed samples: 10580480 | consumed tokens: 21668823040 | elapsed time per iteration (s): 0.08 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.560106E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.089 | TFLOPs: 11.84 |
7: iteration 41340/ 173500 | consumed samples: 10583040 | consumed tokens: 21674065920 | elapsed time per iteration (s): 0.08 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.570432E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.120 | TFLOPs: 11.83 |
7: iteration 41350/ 173500 | consumed samples: 10585600 | consumed tokens: 21679308800 | elapsed time per iteration (s): 0.08 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.582286E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.422 | TFLOPs: 11.86 |
7: iteration 41360/ 173500 | consumed samples: 10588160 | consumed tokens: 21684551680 | elapsed time per iteration (s): 0.12 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.567104E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2139.194 | TFLOPs: 7.96 |
7: iteration 41370/ 173500 | consumed samples: 10590720 | consumed tokens: 21689794560 | elapsed time per iteration (s): 0.09 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.577307E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.315 | TFLOPs: 10.61 |
7: iteration 41380/ 173500 | consumed samples: 10593280 | consumed tokens: 21695037440 | elapsed time per iteration (s): 0.08 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 4.570639E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.577 | TFLOPs: 11.75 |
7: iteration 41390/ 173500 | consumed samples: 10595840 | consumed tokens: 21700280320 | elapsed time per iteration (s): 0.09 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.576883E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2831.035 | TFLOPs: 10.53 |
7: iteration 41400/ 173500 | consumed samples: 10598400 | consumed tokens: 21705523200 | elapsed time per iteration (s): 0.09 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.574928E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2762.681 | TFLOPs: 10.28 |
7: iteration 41410/ 173500 | consumed samples: 10600960 | consumed tokens: 21710766080 | elapsed time per iteration (s): 0.08 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.565588E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.629 | TFLOPs: 11.90 |
7: iteration 41420/ 173500 | consumed samples: 10603520 | consumed tokens: 21716008960 | elapsed time per iteration (s): 0.08 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.585588E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.472 | TFLOPs: 11.92 |
7: iteration 41430/ 173500 | consumed samples: 10606080 | consumed tokens: 21721251840 | elapsed time per iteration (s): 0.08 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.565567E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.059 | TFLOPs: 11.91 |
7: iteration 41440/ 173500 | consumed samples: 10608640 | consumed tokens: 21726494720 | elapsed time per iteration (s): 0.08 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.572083E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.946 | TFLOPs: 11.98 |
7: iteration 41450/ 173500 | consumed samples: 10611200 | consumed tokens: 21731737600 | elapsed time per iteration (s): 0.12 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.568685E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2206.155 | TFLOPs: 8.21 |
7: iteration 41460/ 173500 | consumed samples: 10613760 | consumed tokens: 21736980480 | elapsed time per iteration (s): 0.09 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.563348E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.787 | TFLOPs: 11.13 |
7: iteration 41470/ 173500 | consumed samples: 10616320 | consumed tokens: 21742223360 | elapsed time per iteration (s): 0.11 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 4.574311E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.242 | TFLOPs: 8.85 |
7: iteration 41480/ 173500 | consumed samples: 10618880 | consumed tokens: 21747466240 | elapsed time per iteration (s): 0.09 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.573161E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2874.808 | TFLOPs: 10.69 |
7: iteration 41490/ 173500 | consumed samples: 10621440 | consumed tokens: 21752709120 | elapsed time per iteration (s): 0.08 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.571707E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.006 | TFLOPs: 11.79 |
7: iteration 41500/ 173500 | consumed samples: 10624000 | consumed tokens: 21757952000 | elapsed time per iteration (s): 0.10 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.572772E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2656.735 | TFLOPs: 9.88 |
7: iteration 41510/ 173500 | consumed samples: 10626560 | consumed tokens: 21763194880 | elapsed time per iteration (s): 0.08 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.575101E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.692 | TFLOPs: 11.85 |
7: iteration 41520/ 173500 | consumed samples: 10629120 | consumed tokens: 21768437760 | elapsed time per iteration (s): 0.08 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.583715E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.571 | TFLOPs: 11.90 |
7: iteration 41530/ 173500 | consumed samples: 10631680 | consumed tokens: 21773680640 | elapsed time per iteration (s): 0.08 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.569259E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.225 | TFLOPs: 11.77 |
7: iteration 41540/ 173500 | consumed samples: 10634240 | consumed tokens: 21778923520 | elapsed time per iteration (s): 0.08 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.579358E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.804 | TFLOPs: 11.91 |
7: iteration 41550/ 173500 | consumed samples: 10636800 | consumed tokens: 21784166400 | elapsed time per iteration (s): 0.08 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.574670E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.724 | TFLOPs: 11.94 |
7: iteration 41560/ 173500 | consumed samples: 10639360 | consumed tokens: 21789409280 | elapsed time per iteration (s): 0.08 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.579538E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.806 | TFLOPs: 11.88 |
7: iteration 41570/ 173500 | consumed samples: 10641920 | consumed tokens: 21794652160 | elapsed time per iteration (s): 0.08 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 4.583216E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.578 | TFLOPs: 11.87 |
7: iteration 41580/ 173500 | consumed samples: 10644480 | consumed tokens: 21799895040 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.568400E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.782 | TFLOPs: 11.63 |
7: iteration 41590/ 173500 | consumed samples: 10647040 | consumed tokens: 21805137920 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.563557E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.694 | TFLOPs: 11.94 |
7: iteration 41600/ 173500 | consumed samples: 10649600 | consumed tokens: 21810380800 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.568917E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.874 | TFLOPs: 11.92 |
7: iteration 41610/ 173500 | consumed samples: 10652160 | consumed tokens: 21815623680 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.574954E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.537 | TFLOPs: 11.51 |
7: iteration 41620/ 173500 | consumed samples: 10654720 | consumed tokens: 21820866560 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.578455E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.054 | TFLOPs: 11.93 |
7: iteration 41630/ 173500 | consumed samples: 10657280 | consumed tokens: 
21826109440 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.570084E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.879 | TFLOPs: 11.63 | 7: iteration 41640/ 173500 | consumed samples: 10659840 | consumed tokens: 21831352320 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.572168E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.662 | TFLOPs: 11.87 | 7: iteration 41650/ 173500 | consumed samples: 10662400 | consumed tokens: 21836595200 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.573117E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.278 | TFLOPs: 11.92 | 7: iteration 41660/ 173500 | consumed samples: 10664960 | consumed tokens: 21841838080 | elapsed time per iteration (s): 0.08 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.570765E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.054 | TFLOPs: 11.92 | 7: iteration 41670/ 173500 | consumed samples: 10667520 | consumed tokens: 21847080960 | elapsed time per iteration (s): 0.09 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.571426E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.116 | TFLOPs: 10.86 | 7: iteration 41680/ 173500 | consumed samples: 10670080 | consumed tokens: 21852323840 | elapsed time per iteration (s): 0.13 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.573153E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 1993.831 | TFLOPs: 7.42 | 7: iteration 41690/ 173500 | consumed samples: 10672640 | consumed tokens: 21857566720 | elapsed time per iteration (s): 0.13 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.563183E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.367 | TFLOPs: 7.21 | 7: iteration 41700/ 173500 | consumed samples: 10675200 | consumed tokens: 21862809600 | elapsed time per iteration (s): 0.13 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.564991E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2019.078 | TFLOPs: 7.51 | 7: iteration 41710/ 173500 | consumed samples: 10677760 | consumed tokens: 21868052480 | elapsed time per iteration (s): 0.12 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.576908E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2202.033 | TFLOPs: 8.19 | 7: iteration 41720/ 173500 | consumed samples: 10680320 | consumed tokens: 21873295360 | elapsed time per iteration (s): 0.12 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.568961E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2062.013 | TFLOPs: 7.67 | 7: iteration 41730/ 173500 | consumed samples: 10682880 | consumed tokens: 21878538240 | elapsed time per iteration (s): 0.12 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.563046E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2170.795 | TFLOPs: 8.07 | 7: iteration 41740/ 173500 | consumed samples: 10685440 | consumed tokens: 21883781120 | elapsed time per iteration (s): 0.08 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.573987E+00 | grad norm: 
0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.184 | TFLOPs: 11.82 | 7: iteration 41750/ 173500 | consumed samples: 10688000 | consumed tokens: 21889024000 | elapsed time per iteration (s): 0.08 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 4.568218E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.751 | TFLOPs: 11.86 | 7: iteration 41760/ 173500 | consumed samples: 10690560 | consumed tokens: 21894266880 | elapsed time per iteration (s): 0.08 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.571027E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.078 | TFLOPs: 11.88 | 7: iteration 41770/ 173500 | consumed samples: 10693120 | consumed tokens: 21899509760 | elapsed time per iteration (s): 0.08 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.569723E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.463 | TFLOPs: 11.84 | 7: iteration 41780/ 173500 | consumed samples: 10695680 | consumed tokens: 21904752640 | elapsed time per iteration (s): 0.08 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.585177E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.060 | TFLOPs: 11.84 | 7: iteration 41790/ 173500 | consumed samples: 10698240 | consumed tokens: 21909995520 | elapsed time per iteration (s): 0.08 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.580467E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.681 | TFLOPs: 11.87 | 7: iteration 41800/ 173500 | consumed samples: 10700800 | consumed tokens: 21915238400 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.565843E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.225 | TFLOPs: 11.85 | 7: iteration 41810/ 173500 | consumed samples: 10703360 | consumed tokens: 21920481280 | elapsed time per iteration (s): 0.08 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.578025E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.364 | TFLOPs: 11.87 | 7: iteration 41820/ 173500 | consumed samples: 10705920 | consumed tokens: 21925724160 | elapsed time per iteration (s): 0.08 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.575843E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.142 | TFLOPs: 11.84 | 7: iteration 41830/ 173500 | consumed samples: 10708480 | consumed tokens: 21930967040 | elapsed time per iteration (s): 0.08 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.575750E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.014 | TFLOPs: 11.80 | 7: iteration 41840/ 173500 | consumed samples: 10711040 | consumed tokens: 21936209920 | elapsed time per iteration (s): 0.10 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 4.578990E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2455.684 | TFLOPs: 9.13 | 7: iteration 41850/ 173500 | consumed samples: 10713600 | consumed tokens: 21941452800 | elapsed time per iteration (s): 0.09 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.562329E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.487 | TFLOPs: 10.04 | 
7: iteration 41860/ 173500 | consumed samples: 10716160 | consumed tokens: 21946695680 | elapsed time per iteration (s): 0.08 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.573521E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.776 | TFLOPs: 11.83 | 7: iteration 41870/ 173500 | consumed samples: 10718720 | consumed tokens: 21951938560 | elapsed time per iteration (s): 0.08 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.581541E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.344 | TFLOPs: 11.83 | 7: iteration 41880/ 173500 | consumed samples: 10721280 | consumed tokens: 21957181440 | elapsed time per iteration (s): 0.08 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.570485E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.451 | TFLOPs: 11.84 | 7: iteration 41890/ 173500 | consumed samples: 10723840 | consumed tokens: 21962424320 | elapsed time per iteration (s): 0.08 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.572322E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.916 | TFLOPs: 11.82 | 7: iteration 41900/ 173500 | consumed samples: 10726400 | consumed tokens: 21967667200 | elapsed time per iteration (s): 0.08 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.569946E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.586 | TFLOPs: 11.71 | 7: iteration 41910/ 173500 | consumed samples: 10728960 | consumed tokens: 21972910080 | elapsed time per iteration (s): 0.08 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.574714E+00 | grad norm: 0.326 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.575 | TFLOPs: 11.74 | 7: iteration 41920/ 173500 | consumed samples: 10731520 | consumed tokens: 21978152960 | elapsed time per iteration (s): 0.08 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.570298E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.266 | TFLOPs: 11.83 | 7: iteration 41930/ 173500 | consumed samples: 10734080 | consumed tokens: 21983395840 | elapsed time per iteration (s): 0.08 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 4.579659E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.837 | TFLOPs: 11.69 | 7: iteration 41940/ 173500 | consumed samples: 10736640 | consumed tokens: 21988638720 | elapsed time per iteration (s): 0.08 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.576209E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.343 | TFLOPs: 11.85 | 7: iteration 41950/ 173500 | consumed samples: 10739200 | consumed tokens: 21993881600 | elapsed time per iteration (s): 0.08 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.566560E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.120 | TFLOPs: 11.84 | 7: iteration 41960/ 173500 | consumed samples: 10741760 | consumed tokens: 21999124480 | elapsed time per iteration (s): 0.08 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.576076E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.321 | TFLOPs: 11.87 | 7: iteration 41970/ 173500 | consumed samples: 10744320 | consumed tokens: 22004367360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.581363E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.641 | TFLOPs: 11.58 | 7: iteration 41980/ 173500 | consumed samples: 10746880 | consumed tokens: 22009610240 | elapsed time per iteration (s): 0.08 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.561075E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.900 | TFLOPs: 11.88 | 7: iteration 41990/ 173500 | consumed samples: 10749440 | consumed tokens: 22014853120 | elapsed time per iteration (s): 0.08 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.578967E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.162 | TFLOPs: 11.84 | 0: [2023-03-17 01:17:11,065] [INFO] [logging.py:68:log_dist] [Rank 0] step=42000, skipped=0, lr=[0.00017667737143212697, 0.00017667737143212697, 0.00017667737143212697], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 42000/ 173500 | consumed samples: 10752000 | consumed tokens: 22020096000 | elapsed time per iteration (s): 0.08 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.565640E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.345 | TFLOPs: 11.85 | 0: steps: 42000 loss: 4.5650 iter time (s): 0.083 samples/sec: 3093.666 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 42000 | lm loss value: 4.479606E+00 | lm loss PPL: 8.819989E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 42000 to checkpoints_14m91b100m 0: [2023-03-17 01:17:11,130] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step42000 is begin to save! 0: [2023-03-17 01:17:11,134] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:17:11,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:17:11,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:17:11,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:17:11,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:17:11,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:17:11,174] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:17:11,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:17:11,177] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:17:11,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:17:11,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 01:17:11,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:17:11,181] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step42000/mp_rank_00_model_states.pt 0: [2023-03-17 01:17:11,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:17:11,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:17:11,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:17:11,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:17:11,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:17:11,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 
5: [2023-03-17 01:17:11,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:17:11,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2023-03-17 01:17:11,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 
0: [2023-03-17 01:17:11,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:17:11,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:17:11,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:17:11,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:17:11,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2023-03-17 01:17:11,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:17:11,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:17:11,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 
1: [2023-03-17 01:17:11,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:17:11,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:17:11,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2023-03-17 01:17:11,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:17:11,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:17:11,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 1: [2023-03-17 01:17:11,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2023-03-17 01:17:11,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:17:11,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:17:11,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:17:11,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2023-03-17 01:17:11,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:17:11,209] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:17:11,209] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:17:11,209] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:17:11,209] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:17:11,209] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:17:11,209] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 01:17:11,209] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:17:11,210] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:17:11,210] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:17:11,210] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:17:11,210] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:17:11,210] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,210] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:17:11,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:17:11,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:17:11,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2023-03-17 01:17:11,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:17:11,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:17:11,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2023-03-17 01:17:11,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:17:11,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 
7: [2023-03-17 01:17:11,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:17:11,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:17:11,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 
7: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2023-03-17 01:17:11,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:17:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:17:11,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:17:11,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:17:11,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2023-03-17 01:17:11,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step42000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 01:17:11,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2023-03-17 01:17:11,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 
0: successfully saved checkpoint at iteration 42000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 87.09
7: iteration 42010/ 173500 | consumed samples: 10754560 | consumed tokens: 22025338880 | elapsed time per iteration (s): 0.10 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.584937E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.513 | TFLOPs: 10.00 |
7: iteration 42020/ 173500 | consumed samples: 10757120 | consumed tokens: 22030581760 | elapsed time per iteration (s): 0.08 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 4.586253E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.013 | TFLOPs: 11.87 |
7: iteration 42030/ 173500 | consumed samples: 10759680 | consumed tokens: 22035824640 | elapsed time per iteration (s): 0.08 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.580510E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.540 | TFLOPs: 11.76 |
7: iteration 42040/ 173500 | consumed samples: 10762240 | consumed tokens: 22041067520 | elapsed time per iteration (s): 0.08 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.574530E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.881 | TFLOPs: 11.85 |
7: iteration 42050/ 173500 | consumed samples: 10764800 | consumed tokens: 22046310400 | elapsed time per iteration (s): 0.08 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.577329E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.585 | TFLOPs: 11.77 |
7: iteration 42060/ 173500 | consumed samples: 10767360 | consumed tokens: 22051553280 | elapsed time per iteration (s): 0.08 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.579108E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.036 | TFLOPs: 11.84 |
7: iteration 42070/ 173500 | consumed samples: 10769920 | consumed tokens: 22056796160 | elapsed time per iteration (s): 0.08 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.566798E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.445 | TFLOPs: 11.84 |
7: iteration 42080/ 173500 | consumed samples: 10772480 | consumed tokens: 22062039040 | elapsed time per iteration (s): 0.09 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.573680E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2698.792 | TFLOPs: 10.04 |
7: iteration 42090/ 173500 | consumed samples: 10775040 | consumed tokens: 22067281920 | elapsed time per iteration (s): 0.08 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.577251E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.994 | TFLOPs: 11.81 |
7: iteration 42100/ 173500 | consumed samples: 10777600 | consumed tokens: 22072524800 | elapsed time per iteration (s): 0.08 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.582529E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.602 | TFLOPs: 11.82 |
7: iteration 42110/ 173500 | consumed samples: 10780160 | consumed tokens: 22077767680 | elapsed time per iteration (s): 0.08 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 4.569302E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.323 | TFLOPs: 11.85 |
7: iteration 42120/ 173500 | consumed samples: 10782720 | consumed tokens: 22083010560 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.567287E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.873 | TFLOPs: 11.84 |
7: iteration 42130/ 173500 | consumed samples: 10785280 | consumed tokens: 22088253440 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.568893E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.391 | TFLOPs: 11.61 |
7: iteration 42140/ 173500 | consumed samples: 10787840 | consumed tokens: 22093496320 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.593831E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.788 | TFLOPs: 11.88 |
7: iteration 42150/ 173500 | consumed samples: 10790400 | consumed tokens: 22098739200 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.569550E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.312 | TFLOPs: 12.03 |
7: iteration 42160/ 173500 | consumed samples: 10792960 | consumed tokens: 22103982080 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.576237E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.512 | TFLOPs: 12.03 |
7: iteration 42170/ 173500 | consumed samples: 10795520 | consumed tokens: 22109224960 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.574880E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped
iterations: 0 | number of nan iterations: 0 | samples per second: 3231.849 | TFLOPs: 12.02 | 7: iteration 42180/ 173500 | consumed samples: 10798080 | consumed tokens: 22114467840 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.572314E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.542 | TFLOPs: 11.96 | 7: iteration 42190/ 173500 | consumed samples: 10800640 | consumed tokens: 22119710720 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.569837E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.905 | TFLOPs: 11.73 | 7: iteration 42200/ 173500 | consumed samples: 10803200 | consumed tokens: 22124953600 | elapsed time per iteration (s): 0.08 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 4.570585E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.760 | TFLOPs: 11.92 | 7: iteration 42210/ 173500 | consumed samples: 10805760 | consumed tokens: 22130196480 | elapsed time per iteration (s): 0.08 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.569450E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.032 | TFLOPs: 11.90 | 7: iteration 42220/ 173500 | consumed samples: 10808320 | consumed tokens: 22135439360 | elapsed time per iteration (s): 0.08 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.581210E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.489 | TFLOPs: 12.01 | 7: iteration 42230/ 173500 | consumed samples: 10810880 | consumed tokens: 22140682240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.764E-04 | global batch size: 256 | lm loss: 4.573506E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.401 | TFLOPs: 11.64 | 7: iteration 42240/ 173500 | consumed samples: 10813440 | consumed tokens: 22145925120 | elapsed time per iteration (s): 0.08 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.580751E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.960 | TFLOPs: 11.70 | 7: iteration 42250/ 173500 | consumed samples: 10816000 | consumed tokens: 22151168000 | elapsed time per iteration (s): 0.08 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.575924E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.891 | TFLOPs: 12.01 | 7: iteration 42260/ 173500 | consumed samples: 10818560 | consumed tokens: 22156410880 | elapsed time per iteration (s): 0.08 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.586600E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.347 | TFLOPs: 12.01 | 7: iteration 42270/ 173500 | consumed samples: 10821120 | consumed tokens: 22161653760 | elapsed time per iteration (s): 0.08 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.581550E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.716 | TFLOPs: 12.03 | 7: iteration 42280/ 173500 | consumed samples: 10823680 | consumed tokens: 22166896640 | elapsed time per iteration (s): 0.08 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.584069E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.992 | TFLOPs: 12.01 | 7: iteration 42290/ 173500 | 
consumed samples: 10826240 | consumed tokens: 22172139520 | elapsed time per iteration (s): 0.08 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 4.575597E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.475 | TFLOPs: 11.96 | 7: iteration 42300/ 173500 | consumed samples: 10828800 | consumed tokens: 22177382400 | elapsed time per iteration (s): 0.08 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.559446E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.646 | TFLOPs: 11.99 | 7: iteration 42310/ 173500 | consumed samples: 10831360 | consumed tokens: 22182625280 | elapsed time per iteration (s): 0.08 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.574594E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.602 | TFLOPs: 12.00 | 7: iteration 42320/ 173500 | consumed samples: 10833920 | consumed tokens: 22187868160 | elapsed time per iteration (s): 0.11 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.563659E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2303.242 | TFLOPs: 8.57 | 7: iteration 42330/ 173500 | consumed samples: 10836480 | consumed tokens: 22193111040 | elapsed time per iteration (s): 0.08 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.590067E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.578 | TFLOPs: 11.73 | 7: iteration 42340/ 173500 | consumed samples: 10839040 | consumed tokens: 22198353920 | elapsed time per iteration (s): 0.08 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.575418E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3186.540 | TFLOPs: 11.85 | 7: iteration 42350/ 173500 | consumed samples: 10841600 | consumed tokens: 22203596800 | elapsed time per iteration (s): 0.08 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.568446E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.553 | TFLOPs: 11.87 | 7: iteration 42360/ 173500 | consumed samples: 10844160 | consumed tokens: 22208839680 | elapsed time per iteration (s): 0.08 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.575826E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.324 | TFLOPs: 11.92 | 7: iteration 42370/ 173500 | consumed samples: 10846720 | consumed tokens: 22214082560 | elapsed time per iteration (s): 0.08 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.589141E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.407 | TFLOPs: 11.90 | 7: iteration 42380/ 173500 | consumed samples: 10849280 | consumed tokens: 22219325440 | elapsed time per iteration (s): 0.08 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 4.589105E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.846 | TFLOPs: 11.92 | 7: iteration 42390/ 173500 | consumed samples: 10851840 | consumed tokens: 22224568320 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 4.572351E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.740 | TFLOPs: 11.91 | 7: iteration 42400/ 173500 | consumed samples: 10854400 | consumed tokens: 22229811200 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch 
size: 256 | lm loss: 4.587753E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.359 | TFLOPs: 11.87 | 7: iteration 42410/ 173500 | consumed samples: 10856960 | consumed tokens: 22235054080 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 4.566610E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.672 | TFLOPs: 11.97 | 7: iteration 42420/ 173500 | consumed samples: 10859520 | consumed tokens: 22240296960 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 4.579283E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.093 | TFLOPs: 11.96 | 7: iteration 42430/ 173500 | consumed samples: 10862080 | consumed tokens: 22245539840 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 4.581690E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.931 | TFLOPs: 11.79 | 7: iteration 42440/ 173500 | consumed samples: 10864640 | consumed tokens: 22250782720 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 4.570692E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.176 | TFLOPs: 11.80 | 7: iteration 42450/ 173500 | consumed samples: 10867200 | consumed tokens: 22256025600 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 4.574075E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.931 | TFLOPs: 11.54 | 7: iteration 42460/ 173500 | consumed samples: 10869760 | 
consumed tokens: 22261268480 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 4.571633E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.116 | TFLOPs: 11.84 | 7: iteration 42470/ 173500 | consumed samples: 10872320 | consumed tokens: 22266511360 | elapsed time per iteration (s): 0.08 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 4.572714E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.199 | TFLOPs: 11.56 | 7: iteration 42480/ 173500 | consumed samples: 10874880 | consumed tokens: 22271754240 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.576125E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.692 | TFLOPs: 11.63 | 7: iteration 42490/ 173500 | consumed samples: 10877440 | consumed tokens: 22276997120 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.577864E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.951 | TFLOPs: 11.82 | 7: iteration 42500/ 173500 | consumed samples: 10880000 | consumed tokens: 22282240000 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.571521E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.678 | TFLOPs: 11.95 | 7: iteration 42510/ 173500 | consumed samples: 10882560 | consumed tokens: 22287482880 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.583159E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3171.301 | TFLOPs: 11.80 | 7: iteration 42520/ 173500 | consumed samples: 10885120 | consumed tokens: 22292725760 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.585036E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.274 | TFLOPs: 11.64 | 7: iteration 42530/ 173500 | consumed samples: 10887680 | consumed tokens: 22297968640 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.577497E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.735 | TFLOPs: 12.01 | 7: iteration 42540/ 173500 | consumed samples: 10890240 | consumed tokens: 22303211520 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.586949E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.520 | TFLOPs: 11.78 | 7: iteration 42550/ 173500 | consumed samples: 10892800 | consumed tokens: 22308454400 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.593679E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.096 | TFLOPs: 11.65 | 7: iteration 42560/ 173500 | consumed samples: 10895360 | consumed tokens: 22313697280 | elapsed time per iteration (s): 0.08 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 4.580481E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.655 | TFLOPs: 11.93 | 7: iteration 42570/ 173500 | consumed samples: 10897920 | consumed tokens: 22318940160 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 
4.567385E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.257 | TFLOPs: 11.93 | 7: iteration 42580/ 173500 | consumed samples: 10900480 | consumed tokens: 22324183040 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 4.574709E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.954 | TFLOPs: 11.92 | 7: iteration 42590/ 173500 | consumed samples: 10903040 | consumed tokens: 22329425920 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 4.580191E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.405 | TFLOPs: 11.90 | 7: iteration 42600/ 173500 | consumed samples: 10905600 | consumed tokens: 22334668800 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 4.582903E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.966 | TFLOPs: 11.94 | 7: iteration 42610/ 173500 | consumed samples: 10908160 | consumed tokens: 22339911680 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 4.588206E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.035 | TFLOPs: 11.94 | 7: iteration 42620/ 173500 | consumed samples: 10910720 | consumed tokens: 22345154560 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 4.570691E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.446 | TFLOPs: 12.02 | 7: iteration 42630/ 173500 | consumed samples: 10913280 | consumed tokens: 
22350397440 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 4.571578E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.739 | TFLOPs: 12.04 | 7: iteration 42640/ 173500 | consumed samples: 10915840 | consumed tokens: 22355640320 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 4.580852E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.660 | TFLOPs: 12.05 | 7: iteration 42650/ 173500 | consumed samples: 10918400 | consumed tokens: 22360883200 | elapsed time per iteration (s): 0.08 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 4.577652E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.798 | TFLOPs: 12.02 | 7: iteration 42660/ 173500 | consumed samples: 10920960 | consumed tokens: 22366126080 | elapsed time per iteration (s): 0.08 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.580218E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.596 | TFLOPs: 12.05 | 7: iteration 42670/ 173500 | consumed samples: 10923520 | consumed tokens: 22371368960 | elapsed time per iteration (s): 0.08 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.574010E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.556 | TFLOPs: 11.98 | 7: iteration 42680/ 173500 | consumed samples: 10926080 | consumed tokens: 22376611840 | elapsed time per iteration (s): 0.08 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.566947E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3239.681 | TFLOPs: 12.05 | 7: iteration 42690/ 173500 | consumed samples: 10928640 | consumed tokens: 22381854720 | elapsed time per iteration (s): 0.09 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.576217E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2968.627 | TFLOPs: 11.04 | 7: iteration 42700/ 173500 | consumed samples: 10931200 | consumed tokens: 22387097600 | elapsed time per iteration (s): 0.08 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.569005E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.009 | TFLOPs: 12.03 | 7: iteration 42710/ 173500 | consumed samples: 10933760 | consumed tokens: 22392340480 | elapsed time per iteration (s): 0.08 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.569356E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.162 | TFLOPs: 11.43 | 7: iteration 42720/ 173500 | consumed samples: 10936320 | consumed tokens: 22397583360 | elapsed time per iteration (s): 0.08 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.581041E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.553 | TFLOPs: 12.03 | 7: iteration 42730/ 173500 | consumed samples: 10938880 | consumed tokens: 22402826240 | elapsed time per iteration (s): 0.08 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.574366E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.557 | TFLOPs: 12.05 | 7: iteration 42740/ 173500 | consumed samples: 10941440 | consumed tokens: 22408069120 | elapsed time per iteration (s): 0.08 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 4.579152E+00 | grad 
norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.648 | TFLOPs: 11.81 | 7: iteration 42750/ 173500 | consumed samples: 10944000 | consumed tokens: 22413312000 | elapsed time per iteration (s): 0.08 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.578859E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.889 | TFLOPs: 11.49 | 7: iteration 42760/ 173500 | consumed samples: 10946560 | consumed tokens: 22418554880 | elapsed time per iteration (s): 0.08 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.561365E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.229 | TFLOPs: 12.03 | 7: iteration 42770/ 173500 | consumed samples: 10949120 | consumed tokens: 22423797760 | elapsed time per iteration (s): 0.08 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.566924E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.458 | TFLOPs: 11.45 | 7: iteration 42780/ 173500 | consumed samples: 10951680 | consumed tokens: 22429040640 | elapsed time per iteration (s): 0.09 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.582055E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.006 | TFLOPs: 10.56 | 7: iteration 42790/ 173500 | consumed samples: 10954240 | consumed tokens: 22434283520 | elapsed time per iteration (s): 0.10 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.572196E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2470.770 | TFLOPs: 9.19 | 7: iteration 42800/ 173500 | consumed samples: 10956800 | consumed tokens: 22439526400 | elapsed time 
per iteration (s): 0.11 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.572577E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.602 | TFLOPs: 8.82 | 7: iteration 42810/ 173500 | consumed samples: 10959360 | consumed tokens: 22444769280 | elapsed time per iteration (s): 0.10 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.572308E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.430 | TFLOPs: 9.74 | 7: iteration 42820/ 173500 | consumed samples: 10961920 | consumed tokens: 22450012160 | elapsed time per iteration (s): 0.08 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.586255E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.819 | TFLOPs: 11.60 | 7: iteration 42830/ 173500 | consumed samples: 10964480 | consumed tokens: 22455255040 | elapsed time per iteration (s): 0.08 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 4.565234E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.849 | TFLOPs: 11.52 | 7: iteration 42840/ 173500 | consumed samples: 10967040 | consumed tokens: 22460497920 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.580873E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.708 | TFLOPs: 11.83 | 7: iteration 42850/ 173500 | consumed samples: 10969600 | consumed tokens: 22465740800 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.562960E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.935 | TFLOPs: 11.80 
| 7: iteration 42860/ 173500 | consumed samples: 10972160 | consumed tokens: 22470983680 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.586021E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.334 | TFLOPs: 11.78 | 7: iteration 42870/ 173500 | consumed samples: 10974720 | consumed tokens: 22476226560 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.574536E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.395 | TFLOPs: 11.85 | 7: iteration 42880/ 173500 | consumed samples: 10977280 | consumed tokens: 22481469440 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.581582E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.175 | TFLOPs: 11.76 | 7: iteration 42890/ 173500 | consumed samples: 10979840 | consumed tokens: 22486712320 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.565374E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.063 | TFLOPs: 11.80 | 7: iteration 42900/ 173500 | consumed samples: 10982400 | consumed tokens: 22491955200 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.565311E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.851 | TFLOPs: 11.87 | 7: iteration 42910/ 173500 | consumed samples: 10984960 | consumed tokens: 22497198080 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.572469E+00 | grad norm: 0.324 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.774 | TFLOPs: 11.85 | 7: iteration 42920/ 173500 | consumed samples: 10987520 | consumed tokens: 22502440960 | elapsed time per iteration (s): 0.08 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 4.578968E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.760 | TFLOPs: 11.87 | 7: iteration 42930/ 173500 | consumed samples: 10990080 | consumed tokens: 22507683840 | elapsed time per iteration (s): 0.08 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 4.569817E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.215 | TFLOPs: 11.82 | 7: iteration 42940/ 173500 | consumed samples: 10992640 | consumed tokens: 22512926720 | elapsed time per iteration (s): 0.08 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 4.575682E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.172 | TFLOPs: 11.85 | 7: iteration 42950/ 173500 | consumed samples: 10995200 | consumed tokens: 22518169600 | elapsed time per iteration (s): 0.08 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 4.584467E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.106 | TFLOPs: 11.90 | 7: iteration 42960/ 173500 | consumed samples: 10997760 | consumed tokens: 22523412480 | elapsed time per iteration (s): 0.08 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 4.575896E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.688 | TFLOPs: 11.91 | 7: iteration 42970/ 173500 | consumed samples: 11000320 | consumed tokens: 22528655360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.756E-04 | global batch size: 256 | lm loss: 4.586424E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.113 | TFLOPs: 11.94 |
7: iteration 42980/ 173500 | consumed samples: 11002880 | consumed tokens: 22533898240 | elapsed time per iteration (s): 0.08 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 4.584004E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.879 | TFLOPs: 11.93 |
7: iteration 42990/ 173500 | consumed samples: 11005440 | consumed tokens: 22539141120 | elapsed time per iteration (s): 0.08 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 4.567888E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.583 | TFLOPs: 11.90 |
7: iteration 43000/ 173500 | consumed samples: 11008000 | consumed tokens: 22544384000 | elapsed time per iteration (s): 0.08 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 4.569686E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.899 | TFLOPs: 11.94 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 43000 | lm loss value: 4.408333E+00 | lm loss PPL: 8.213246E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 43000 to checkpoints_14m91b100m
0: [2023-03-17 01:18:32,842] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step43000 is begin to save!
0: [2023-03-17 01:18:32,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:18:32,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:18:32,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/layer_03-model_00-model_states.pt...
0: [2023-03-17 01:18:32,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/layer_03-model_00-model_states.pt.
0: [2023-03-17 01:18:32,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/layer_04-model_00-model_states.pt...
0: [2023-03-17 01:18:32,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/layer_04-model_00-model_states.pt.
0: [2023-03-17 01:18:32,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/layer_05-model_00-model_states.pt...
0: [2023-03-17 01:18:32,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/layer_05-model_00-model_states.pt.
0: [2023-03-17 01:18:32,880] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/layer_06-model_00-model_states.pt...
0: [2023-03-17 01:18:32,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/layer_06-model_00-model_states.pt.
0: [2023-03-17 01:18:32,883] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/layer_08-model_00-model_states.pt...
0: [2023-03-17 01:18:32,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/layer_08-model_00-model_states.pt.
0: [2023-03-17 01:18:32,884] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step43000/mp_rank_00_model_states.pt
0: [2023-03-17 01:18:32,884] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/mp_rank_00_model_states.pt...
0: [2023-03-17 01:18:32,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/mp_rank_00_model_states.pt.
0: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
4: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
4: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:18:32,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:18:32,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:18:32,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:18:32,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:18:32,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2023-03-17 01:18:32,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:18:32,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:18:32,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 
0: [2023-03-17 01:18:32,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:18:32,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2023-03-17 01:18:32,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:18:32,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:18:32,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2023-03-17 01:18:32,909] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:18:32,909] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2023-03-17 01:18:32,909] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:18:32,909] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:18:32,909] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:18:32,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2023-03-17 01:18:32,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:18:32,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 
3: [2023-03-17 01:18:32,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:18:32,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2023-03-17 01:18:32,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:18:32,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:18:32,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:18:32,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2023-03-17 01:18:32,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:18:32,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:18:32,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2023-03-17 01:18:32,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:18:32,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:18:32,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:18:32,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:18:32,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 5: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:18:32,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:18:32,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:18:32,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:18:32,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:18:32,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2023-03-17 01:18:32,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:18:32,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:18:32,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 2: [2023-03-17 01:18:32,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:18:32,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:18:32,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:18:32,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 7: [2023-03-17 01:18:32,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2023-03-17 01:18:32,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2023-03-17 01:18:32,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2023-03-17 01:18:32,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:18:32,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:18:32,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 01:18:32,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 1: [2023-03-17 01:18:32,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2023-03-17 01:18:32,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:18:32,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:18:32,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:18:32,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:18:32,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:18:32,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2023-03-17 01:18:32,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step43000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2023-03-17 01:18:32,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 
0: successfully saved checkpoint at iteration 43000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.03 7: iteration 43010/ 173500 | consumed samples: 11010560 | consumed tokens: 22549626880 | elapsed time per iteration (s): 0.09 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.571580E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2841.529 | TFLOPs: 10.57 | 7: iteration 43020/ 173500 | consumed samples: 11013120 | consumed tokens: 22554869760 | elapsed time per iteration (s): 0.08 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.585295E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.265 | TFLOPs: 11.96 | 7: iteration 43030/ 173500 | consumed samples: 11015680 | consumed tokens: 22560112640 | elapsed time per iteration (s): 0.08 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.575180E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.983 | TFLOPs: 11.83 | 7: iteration 43040/ 173500 | consumed samples: 11018240 | consumed tokens: 22565355520 | elapsed time per iteration (s): 0.08 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.562627E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.888 | TFLOPs: 11.69 | 7: iteration 43050/ 173500 | consumed samples: 11020800 | consumed tokens: 22570598400 | elapsed time per iteration (s): 0.08 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.580573E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.991 | TFLOPs: 11.62 | 7: iteration 43060/ 173500 | consumed samples: 11023360 | consumed tokens: 22575841280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.570690E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.966 | TFLOPs: 11.25 | 7: iteration 43070/ 173500 | consumed samples: 11025920 | consumed tokens: 22581084160 | elapsed time per iteration (s): 0.09 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.573679E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2718.182 | TFLOPs: 10.11 | 7: iteration 43080/ 173500 | consumed samples: 11028480 | consumed tokens: 22586327040 | elapsed time per iteration (s): 0.08 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.570606E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.684 | TFLOPs: 11.88 | 7: iteration 43090/ 173500 | consumed samples: 11031040 | consumed tokens: 22591569920 | elapsed time per iteration (s): 0.08 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 4.563688E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.758 | TFLOPs: 11.88 | 7: iteration 43100/ 173500 | consumed samples: 11033600 | consumed tokens: 22596812800 | elapsed time per iteration (s): 0.08 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.570801E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.711 | TFLOPs: 11.84 | 7: iteration 43110/ 173500 | consumed samples: 11036160 | consumed tokens: 22602055680 | elapsed time per iteration (s): 0.08 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.572932E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.823 | TFLOPs: 11.85 | 7: iteration 43120/ 
173500 | consumed samples: 11038720 | consumed tokens: 22607298560 | elapsed time per iteration (s): 0.08 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.580492E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.383 | TFLOPs: 11.91 | 7: iteration 43130/ 173500 | consumed samples: 11041280 | consumed tokens: 22612541440 | elapsed time per iteration (s): 0.08 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.590588E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.719 | TFLOPs: 11.82 | 7: iteration 43140/ 173500 | consumed samples: 11043840 | consumed tokens: 22617784320 | elapsed time per iteration (s): 0.08 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.561182E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.422 | TFLOPs: 11.88 | 7: iteration 43150/ 173500 | consumed samples: 11046400 | consumed tokens: 22623027200 | elapsed time per iteration (s): 0.09 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.575483E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2782.578 | TFLOPs: 10.35 | 7: iteration 43160/ 173500 | consumed samples: 11048960 | consumed tokens: 22628270080 | elapsed time per iteration (s): 0.08 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.574825E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.010 | TFLOPs: 11.75 | 7: iteration 43170/ 173500 | consumed samples: 11051520 | consumed tokens: 22633512960 | elapsed time per iteration (s): 0.08 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.570940E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3183.920 | TFLOPs: 11.84 | 7: iteration 43180/ 173500 | consumed samples: 11054080 | consumed tokens: 22638755840 | elapsed time per iteration (s): 0.09 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 4.579736E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.052 | TFLOPs: 10.96 | 7: iteration 43190/ 173500 | consumed samples: 11056640 | consumed tokens: 22643998720 | elapsed time per iteration (s): 0.08 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.579988E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.882 | TFLOPs: 11.59 | 7: iteration 43200/ 173500 | consumed samples: 11059200 | consumed tokens: 22649241600 | elapsed time per iteration (s): 0.10 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.570726E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2594.846 | TFLOPs: 9.65 | 7: iteration 43210/ 173500 | consumed samples: 11061760 | consumed tokens: 22654484480 | elapsed time per iteration (s): 0.24 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.584113E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1071.143 | TFLOPs: 3.98 | 7: iteration 43220/ 173500 | consumed samples: 11064320 | consumed tokens: 22659727360 | elapsed time per iteration (s): 0.09 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.579121E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2934.417 | TFLOPs: 10.91 | 7: iteration 43230/ 173500 | consumed samples: 11066880 | consumed tokens: 22664970240 | elapsed time per iteration (s): 0.09 | learning rate: 1.753E-04 
| global batch size: 256 | lm loss: 4.585915E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2791.051 | TFLOPs: 10.38 | 7: iteration 43240/ 173500 | consumed samples: 11069440 | consumed tokens: 22670213120 | elapsed time per iteration (s): 0.09 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.574166E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.417 | TFLOPs: 10.17 | 7: iteration 43250/ 173500 | consumed samples: 11072000 | consumed tokens: 22675456000 | elapsed time per iteration (s): 0.09 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.574375E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.733 | TFLOPs: 11.06 | 7: iteration 43260/ 173500 | consumed samples: 11074560 | consumed tokens: 22680698880 | elapsed time per iteration (s): 0.08 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.580832E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.035 | TFLOPs: 11.65 | 7: iteration 43270/ 173500 | consumed samples: 11077120 | consumed tokens: 22685941760 | elapsed time per iteration (s): 0.08 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.589838E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.090 | TFLOPs: 11.88 | 7: iteration 43280/ 173500 | consumed samples: 11079680 | consumed tokens: 22691184640 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.584698E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.973 | TFLOPs: 11.84 | 7: iteration 43290/ 173500 | consumed samples: 
11082240 | consumed tokens: 22696427520 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.588787E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.192 | TFLOPs: 11.85 | 7: iteration 43300/ 173500 | consumed samples: 11084800 | consumed tokens: 22701670400 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.575720E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.431 | TFLOPs: 11.85 | 7: iteration 43310/ 173500 | consumed samples: 11087360 | consumed tokens: 22706913280 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.588450E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.250 | TFLOPs: 11.85 | 7: iteration 43320/ 173500 | consumed samples: 11089920 | consumed tokens: 22712156160 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.560648E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.771 | TFLOPs: 11.83 | 7: iteration 43330/ 173500 | consumed samples: 11092480 | consumed tokens: 22717399040 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.579490E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.177 | TFLOPs: 11.59 | 7: iteration 43340/ 173500 | consumed samples: 11095040 | consumed tokens: 22722641920 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.574958E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3125.168 | TFLOPs: 11.62 | 7: iteration 43350/ 173500 | consumed samples: 11097600 | consumed tokens: 22727884800 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.565141E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.972 | TFLOPs: 11.51 | 7: iteration 43360/ 173500 | consumed samples: 11100160 | consumed tokens: 22733127680 | elapsed time per iteration (s): 0.08 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 4.591310E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.026 | TFLOPs: 11.76 | 7: iteration 43370/ 173500 | consumed samples: 11102720 | consumed tokens: 22738370560 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 4.566418E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.643 | TFLOPs: 11.82 | 7: iteration 43380/ 173500 | consumed samples: 11105280 | consumed tokens: 22743613440 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 4.565600E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.841 | TFLOPs: 11.77 | 7: iteration 43390/ 173500 | consumed samples: 11107840 | consumed tokens: 22748856320 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 4.584928E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.418 | TFLOPs: 11.80 | 7: iteration 43400/ 173500 | consumed samples: 11110400 | consumed tokens: 22754099200 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | 
lm loss: 4.574604E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.900 | TFLOPs: 11.70 | 7: iteration 43410/ 173500 | consumed samples: 11112960 | consumed tokens: 22759342080 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 4.571633E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.989 | TFLOPs: 11.74 | 7: iteration 43420/ 173500 | consumed samples: 11115520 | consumed tokens: 22764584960 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 4.569482E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.378 | TFLOPs: 11.63 | 7: iteration 43430/ 173500 | consumed samples: 11118080 | consumed tokens: 22769827840 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 4.582073E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.741 | TFLOPs: 11.73 | 7: iteration 43440/ 173500 | consumed samples: 11120640 | consumed tokens: 22775070720 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 4.576320E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.668 | TFLOPs: 11.80 | 7: iteration 43450/ 173500 | consumed samples: 11123200 | consumed tokens: 22780313600 | elapsed time per iteration (s): 0.08 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 4.572444E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.007 | TFLOPs: 11.53 | 7: iteration 43460/ 173500 | consumed samples: 11125760 | consumed 
tokens: 22785556480 | elapsed time per iteration (s): 0.08 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 4.569277E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.374 | TFLOPs: 11.60 | 7: iteration 43470/ 173500 | consumed samples: 11128320 | consumed tokens: 22790799360 | elapsed time per iteration (s): 0.08 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 4.588070E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.860 | TFLOPs: 11.66 | 7: iteration 43480/ 173500 | consumed samples: 11130880 | consumed tokens: 22796042240 | elapsed time per iteration (s): 0.08 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 4.577026E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.358 | TFLOPs: 11.54 | 7: iteration 43490/ 173500 | consumed samples: 11133440 | consumed tokens: 22801285120 | elapsed time per iteration (s): 0.08 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 4.574763E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.633 | TFLOPs: 11.57 | 7: iteration 43500/ 173500 | consumed samples: 11136000 | consumed tokens: 22806528000 | elapsed time per iteration (s): 0.08 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 4.568064E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.103 | TFLOPs: 11.74 | 7: iteration 43510/ 173500 | consumed samples: 11138560 | consumed tokens: 22811770880 | elapsed time per iteration (s): 0.08 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 4.559513E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3162.648 | TFLOPs: 11.76 | 7: iteration 43520/ 173500 | consumed samples: 11141120 | consumed tokens: 22817013760 | elapsed time per iteration (s): 0.08 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 4.564746E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.155 | TFLOPs: 11.60 | 7: iteration 43530/ 173500 | consumed samples: 11143680 | consumed tokens: 22822256640 | elapsed time per iteration (s): 0.08 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 4.579877E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.012 | TFLOPs: 11.73 | 7: iteration 43540/ 173500 | consumed samples: 11146240 | consumed tokens: 22827499520 | elapsed time per iteration (s): 0.08 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.580347E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.561 | TFLOPs: 11.72 | 7: iteration 43550/ 173500 | consumed samples: 11148800 | consumed tokens: 22832742400 | elapsed time per iteration (s): 0.08 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.570273E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.583 | TFLOPs: 11.72 | 7: iteration 43560/ 173500 | consumed samples: 11151360 | consumed tokens: 22837985280 | elapsed time per iteration (s): 0.08 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.583656E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.711 | TFLOPs: 11.66 | 7: iteration 43570/ 173500 | consumed samples: 11153920 | consumed tokens: 22843228160 | elapsed time per iteration (s): 0.08 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.573885E+00 | 
grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.295 | TFLOPs: 11.68 | 7: iteration 43580/ 173500 | consumed samples: 11156480 | consumed tokens: 22848471040 | elapsed time per iteration (s): 0.08 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.580902E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.813 | TFLOPs: 11.45 | 7: iteration 43590/ 173500 | consumed samples: 11159040 | consumed tokens: 22853713920 | elapsed time per iteration (s): 0.09 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.583480E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.110 | TFLOPs: 10.72 | 7: iteration 43600/ 173500 | consumed samples: 11161600 | consumed tokens: 22858956800 | elapsed time per iteration (s): 0.10 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.570505E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2663.276 | TFLOPs: 9.91 | 7: iteration 43610/ 173500 | consumed samples: 11164160 | consumed tokens: 22864199680 | elapsed time per iteration (s): 0.08 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.569391E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.536 | TFLOPs: 11.85 | 7: iteration 43620/ 173500 | consumed samples: 11166720 | consumed tokens: 22869442560 | elapsed time per iteration (s): 0.08 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 4.574266E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.288 | TFLOPs: 11.85 | 7: iteration 43630/ 173500 | consumed samples: 11169280 | consumed tokens: 22874685440 | elapsed 
time per iteration (s): 0.08 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.576385E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.018 | TFLOPs: 11.63 | 7: iteration 43640/ 173500 | consumed samples: 11171840 | consumed tokens: 22879928320 | elapsed time per iteration (s): 0.08 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.568542E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3027.936 | TFLOPs: 11.26 | 7: iteration 43650/ 173500 | consumed samples: 11174400 | consumed tokens: 22885171200 | elapsed time per iteration (s): 0.08 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.583160E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.568 | TFLOPs: 11.75 | 7: iteration 43660/ 173500 | consumed samples: 11176960 | consumed tokens: 22890414080 | elapsed time per iteration (s): 0.08 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.573158E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.590 | TFLOPs: 11.33 | 7: iteration 43670/ 173500 | consumed samples: 11179520 | consumed tokens: 22895656960 | elapsed time per iteration (s): 0.08 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.566710E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.872 | TFLOPs: 11.65 | 7: iteration 43680/ 173500 | consumed samples: 11182080 | consumed tokens: 22900899840 | elapsed time per iteration (s): 0.08 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.562314E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.128 | 
TFLOPs: 11.78 | 7: iteration 43690/ 173500 | consumed samples: 11184640 | consumed tokens: 22906142720 | elapsed time per iteration (s): 0.08 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.578047E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.015 | TFLOPs: 11.78 | 7: iteration 43700/ 173500 | consumed samples: 11187200 | consumed tokens: 22911385600 | elapsed time per iteration (s): 0.10 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.569608E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2659.569 | TFLOPs: 9.89 | 7: iteration 43710/ 173500 | consumed samples: 11189760 | consumed tokens: 22916628480 | elapsed time per iteration (s): 0.11 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 4.582418E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2255.543 | TFLOPs: 8.39 | 7: iteration 43720/ 173500 | consumed samples: 11192320 | consumed tokens: 22921871360 | elapsed time per iteration (s): 0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.568541E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.705 | TFLOPs: 11.30 | 7: iteration 43730/ 173500 | consumed samples: 11194880 | consumed tokens: 22927114240 | elapsed time per iteration (s): 0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.573805E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.042 | TFLOPs: 11.87 | 7: iteration 43740/ 173500 | consumed samples: 11197440 | consumed tokens: 22932357120 | elapsed time per iteration (s): 0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.566607E+00 | grad norm: 0.299 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.276 | TFLOPs: 11.29 | 7: iteration 43750/ 173500 | consumed samples: 11200000 | consumed tokens: 22937600000 | elapsed time per iteration (s): 0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.566990E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.658 | TFLOPs: 11.93 | 7: iteration 43760/ 173500 | consumed samples: 11202560 | consumed tokens: 22942842880 | elapsed time per iteration (s): 0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.580869E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.284 | TFLOPs: 11.90 | 7: iteration 43770/ 173500 | consumed samples: 11205120 | consumed tokens: 22948085760 | elapsed time per iteration (s): 0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.558286E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.072 | TFLOPs: 11.86 | 7: iteration 43780/ 173500 | consumed samples: 11207680 | consumed tokens: 22953328640 | elapsed time per iteration (s): 0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.578640E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.206 | TFLOPs: 11.88 | 7: iteration 43790/ 173500 | consumed samples: 11210240 | consumed tokens: 22958571520 | elapsed time per iteration (s): 0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.576872E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.063 | TFLOPs: 11.87 | 7: iteration 43800/ 173500 | consumed samples: 11212800 | consumed tokens: 22963814400 | elapsed time per iteration (s): 
0.08 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 4.574237E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.484 | TFLOPs: 11.82 | 7: iteration 43810/ 173500 | consumed samples: 11215360 | consumed tokens: 22969057280 | elapsed time per iteration (s): 0.08 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 4.565914E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.259 | TFLOPs: 11.85 | 7: iteration 43820/ 173500 | consumed samples: 11217920 | consumed tokens: 22974300160 | elapsed time per iteration (s): 0.09 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 4.573355E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.634 | TFLOPs: 11.08 | 7: iteration 43830/ 173500 | consumed samples: 11220480 | consumed tokens: 22979543040 | elapsed time per iteration (s): 0.08 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 4.576347E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3061.981 | TFLOPs: 11.39 | 7: iteration 43840/ 173500 | consumed samples: 11223040 | consumed tokens: 22984785920 | elapsed time per iteration (s): 0.08 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 4.582713E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.148 | TFLOPs: 11.95 | 7: iteration 43850/ 173500 | consumed samples: 11225600 | consumed tokens: 22990028800 | elapsed time per iteration (s): 0.08 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 4.572652E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.387 | TFLOPs: 11.81 | 7: iteration 
43860/ 173500 | consumed samples: 11228160 | consumed tokens: 22995271680 | elapsed time per iteration (s): 0.08 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 4.559372E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.211 | TFLOPs: 11.94 | 7: iteration 43870/ 173500 | consumed samples: 11230720 | consumed tokens: 23000514560 | elapsed time per iteration (s): 0.08 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 4.557596E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.873 | TFLOPs: 11.87 | 7: iteration 43880/ 173500 | consumed samples: 11233280 | consumed tokens: 23005757440 | elapsed time per iteration (s): 0.08 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 4.575730E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.232 | TFLOPs: 11.86 | 7: iteration 43890/ 173500 | consumed samples: 11235840 | consumed tokens: 23011000320 | elapsed time per iteration (s): 0.08 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 4.570638E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.254 | TFLOPs: 11.98 | 7: iteration 43900/ 173500 | consumed samples: 11238400 | consumed tokens: 23016243200 | elapsed time per iteration (s): 0.08 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 4.573367E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.269 | TFLOPs: 11.84 | 7: iteration 43910/ 173500 | consumed samples: 11240960 | consumed tokens: 23021486080 | elapsed time per iteration (s): 0.08 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 4.559781E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3202.835 | TFLOPs: 11.91 | 7: iteration 43920/ 173500 | consumed samples: 11243520 | consumed tokens: 23026728960 | elapsed time per iteration (s): 0.08 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 4.586452E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.388 | TFLOPs: 11.90 | 7: iteration 43930/ 173500 | consumed samples: 11246080 | consumed tokens: 23031971840 | elapsed time per iteration (s): 0.08 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 4.565062E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.584 | TFLOPs: 11.93 | 7: iteration 43940/ 173500 | consumed samples: 11248640 | consumed tokens: 23037214720 | elapsed time per iteration (s): 0.08 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 4.570444E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.773 | TFLOPs: 11.98 | 7: iteration 43950/ 173500 | consumed samples: 11251200 | consumed tokens: 23042457600 | elapsed time per iteration (s): 0.08 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 4.580407E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.266 | TFLOPs: 11.94 | 7: iteration 43960/ 173500 | consumed samples: 11253760 | consumed tokens: 23047700480 | elapsed time per iteration (s): 0.08 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 4.577855E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.635 | TFLOPs: 11.93 | 7: iteration 43970/ 173500 | consumed samples: 11256320 | consumed tokens: 23052943360 | elapsed time per iteration (s): 0.08 | learning rate: 
1.745E-04 | global batch size: 256 | lm loss: 4.578968E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.253 | TFLOPs: 11.96 |
7: iteration 43980/ 173500 | consumed samples: 11258880 | consumed tokens: 23058186240 | elapsed time per iteration (s): 0.08 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.575307E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.043 | TFLOPs: 11.95 |
7: iteration 43990/ 173500 | consumed samples: 11261440 | consumed tokens: 23063429120 | elapsed time per iteration (s): 0.08 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.582415E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.696 | TFLOPs: 11.35 |
0: [2023-03-17 01:19:57,116] [INFO] [logging.py:68:log_dist] [Rank 0] step=44000, skipped=0, lr=[0.00017442202015704406, 0.00017442202015704406, 0.00017442202015704406], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 44000/ 173500 | consumed samples: 11264000 | consumed tokens: 23068672000 | elapsed time per iteration (s): 0.08 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.555849E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.489 | TFLOPs: 11.90 |
0: steps: 44000 loss: 4.5564 iter time (s): 0.081 samples/sec: 3143.779
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 44000 | lm loss value: 4.476929E+00 | lm loss PPL: 8.796414E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 44000 to checkpoints_14m91b100m
0: [2023-03-17 01:19:57,176] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch]
Checkpoint global_step44000 is begin to save! 0: [2023-03-17 01:19:57,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:19:57,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:19:57,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:19:57,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:19:57,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:19:57,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:19:57,214] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:19:57,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:19:57,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:19:57,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:19:57,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 01:19:57,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:19:57,223] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step44000/mp_rank_00_model_states.pt 0: [2023-03-17 01:19:57,223] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:19:57,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:19:57,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:19:57,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:19:57,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:19:57,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2023-03-17 01:19:57,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:19:57,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2023-03-17 01:19:57,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
0: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2023-03-17 01:19:57,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:19:57,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:19:57,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2023-03-17 01:19:57,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:19:57,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2023-03-17 01:19:57,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 01:19:57,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
5: [2023-03-17 01:19:57,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2023-03-17 01:19:57,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:19:57,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:19:57,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2023-03-17 01:19:57,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:19:57,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:19:57,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2023-03-17 01:19:57,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:19:57,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:19:57,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
5: [2023-03-17 01:19:57,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2023-03-17 01:19:57,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:19:57,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2023-03-17 01:19:57,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:19:57,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2023-03-17 01:19:57,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:19:57,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:19:57,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2023-03-17 01:19:57,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:19:57,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2023-03-17 01:19:57,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:19:57,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:19:57,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2023-03-17 01:19:57,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:19:57,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:19:57,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:19:57,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:19:57,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:19:57,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2023-03-17 01:19:57,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 01:19:57,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:19:57,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2023-03-17 01:19:57,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:19:57,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2023-03-17 01:19:57,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:19:57,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:19:57,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2023-03-17 01:19:57,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,293] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:19:57,293] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2023-03-17 01:19:57,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:19:57,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:19:57,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:19:57,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:19:57,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:19:57,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:19:57,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2023-03-17 01:19:57,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:19:57,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:19:57,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2023-03-17 01:19:57,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2023-03-17 01:19:57,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:19:57,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:19:57,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2023-03-17 01:19:57,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
2: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
4: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 01:19:57,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2023-03-17 01:19:57,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:19:57,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 01:19:57,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
5: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:19:57,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 01:19:57,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 01:19:57,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2023-03-17 01:19:57,297] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:19:57,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:19:57,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:19:57,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:19:57,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:19:57,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:19:57,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:19:57,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:19:57,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step44000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2023-03-17 01:19:57,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
0: successfully saved checkpoint at iteration 44000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 141.01 7: iteration 44010/ 173500 | consumed samples: 11266560 | consumed tokens: 23073914880 | elapsed time per iteration (s): 0.11 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.566057E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2384.327 | TFLOPs: 8.87 | 7: iteration 44020/ 173500 | consumed samples: 11269120 | consumed tokens: 23079157760 | elapsed time per iteration (s): 0.08 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.585927E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.258 | TFLOPs: 11.53 | 7: iteration 44030/ 173500 | consumed samples: 11271680 | consumed tokens: 23084400640 | elapsed time per iteration (s): 0.09 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.561477E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.933 | TFLOPs: 10.86 | 7: iteration 44040/ 173500 | consumed samples: 11274240 | consumed tokens: 23089643520 | elapsed time per iteration (s): 0.08 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.574863E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.093 | TFLOPs: 11.82 | 7: iteration 44050/ 173500 | consumed samples: 11276800 | consumed tokens: 23094886400 | elapsed time per iteration (s): 0.08 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.571857E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.249 | TFLOPs: 11.90 | 7: iteration 44060/ 173500 | consumed samples: 11279360 | consumed tokens: 23100129280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.744E-04 | global batch size: 256 | lm loss: 4.566296E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3079.685 | TFLOPs: 11.46 | 7: iteration 44070/ 173500 | consumed samples: 11281920 | consumed tokens: 23105372160 | elapsed time per iteration (s): 0.12 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 4.571434E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.081 | TFLOPs: 8.04 | 7: iteration 44080/ 173500 | consumed samples: 11284480 | consumed tokens: 23110615040 | elapsed time per iteration (s): 0.10 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 4.582546E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2640.425 | TFLOPs: 9.82 | 7: iteration 44090/ 173500 | consumed samples: 11287040 | consumed tokens: 23115857920 | elapsed time per iteration (s): 0.08 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 4.567476E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.935 | TFLOPs: 11.62 | 7: iteration 44100/ 173500 | consumed samples: 11289600 | consumed tokens: 23121100800 | elapsed time per iteration (s): 0.09 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 4.574458E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.181 | TFLOPs: 11.07 | 7: iteration 44110/ 173500 | consumed samples: 11292160 | consumed tokens: 23126343680 | elapsed time per iteration (s): 0.08 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 4.573739E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.137 | TFLOPs: 11.66 | 7: iteration 44120/ 
173500 | consumed samples: 11294720 | consumed tokens: 23131586560 | elapsed time per iteration (s): 0.08 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 4.567923E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.419 | TFLOPs: 11.97 | 7: iteration 44130/ 173500 | consumed samples: 11297280 | consumed tokens: 23136829440 | elapsed time per iteration (s): 0.08 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 4.571399E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.912 | TFLOPs: 11.97 | 7: iteration 44140/ 173500 | consumed samples: 11299840 | consumed tokens: 23142072320 | elapsed time per iteration (s): 0.08 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 4.566549E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.585 | TFLOPs: 12.03 | 7: iteration 44150/ 173500 | consumed samples: 11302400 | consumed tokens: 23147315200 | elapsed time per iteration (s): 0.08 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 4.574710E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.969 | TFLOPs: 12.04 | 7: iteration 44160/ 173500 | consumed samples: 11304960 | consumed tokens: 23152558080 | elapsed time per iteration (s): 0.08 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 4.568206E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.230 | TFLOPs: 12.07 | 7: iteration 44170/ 173500 | consumed samples: 11307520 | consumed tokens: 23157800960 | elapsed time per iteration (s): 0.08 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 4.576585E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3168.525 | TFLOPs: 11.79 | 7: iteration 44180/ 173500 | consumed samples: 11310080 | consumed tokens: 23163043840 | elapsed time per iteration (s): 0.08 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 4.570470E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.706 | TFLOPs: 12.05 | 7: iteration 44190/ 173500 | consumed samples: 11312640 | consumed tokens: 23168286720 | elapsed time per iteration (s): 0.08 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 4.578460E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.748 | TFLOPs: 11.87 | 7: iteration 44200/ 173500 | consumed samples: 11315200 | consumed tokens: 23173529600 | elapsed time per iteration (s): 0.08 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 4.572409E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.532 | TFLOPs: 11.69 | 7: iteration 44210/ 173500 | consumed samples: 11317760 | consumed tokens: 23178772480 | elapsed time per iteration (s): 0.09 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 4.574931E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3005.588 | TFLOPs: 11.18 | 7: iteration 44220/ 173500 | consumed samples: 11320320 | consumed tokens: 23184015360 | elapsed time per iteration (s): 0.09 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 4.584181E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.148 | TFLOPs: 10.40 | 7: iteration 44230/ 173500 | consumed samples: 11322880 | consumed tokens: 23189258240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.742E-04 | global batch size: 256 | lm loss: 4.573306E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.105 | TFLOPs: 12.03 | 7: iteration 44240/ 173500 | consumed samples: 11325440 | consumed tokens: 23194501120 | elapsed time per iteration (s): 0.09 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.578123E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.286 | TFLOPs: 10.21 | 7: iteration 44250/ 173500 | consumed samples: 11328000 | consumed tokens: 23199744000 | elapsed time per iteration (s): 0.09 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.576652E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.779 | TFLOPs: 10.88 | 7: iteration 44260/ 173500 | consumed samples: 11330560 | consumed tokens: 23204986880 | elapsed time per iteration (s): 0.08 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.575327E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.721 | TFLOPs: 11.50 | 7: iteration 44270/ 173500 | consumed samples: 11333120 | consumed tokens: 23210229760 | elapsed time per iteration (s): 0.08 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.549796E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.111 | TFLOPs: 12.03 | 7: iteration 44280/ 173500 | consumed samples: 11335680 | consumed tokens: 23215472640 | elapsed time per iteration (s): 0.08 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.561896E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.856 | TFLOPs: 12.01 | 7: iteration 44290/ 173500 | 
consumed samples: 11338240 | consumed tokens: 23220715520 | elapsed time per iteration (s): 0.10 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.580132E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2534.074 | TFLOPs: 9.43 | 7: iteration 44300/ 173500 | consumed samples: 11340800 | consumed tokens: 23225958400 | elapsed time per iteration (s): 0.08 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.568256E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.338 | TFLOPs: 11.94 | 7: iteration 44310/ 173500 | consumed samples: 11343360 | consumed tokens: 23231201280 | elapsed time per iteration (s): 0.08 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.577150E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3021.766 | TFLOPs: 11.24 | 7: iteration 44320/ 173500 | consumed samples: 11345920 | consumed tokens: 23236444160 | elapsed time per iteration (s): 0.08 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 4.568517E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.906 | TFLOPs: 11.23 | 7: iteration 44330/ 173500 | consumed samples: 11348480 | consumed tokens: 23241687040 | elapsed time per iteration (s): 0.10 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 4.573241E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2520.205 | TFLOPs: 9.37 | 7: iteration 44340/ 173500 | consumed samples: 11351040 | consumed tokens: 23246929920 | elapsed time per iteration (s): 0.10 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 4.574732E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2475.594 | TFLOPs: 9.21 | 7: iteration 44350/ 173500 | consumed samples: 11353600 | consumed tokens: 23252172800 | elapsed time per iteration (s): 0.14 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 4.574456E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1881.292 | TFLOPs: 7.00 | 7: iteration 44360/ 173500 | consumed samples: 11356160 | consumed tokens: 23257415680 | elapsed time per iteration (s): 0.11 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 4.578030E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2315.927 | TFLOPs: 8.61 | 7: iteration 44370/ 173500 | consumed samples: 11358720 | consumed tokens: 23262658560 | elapsed time per iteration (s): 0.10 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 4.576967E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2581.016 | TFLOPs: 9.60 | 7: iteration 44380/ 173500 | consumed samples: 11361280 | consumed tokens: 23267901440 | elapsed time per iteration (s): 0.09 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 4.571406E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2708.120 | TFLOPs: 10.07 | 7: iteration 44390/ 173500 | consumed samples: 11363840 | consumed tokens: 23273144320 | elapsed time per iteration (s): 0.11 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 4.584108E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2435.970 | TFLOPs: 9.06 | 7: iteration 44400/ 173500 | consumed samples: 11366400 | consumed tokens: 23278387200 | elapsed time per iteration (s): 0.09 | learning rate: 1.740E-04 | global batch 
size: 256 | lm loss: 4.570598E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.224 | TFLOPs: 10.07 | 7: iteration 44410/ 173500 | consumed samples: 11368960 | consumed tokens: 23283630080 | elapsed time per iteration (s): 0.08 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.567468E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.543 | TFLOPs: 11.35 | 7: iteration 44420/ 173500 | consumed samples: 11371520 | consumed tokens: 23288872960 | elapsed time per iteration (s): 0.09 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.574964E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2751.400 | TFLOPs: 10.23 | 7: iteration 44430/ 173500 | consumed samples: 11374080 | consumed tokens: 23294115840 | elapsed time per iteration (s): 0.09 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.560423E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2745.502 | TFLOPs: 10.21 | 7: iteration 44440/ 173500 | consumed samples: 11376640 | consumed tokens: 23299358720 | elapsed time per iteration (s): 0.10 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.572449E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2603.558 | TFLOPs: 9.68 | 7: iteration 44450/ 173500 | consumed samples: 11379200 | consumed tokens: 23304601600 | elapsed time per iteration (s): 0.09 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.563581E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2780.547 | TFLOPs: 10.34 | 7: iteration 44460/ 173500 | consumed samples: 11381760 | 
consumed tokens: 23309844480 | elapsed time per iteration (s): 0.09 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.582584E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2940.409 | TFLOPs: 10.94 | 7: iteration 44470/ 173500 | consumed samples: 11384320 | consumed tokens: 23315087360 | elapsed time per iteration (s): 0.09 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.571960E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.173 | TFLOPs: 10.20 | 7: iteration 44480/ 173500 | consumed samples: 11386880 | consumed tokens: 23320330240 | elapsed time per iteration (s): 0.10 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.573207E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2667.259 | TFLOPs: 9.92 | 7: iteration 44490/ 173500 | consumed samples: 11389440 | consumed tokens: 23325573120 | elapsed time per iteration (s): 0.11 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 4.568162E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.532 | TFLOPs: 8.56 | 7: iteration 44500/ 173500 | consumed samples: 11392000 | consumed tokens: 23330816000 | elapsed time per iteration (s): 0.11 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.565427E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2339.975 | TFLOPs: 8.70 | 7: iteration 44510/ 173500 | consumed samples: 11394560 | consumed tokens: 23336058880 | elapsed time per iteration (s): 0.13 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.558449E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2013.363 | TFLOPs: 7.49 | 7: iteration 44520/ 173500 | consumed samples: 11397120 | consumed tokens: 23341301760 | elapsed time per iteration (s): 0.13 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.574483E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1974.516 | TFLOPs: 7.34 | 7: iteration 44530/ 173500 | consumed samples: 11399680 | consumed tokens: 23346544640 | elapsed time per iteration (s): 0.11 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.561010E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2381.929 | TFLOPs: 8.86 | 7: iteration 44540/ 173500 | consumed samples: 11402240 | consumed tokens: 23351787520 | elapsed time per iteration (s): 0.11 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.581649E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.077 | TFLOPs: 8.44 | 7: iteration 44550/ 173500 | consumed samples: 11404800 | consumed tokens: 23357030400 | elapsed time per iteration (s): 0.09 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.567552E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2989.698 | TFLOPs: 11.12 | 7: iteration 44560/ 173500 | consumed samples: 11407360 | consumed tokens: 23362273280 | elapsed time per iteration (s): 0.10 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.560819E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.714 | TFLOPs: 9.41 | 7: iteration 44570/ 173500 | consumed samples: 11409920 | consumed tokens: 23367516160 | elapsed time per iteration (s): 0.13 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.560904E+00 
| grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2037.150 | TFLOPs: 7.58 | 7: iteration 44580/ 173500 | consumed samples: 11412480 | consumed tokens: 23372759040 | elapsed time per iteration (s): 0.09 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 4.554923E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.814 | TFLOPs: 10.69 | 7: iteration 44590/ 173500 | consumed samples: 11415040 | consumed tokens: 23378001920 | elapsed time per iteration (s): 0.10 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 4.576153E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2604.116 | TFLOPs: 9.69 | 7: iteration 44600/ 173500 | consumed samples: 11417600 | consumed tokens: 23383244800 | elapsed time per iteration (s): 0.09 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 4.585487E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2815.586 | TFLOPs: 10.47 | 7: iteration 44610/ 173500 | consumed samples: 11420160 | consumed tokens: 23388487680 | elapsed time per iteration (s): 0.09 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 4.581668E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2873.861 | TFLOPs: 10.69 | 7: iteration 44620/ 173500 | consumed samples: 11422720 | consumed tokens: 23393730560 | elapsed time per iteration (s): 0.11 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 4.563997E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2275.774 | TFLOPs: 8.46 | 7: iteration 44630/ 173500 | consumed samples: 11425280 | consumed tokens: 23398973440 | elapsed 
time per iteration (s): 0.09 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 4.567782E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.850 | TFLOPs: 10.51 | 7: iteration 44640/ 173500 | consumed samples: 11427840 | consumed tokens: 23404216320 | elapsed time per iteration (s): 0.08 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 4.577230E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.269 | TFLOPs: 11.35 | 7: iteration 44650/ 173500 | consumed samples: 11430400 | consumed tokens: 23409459200 | elapsed time per iteration (s): 0.09 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 4.563726E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.921 | TFLOPs: 10.42 | 7: iteration 44660/ 173500 | consumed samples: 11432960 | consumed tokens: 23414702080 | elapsed time per iteration (s): 0.09 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 4.568056E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2910.405 | TFLOPs: 10.83 | 7: iteration 44670/ 173500 | consumed samples: 11435520 | consumed tokens: 23419944960 | elapsed time per iteration (s): 0.08 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.570674E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.142 | TFLOPs: 11.37 | 7: iteration 44680/ 173500 | consumed samples: 11438080 | consumed tokens: 23425187840 | elapsed time per iteration (s): 0.10 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.589151E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2645.469 | 
TFLOPs: 9.84 | 7: iteration 44690/ 173500 | consumed samples: 11440640 | consumed tokens: 23430430720 | elapsed time per iteration (s): 0.08 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.556984E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.877 | TFLOPs: 11.78 | 7: iteration 44700/ 173500 | consumed samples: 11443200 | consumed tokens: 23435673600 | elapsed time per iteration (s): 0.10 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.567378E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2668.366 | TFLOPs: 9.93 | 7: iteration 44710/ 173500 | consumed samples: 11445760 | consumed tokens: 23440916480 | elapsed time per iteration (s): 0.08 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.560926E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.296 | TFLOPs: 11.21 | 7: iteration 44720/ 173500 | consumed samples: 11448320 | consumed tokens: 23446159360 | elapsed time per iteration (s): 0.08 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.582381E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.518 | TFLOPs: 11.94 | 7: iteration 44730/ 173500 | consumed samples: 11450880 | consumed tokens: 23451402240 | elapsed time per iteration (s): 0.10 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.561810E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2567.975 | TFLOPs: 9.55 | 7: iteration 44740/ 173500 | consumed samples: 11453440 | consumed tokens: 23456645120 | elapsed time per iteration (s): 0.10 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.561580E+00 | grad norm: 0.324 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2661.649 | TFLOPs: 9.90 | 7: iteration 44750/ 173500 | consumed samples: 11456000 | consumed tokens: 23461888000 | elapsed time per iteration (s): 0.10 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.560997E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2631.819 | TFLOPs: 9.79 | 7: iteration 44760/ 173500 | consumed samples: 11458560 | consumed tokens: 23467130880 | elapsed time per iteration (s): 0.09 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 4.571375E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.626 | TFLOPs: 10.16 | 7: iteration 44770/ 173500 | consumed samples: 11461120 | consumed tokens: 23472373760 | elapsed time per iteration (s): 0.10 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 4.558457E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.498 | TFLOPs: 9.56 | 7: iteration 44780/ 173500 | consumed samples: 11463680 | consumed tokens: 23477616640 | elapsed time per iteration (s): 0.08 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 4.576009E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.493 | TFLOPs: 11.94 | 7: iteration 44790/ 173500 | consumed samples: 11466240 | consumed tokens: 23482859520 | elapsed time per iteration (s): 0.08 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 4.573524E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.299 | TFLOPs: 12.00 | 7: iteration 44800/ 173500 | consumed samples: 11468800 | consumed tokens: 23488102400 | elapsed time per iteration (s): 
0.08 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 4.566269E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.056 | TFLOPs: 11.92 | 7: iteration 44810/ 173500 | consumed samples: 11471360 | consumed tokens: 23493345280 | elapsed time per iteration (s): 0.08 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 4.573607E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.081 | TFLOPs: 11.74 | 7: iteration 44820/ 173500 | consumed samples: 11473920 | consumed tokens: 23498588160 | elapsed time per iteration (s): 0.10 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 4.581580E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2606.094 | TFLOPs: 9.69 | 7: iteration 44830/ 173500 | consumed samples: 11476480 | consumed tokens: 23503831040 | elapsed time per iteration (s): 0.08 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 4.570247E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.249 | TFLOPs: 11.94 | 7: iteration 44840/ 173500 | consumed samples: 11479040 | consumed tokens: 23509073920 | elapsed time per iteration (s): 0.08 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.569730E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3069.172 | TFLOPs: 11.42 | 7: iteration 44850/ 173500 | consumed samples: 11481600 | consumed tokens: 23514316800 | elapsed time per iteration (s): 0.09 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.572060E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2952.518 | TFLOPs: 10.98 | 7: iteration 
44860/ 173500 | consumed samples: 11484160 | consumed tokens: 23519559680 | elapsed time per iteration (s): 0.09 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.572260E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2865.362 | TFLOPs: 10.66 | 7: iteration 44870/ 173500 | consumed samples: 11486720 | consumed tokens: 23524802560 | elapsed time per iteration (s): 0.10 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.567105E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2584.447 | TFLOPs: 9.61 | 7: iteration 44880/ 173500 | consumed samples: 11489280 | consumed tokens: 23530045440 | elapsed time per iteration (s): 0.13 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.567738E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1958.479 | TFLOPs: 7.28 | 7: iteration 44890/ 173500 | consumed samples: 11491840 | consumed tokens: 23535288320 | elapsed time per iteration (s): 0.10 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.559033E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2535.490 | TFLOPs: 9.43 | 7: iteration 44900/ 173500 | consumed samples: 11494400 | consumed tokens: 23540531200 | elapsed time per iteration (s): 0.08 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.573305E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.557 | TFLOPs: 11.38 | 7: iteration 44910/ 173500 | consumed samples: 11496960 | consumed tokens: 23545774080 | elapsed time per iteration (s): 0.10 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.567564E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2459.521 | TFLOPs: 9.15 | 7: iteration 44920/ 173500 | consumed samples: 11499520 | consumed tokens: 23551016960 | elapsed time per iteration (s): 0.12 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 4.571585E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.163 | TFLOPs: 7.63 | 7: iteration 44930/ 173500 | consumed samples: 11502080 | consumed tokens: 23556259840 | elapsed time per iteration (s): 0.11 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 4.566491E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2246.641 | TFLOPs: 8.36 | 7: iteration 44940/ 173500 | consumed samples: 11504640 | consumed tokens: 23561502720 | elapsed time per iteration (s): 0.11 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 4.569106E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2287.968 | TFLOPs: 8.51 | 7: iteration 44950/ 173500 | consumed samples: 11507200 | consumed tokens: 23566745600 | elapsed time per iteration (s): 0.10 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 4.571745E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2478.771 | TFLOPs: 9.22 | 7: iteration 44960/ 173500 | consumed samples: 11509760 | consumed tokens: 23571988480 | elapsed time per iteration (s): 0.09 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 4.576530E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2816.102 | TFLOPs: 10.47 | 7: iteration 44970/ 173500 | consumed samples: 11512320 | consumed tokens: 23577231360 | elapsed time per iteration (s): 0.09 | learning rate: 1.733E-04 | 
global batch size: 256 | lm loss: 4.565862E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2745.907 | TFLOPs: 10.21 | 7: iteration 44980/ 173500 | consumed samples: 11514880 | consumed tokens: 23582474240 | elapsed time per iteration (s): 0.08 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 4.573444E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.460 | TFLOPs: 11.50 | 7: iteration 44990/ 173500 | consumed samples: 11517440 | consumed tokens: 23587717120 | elapsed time per iteration (s): 0.09 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 4.566533E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2842.823 | TFLOPs: 10.57 | 7: iteration 45000/ 173500 | consumed samples: 11520000 | consumed tokens: 23592960000 | elapsed time per iteration (s): 0.08 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 4.559805E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.264 | TFLOPs: 11.84 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 45000 | lm loss value: 4.418228E+00 | lm loss PPL: 8.294914E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 45000 to checkpoints_14m91b100m 0: [2023-03-17 01:21:30,497] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step45000 is begin to save! 0: [2023-03-17 01:21:30,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:21:30,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:21:30,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:21:30,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:21:30,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:21:30,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:21:30,545] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:21:30,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:21:30,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:21:30,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:21:30,551] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:21:30,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:21:30,552] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step45000/mp_rank_00_model_states.pt
0: [2023-03-17 01:21:30,552] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/mp_rank_00_model_states.pt...
0: [2023-03-17 01:21:30,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/mp_rank_00_model_states.pt.
0: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:21:30,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:21:30,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:21:30,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:21:30,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2023-03-17 01:21:30,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:21:30,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2023-03-17 01:21:30,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2023-03-17 01:21:30,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:21:30,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:21:30,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2023-03-17 01:21:30,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:21:30,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:21:30,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2023-03-17 01:21:30,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:21:30,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:21:30,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2023-03-17 01:21:30,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2023-03-17 01:21:30,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:21:30,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:21:30,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2023-03-17 01:21:30,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:21:30,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:21:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
6: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2023-03-17 01:21:30,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
6: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:21:30,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
2: [2023-03-17 01:21:30,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:21:30,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:21:30,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:21:30,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:21:30,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:21:30,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:21:30,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:21:30,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:21:30,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step45000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2023-03-17 01:21:30,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: successfully saved checkpoint at iteration 45000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 93.64 7: iteration 45010/ 173500 | consumed samples: 11522560 | consumed tokens: 23598202880 | elapsed time per iteration (s): 0.09 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 4.567996E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2772.566 | TFLOPs: 10.31 | 7: iteration 45020/ 173500 | consumed samples: 11525120 | consumed tokens: 23603445760 | elapsed time per iteration (s): 0.08 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 4.564640E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.657 | TFLOPs: 11.34 | 7: iteration 45030/ 173500 | consumed samples: 11527680 | consumed tokens: 23608688640 | elapsed time per iteration (s): 0.10 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 4.570892E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2517.173 | TFLOPs: 9.36 | 7: iteration 45040/ 173500 | consumed samples: 11530240 | consumed tokens: 23613931520 | elapsed time per iteration (s): 0.11 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 4.567880E+00 | 
grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2261.659 | TFLOPs: 8.41 | 7: iteration 45050/ 173500 | consumed samples: 11532800 | consumed tokens: 23619174400 | elapsed time per iteration (s): 0.08 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 4.564471E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.145 | TFLOPs: 11.84 | 7: iteration 45060/ 173500 | consumed samples: 11535360 | consumed tokens: 23624417280 | elapsed time per iteration (s): 0.09 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 4.575188E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2869.535 | TFLOPs: 10.67 | 7: iteration 45070/ 173500 | consumed samples: 11537920 | consumed tokens: 23629660160 | elapsed time per iteration (s): 0.11 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 4.595796E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2399.421 | TFLOPs: 8.92 | 7: iteration 45080/ 173500 | consumed samples: 11540480 | consumed tokens: 23634903040 | elapsed time per iteration (s): 0.08 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 4.565416E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.171 | TFLOPs: 11.43 | 7: iteration 45090/ 173500 | consumed samples: 11543040 | consumed tokens: 23640145920 | elapsed time per iteration (s): 0.10 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 4.572691E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2651.466 | TFLOPs: 9.86 | 7: iteration 45100/ 173500 | consumed samples: 11545600 | consumed tokens: 23645388800 | elapsed 
time per iteration (s): 0.12 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.566443E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2088.933 | TFLOPs: 7.77 | 7: iteration 45110/ 173500 | consumed samples: 11548160 | consumed tokens: 23650631680 | elapsed time per iteration (s): 0.13 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.566865E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1933.484 | TFLOPs: 7.19 | 7: iteration 45120/ 173500 | consumed samples: 11550720 | consumed tokens: 23655874560 | elapsed time per iteration (s): 0.13 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.573108E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1970.082 | TFLOPs: 7.33 | 7: iteration 45130/ 173500 | consumed samples: 11553280 | consumed tokens: 23661117440 | elapsed time per iteration (s): 0.10 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.563400E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2649.317 | TFLOPs: 9.85 | 7: iteration 45140/ 173500 | consumed samples: 11555840 | consumed tokens: 23666360320 | elapsed time per iteration (s): 0.10 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.560037E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.234 | TFLOPs: 10.00 | 7: iteration 45150/ 173500 | consumed samples: 11558400 | consumed tokens: 23671603200 | elapsed time per iteration (s): 0.11 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.565811E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.901 | TFLOPs: 
8.85 | 7: iteration 45160/ 173500 | consumed samples: 11560960 | consumed tokens: 23676846080 | elapsed time per iteration (s): 0.12 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.569737E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2192.315 | TFLOPs: 8.15 | 7: iteration 45170/ 173500 | consumed samples: 11563520 | consumed tokens: 23682088960 | elapsed time per iteration (s): 0.08 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.570660E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.429 | TFLOPs: 11.57 | 7: iteration 45180/ 173500 | consumed samples: 11566080 | consumed tokens: 23687331840 | elapsed time per iteration (s): 0.10 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 4.573888E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2692.616 | TFLOPs: 10.02 | 7: iteration 45190/ 173500 | consumed samples: 11568640 | consumed tokens: 23692574720 | elapsed time per iteration (s): 0.08 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 4.583013E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3025.102 | TFLOPs: 11.25 | 7: iteration 45200/ 173500 | consumed samples: 11571200 | consumed tokens: 23697817600 | elapsed time per iteration (s): 0.08 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 4.569496E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.905 | TFLOPs: 11.87 | 7: iteration 45210/ 173500 | consumed samples: 11573760 | consumed tokens: 23703060480 | elapsed time per iteration (s): 0.09 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 4.567788E+00 | grad norm: 0.316 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2722.248 | TFLOPs: 10.13 | 7: iteration 45220/ 173500 | consumed samples: 11576320 | consumed tokens: 23708303360 | elapsed time per iteration (s): 0.09 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 4.566891E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.842 | TFLOPs: 11.10 | 7: iteration 45230/ 173500 | consumed samples: 11578880 | consumed tokens: 23713546240 | elapsed time per iteration (s): 0.09 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 4.577440E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2953.712 | TFLOPs: 10.99 | 7: iteration 45240/ 173500 | consumed samples: 11581440 | consumed tokens: 23718789120 | elapsed time per iteration (s): 0.09 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 4.561268E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2804.116 | TFLOPs: 10.43 | 7: iteration 45250/ 173500 | consumed samples: 11584000 | consumed tokens: 23724032000 | elapsed time per iteration (s): 0.09 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 4.572363E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.719 | TFLOPs: 10.70 | 7: iteration 45260/ 173500 | consumed samples: 11586560 | consumed tokens: 23729274880 | elapsed time per iteration (s): 0.08 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 4.568476E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.457 | TFLOPs: 11.68 | 7: iteration 45270/ 173500 | consumed samples: 11589120 | consumed tokens: 23734517760 | elapsed time per iteration (s): 0.11 | 
learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.573101E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2417.157 | TFLOPs: 8.99 | 7: iteration 45280/ 173500 | consumed samples: 11591680 | consumed tokens: 23739760640 | elapsed time per iteration (s): 0.08 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.567834E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.083 | TFLOPs: 11.83 | 7: iteration 45290/ 173500 | consumed samples: 11594240 | consumed tokens: 23745003520 | elapsed time per iteration (s): 0.08 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.552014E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.454 | TFLOPs: 11.91 | 7: iteration 45300/ 173500 | consumed samples: 11596800 | consumed tokens: 23750246400 | elapsed time per iteration (s): 0.11 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.551789E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.970 | TFLOPs: 8.86 | 7: iteration 45310/ 173500 | consumed samples: 11599360 | consumed tokens: 23755489280 | elapsed time per iteration (s): 0.10 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.561690E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.662 | TFLOPs: 9.19 | 7: iteration 45320/ 173500 | consumed samples: 11601920 | consumed tokens: 23760732160 | elapsed time per iteration (s): 0.11 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.552610E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2319.226 | TFLOPs: 8.63 | 7: iteration 45330/ 
173500 | consumed samples: 11604480 | consumed tokens: 23765975040 | elapsed time per iteration (s): 0.10 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.576551E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.192 | TFLOPs: 9.72 | 7: iteration 45340/ 173500 | consumed samples: 11607040 | consumed tokens: 23771217920 | elapsed time per iteration (s): 0.13 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.573338E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1967.649 | TFLOPs: 7.32 | 7: iteration 45350/ 173500 | consumed samples: 11609600 | consumed tokens: 23776460800 | elapsed time per iteration (s): 0.12 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 4.570396E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2120.196 | TFLOPs: 7.89 | 7: iteration 45360/ 173500 | consumed samples: 11612160 | consumed tokens: 23781703680 | elapsed time per iteration (s): 0.12 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 4.569946E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2072.405 | TFLOPs: 7.71 | 7: iteration 45370/ 173500 | consumed samples: 11614720 | consumed tokens: 23786946560 | elapsed time per iteration (s): 0.08 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 4.572984E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.974 | TFLOPs: 11.88 | 7: iteration 45380/ 173500 | consumed samples: 11617280 | consumed tokens: 23792189440 | elapsed time per iteration (s): 0.12 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 4.571227E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2098.414 | TFLOPs: 7.81 | 7: iteration 45390/ 173500 | consumed samples: 11619840 | consumed tokens: 23797432320 | elapsed time per iteration (s): 0.11 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 4.577274E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2435.233 | TFLOPs: 9.06 | 7: iteration 45400/ 173500 | consumed samples: 11622400 | consumed tokens: 23802675200 | elapsed time per iteration (s): 0.13 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 4.565419E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2017.698 | TFLOPs: 7.50 | 7: iteration 45410/ 173500 | consumed samples: 11624960 | consumed tokens: 23807918080 | elapsed time per iteration (s): 0.09 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 4.565422E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2756.753 | TFLOPs: 10.25 | 7: iteration 45420/ 173500 | consumed samples: 11627520 | consumed tokens: 23813160960 | elapsed time per iteration (s): 0.10 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 4.570101E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2632.274 | TFLOPs: 9.79 | 7: iteration 45430/ 173500 | consumed samples: 11630080 | consumed tokens: 23818403840 | elapsed time per iteration (s): 0.09 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 4.568056E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2920.171 | TFLOPs: 10.86 | 7: iteration 45440/ 173500 | consumed samples: 11632640 | consumed tokens: 23823646720 | elapsed time per iteration (s): 0.10 | learning rate: 1.727E-04 | 
global batch size: 256 | lm loss: 4.562459E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2610.267 | TFLOPs: 9.71 | 7: iteration 45450/ 173500 | consumed samples: 11635200 | consumed tokens: 23828889600 | elapsed time per iteration (s): 0.08 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 4.572027E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.319 | TFLOPs: 11.32 | 7: iteration 45460/ 173500 | consumed samples: 11637760 | consumed tokens: 23834132480 | elapsed time per iteration (s): 0.09 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 4.567934E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2877.712 | TFLOPs: 10.70 | 7: iteration 45470/ 173500 | consumed samples: 11640320 | consumed tokens: 23839375360 | elapsed time per iteration (s): 0.08 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 4.572621E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.309 | TFLOPs: 11.68 | 7: iteration 45480/ 173500 | consumed samples: 11642880 | consumed tokens: 23844618240 | elapsed time per iteration (s): 0.11 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 4.566570E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2435.344 | TFLOPs: 9.06 | 7: iteration 45490/ 173500 | consumed samples: 11645440 | consumed tokens: 23849861120 | elapsed time per iteration (s): 0.09 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 4.566613E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2756.639 | TFLOPs: 10.25 | 7: iteration 45500/ 173500 | consumed samples: 
11648000 | consumed tokens: 23855104000 | elapsed time per iteration (s): 0.10 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 4.569476E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2656.796 | TFLOPs: 9.88 | 7: iteration 45510/ 173500 | consumed samples: 11650560 | consumed tokens: 23860346880 | elapsed time per iteration (s): 0.08 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 4.567542E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.848 | TFLOPs: 11.60 | 7: iteration 45520/ 173500 | consumed samples: 11653120 | consumed tokens: 23865589760 | elapsed time per iteration (s): 0.08 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 4.568446E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.734 | TFLOPs: 11.67 | 7: iteration 45530/ 173500 | consumed samples: 11655680 | consumed tokens: 23870832640 | elapsed time per iteration (s): 0.10 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 4.554735E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.367 | TFLOPs: 10.00 | 7: iteration 45540/ 173500 | consumed samples: 11658240 | consumed tokens: 23876075520 | elapsed time per iteration (s): 0.09 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 4.564320E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2773.116 | TFLOPs: 10.31 | 7: iteration 45550/ 173500 | consumed samples: 11660800 | consumed tokens: 23881318400 | elapsed time per iteration (s): 0.09 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 4.576632E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2973.625 | TFLOPs: 11.06 | 7: iteration 45560/ 173500 | consumed samples: 11663360 | consumed tokens: 23886561280 | elapsed time per iteration (s): 0.09 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 4.564401E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2920.887 | TFLOPs: 10.86 | 7: iteration 45570/ 173500 | consumed samples: 11665920 | consumed tokens: 23891804160 | elapsed time per iteration (s): 0.10 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 4.555420E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2533.948 | TFLOPs: 9.43 | 7: iteration 45580/ 173500 | consumed samples: 11668480 | consumed tokens: 23897047040 | elapsed time per iteration (s): 0.12 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 4.569151E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2089.154 | TFLOPs: 7.77 | 7: iteration 45590/ 173500 | consumed samples: 11671040 | consumed tokens: 23902289920 | elapsed time per iteration (s): 0.09 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 4.563897E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.790 | TFLOPs: 10.32 | 7: iteration 45600/ 173500 | consumed samples: 11673600 | consumed tokens: 23907532800 | elapsed time per iteration (s): 0.10 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 4.566605E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2617.363 | TFLOPs: 9.74 | 7: iteration 45610/ 173500 | consumed samples: 11676160 | consumed tokens: 23912775680 | elapsed time per iteration (s): 0.09 | learning rate: 1.725E-04 | global batch size: 256 | lm 
loss: 4.562392E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2903.264 | TFLOPs: 10.80 | 7: iteration 45620/ 173500 | consumed samples: 11678720 | consumed tokens: 23918018560 | elapsed time per iteration (s): 0.13 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 4.563263E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1959.780 | TFLOPs: 7.29 | 7: iteration 45630/ 173500 | consumed samples: 11681280 | consumed tokens: 23923261440 | elapsed time per iteration (s): 0.11 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 4.567286E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2431.400 | TFLOPs: 9.04 | 7: iteration 45640/ 173500 | consumed samples: 11683840 | consumed tokens: 23928504320 | elapsed time per iteration (s): 0.09 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 4.555779E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.077 | TFLOPs: 11.08 | 7: iteration 45650/ 173500 | consumed samples: 11686400 | consumed tokens: 23933747200 | elapsed time per iteration (s): 0.08 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 4.577379E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.971 | TFLOPs: 11.65 | 7: iteration 45660/ 173500 | consumed samples: 11688960 | consumed tokens: 23938990080 | elapsed time per iteration (s): 0.09 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 4.561396E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2854.274 | TFLOPs: 10.62 | 7: iteration 45670/ 173500 | consumed samples: 11691520 | consumed tokens: 
23944232960 | elapsed time per iteration (s): 0.08 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 4.568109E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.417 | TFLOPs: 11.38 | 7: iteration 45680/ 173500 | consumed samples: 11694080 | consumed tokens: 23949475840 | elapsed time per iteration (s): 0.08 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 4.545468E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.219 | TFLOPs: 11.66 | 7: iteration 45690/ 173500 | consumed samples: 11696640 | consumed tokens: 23954718720 | elapsed time per iteration (s): 0.09 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.573762E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.234 | TFLOPs: 10.69 | 7: iteration 45700/ 173500 | consumed samples: 11699200 | consumed tokens: 23959961600 | elapsed time per iteration (s): 0.08 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.559662E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3022.100 | TFLOPs: 11.24 | 7: iteration 45710/ 173500 | consumed samples: 11701760 | consumed tokens: 23965204480 | elapsed time per iteration (s): 0.08 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.562044E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.390 | TFLOPs: 11.89 | 7: iteration 45720/ 173500 | consumed samples: 11704320 | consumed tokens: 23970447360 | elapsed time per iteration (s): 0.09 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.579096E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2808.085 | TFLOPs: 10.44 | 7: iteration 45730/ 173500 | consumed samples: 11706880 | consumed tokens: 23975690240 | elapsed time per iteration (s): 0.10 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.574466E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2623.496 | TFLOPs: 9.76 | 7: iteration 45740/ 173500 | consumed samples: 11709440 | consumed tokens: 23980933120 | elapsed time per iteration (s): 0.09 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.563534E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.745 | TFLOPs: 10.92 | 7: iteration 45750/ 173500 | consumed samples: 11712000 | consumed tokens: 23986176000 | elapsed time per iteration (s): 0.08 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.567072E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.805 | TFLOPs: 11.64 | 7: iteration 45760/ 173500 | consumed samples: 11714560 | consumed tokens: 23991418880 | elapsed time per iteration (s): 0.08 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.544988E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.539 | TFLOPs: 11.89 | 7: iteration 45770/ 173500 | consumed samples: 11717120 | consumed tokens: 23996661760 | elapsed time per iteration (s): 0.10 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 4.573123E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2611.729 | TFLOPs: 9.71 | 7: iteration 45780/ 173500 | consumed samples: 11719680 | consumed tokens: 24001904640 | elapsed time per iteration (s): 0.09 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 4.562630E+00 | grad 
norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2701.325 | TFLOPs: 10.05 | 7: iteration 45790/ 173500 | consumed samples: 11722240 | consumed tokens: 24007147520 | elapsed time per iteration (s): 0.08 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 4.557152E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.729 | TFLOPs: 11.95 | 7: iteration 45800/ 173500 | consumed samples: 11724800 | consumed tokens: 24012390400 | elapsed time per iteration (s): 0.09 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 4.578540E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.002 | TFLOPs: 10.20 | 7: iteration 45810/ 173500 | consumed samples: 11727360 | consumed tokens: 24017633280 | elapsed time per iteration (s): 0.09 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 4.569003E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2786.282 | TFLOPs: 10.36 | 7: iteration 45820/ 173500 | consumed samples: 11729920 | consumed tokens: 24022876160 | elapsed time per iteration (s): 0.10 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 4.563134E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2619.790 | TFLOPs: 9.74 | 7: iteration 45830/ 173500 | consumed samples: 11732480 | consumed tokens: 24028119040 | elapsed time per iteration (s): 0.09 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 4.573137E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.033 | TFLOPs: 10.23 | 7: iteration 45840/ 173500 | consumed samples: 11735040 | consumed tokens: 24033361920 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 4.555824E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.657 | TFLOPs: 11.22 | 7: iteration 45850/ 173500 | consumed samples: 11737600 | consumed tokens: 24038604800 | elapsed time per iteration (s): 0.09 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 4.569736E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.935 | TFLOPs: 10.61 | 7: iteration 45860/ 173500 | consumed samples: 11740160 | consumed tokens: 24043847680 | elapsed time per iteration (s): 0.08 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.576129E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.094 | TFLOPs: 11.97 | 7: iteration 45870/ 173500 | consumed samples: 11742720 | consumed tokens: 24049090560 | elapsed time per iteration (s): 0.08 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.563414E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.254 | TFLOPs: 11.61 | 7: iteration 45880/ 173500 | consumed samples: 11745280 | consumed tokens: 24054333440 | elapsed time per iteration (s): 0.10 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.574970E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2585.303 | TFLOPs: 9.62 | 7: iteration 45890/ 173500 | consumed samples: 11747840 | consumed tokens: 24059576320 | elapsed time per iteration (s): 0.08 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.551603E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.280 | TFLOPs: 
11.68 | 7: iteration 45900/ 173500 | consumed samples: 11750400 | consumed tokens: 24064819200 | elapsed time per iteration (s): 0.10 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.552797E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2557.753 | TFLOPs: 9.51 | 7: iteration 45910/ 173500 | consumed samples: 11752960 | consumed tokens: 24070062080 | elapsed time per iteration (s): 0.09 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.568402E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2740.423 | TFLOPs: 10.19 | 7: iteration 45920/ 173500 | consumed samples: 11755520 | consumed tokens: 24075304960 | elapsed time per iteration (s): 0.10 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.571407E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.267 | TFLOPs: 9.60 | 7: iteration 45930/ 173500 | consumed samples: 11758080 | consumed tokens: 24080547840 | elapsed time per iteration (s): 0.11 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.573112E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2366.703 | TFLOPs: 8.80 | 7: iteration 45940/ 173500 | consumed samples: 11760640 | consumed tokens: 24085790720 | elapsed time per iteration (s): 0.10 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 4.562348E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2523.013 | TFLOPs: 9.38 | 7: iteration 45950/ 173500 | consumed samples: 11763200 | consumed tokens: 24091033600 | elapsed time per iteration (s): 0.09 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 4.558347E+00 | grad norm: 0.308 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.383 | TFLOPs: 10.74 | 7: iteration 45960/ 173500 | consumed samples: 11765760 | consumed tokens: 24096276480 | elapsed time per iteration (s): 0.09 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 4.568345E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2968.991 | TFLOPs: 11.04 | 7: iteration 45970/ 173500 | consumed samples: 11768320 | consumed tokens: 24101519360 | elapsed time per iteration (s): 0.09 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 4.556240E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2809.071 | TFLOPs: 10.45 | 7: iteration 45980/ 173500 | consumed samples: 11770880 | consumed tokens: 24106762240 | elapsed time per iteration (s): 0.08 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 4.579230E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.374 | TFLOPs: 11.77 | 7: iteration 45990/ 173500 | consumed samples: 11773440 | consumed tokens: 24112005120 | elapsed time per iteration (s): 0.10 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 4.560126E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2599.341 | TFLOPs: 9.67 | 0: [2023-03-17 01:23:05,585] [INFO] [logging.py:68:log_dist] [Rank 0] step=46000, skipped=0, lr=[0.00017208047558447097, 0.00017208047558447097, 0.00017208047558447097], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 46000/ 173500 | consumed samples: 11776000 | consumed tokens: 24117248000 | elapsed time per iteration (s): 0.08 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 4.580461E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3018.982 | TFLOPs: 11.23 | 0: steps: 46000 loss: 4.5820 iter time (s): 0.093 samples/sec: 2765.000 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 46000 | lm loss value: 4.441301E+00 | lm loss PPL: 8.488529E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 46000 to checkpoints_14m91b100m 0: [2023-03-17 01:23:05,644] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step46000 is begin to save! 0: [2023-03-17 01:23:05,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:23:05,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:23:05,673] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:23:05,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:23:05,676] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:23:05,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:23:05,679] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:23:05,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 01:23:05,682] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:23:05,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:23:05,685] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:23:05,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:23:05,686] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step46000/mp_rank_00_model_states.pt 0: [2023-03-17 01:23:05,686] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:23:05,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:23:05,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:23:05,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:23:05,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:23:05,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2023-03-17 01:23:05,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
7: [2023-03-17 01:23:05,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:23:05,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:23:05,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:23:05,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:23:05,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
1: [2023-03-17 01:23:05,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2023-03-17 01:23:05,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:23:05,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:23:05,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 4: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
1: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:23:05,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:23:05,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
3: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:23:05,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:23:05,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
1: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:23:05,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:23:05,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:23:05,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
3: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:23:05,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:23:05,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:23:05,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2023-03-17 01:23:05,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:23:05,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
1: [2023-03-17 01:23:05,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:23:05,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 01:23:05,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 01:23:05,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2023-03-17 01:23:05,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2023-03-17 01:23:05,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:23:05,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2023-03-17 01:23:05,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:23:05,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:23:05,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
4: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:23:05,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:23:05,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:23:05,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:23:05,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
2: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:23:05,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:23:05,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:23:05,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:23:05,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
6: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:23:05,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2023-03-17 01:23:05,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:23:05,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:23:05,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
3: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
7: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
2: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:23:05,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:23:05,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 01:23:05,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 01:23:05,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:23:05,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:23:05,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2023-03-17 01:23:05,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2023-03-17 01:23:05,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2023-03-17 01:23:05,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:23:05,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step46000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:23:05,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
0: successfully saved checkpoint at iteration 46000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 83.88
7: iteration 46010/ 173500 | consumed samples: 11778560 | consumed tokens: 24122490880 | elapsed time per iteration (s): 0.10 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 4.550806E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2551.462 | TFLOPs: 9.49 |
7: iteration 46020/ 173500 | consumed samples: 11781120 | consumed tokens: 24127733760 | elapsed time per iteration (s): 0.08 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 4.571932E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.952 | TFLOPs: 11.98 |
7: iteration 46030/ 173500 | consumed samples: 11783680 | consumed tokens: 24132976640 | elapsed time per iteration (s): 0.08 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 4.578276E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.412 | TFLOPs: 12.02 |
7: iteration 46040/ 173500 | consumed samples: 11786240 | consumed tokens: 24138219520 | elapsed time per iteration (s): 0.09 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 4.568722E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.327 | TFLOPs: 10.74 |
7: iteration 46050/ 173500 | consumed samples: 11788800 | consumed tokens: 24143462400 | elapsed time per iteration (s): 0.08 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 4.546774E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.509 | TFLOPs: 11.50 |
7: iteration 46060/ 173500 | consumed samples: 11791360 | consumed tokens: 24148705280 | elapsed time per iteration (s): 0.08 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 4.566503E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.458 | TFLOPs: 11.83 |
7: iteration 46070/ 173500 | consumed samples: 11793920 | consumed tokens: 24153948160 | elapsed time per iteration (s): 0.08 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 4.568867E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.530 | TFLOPs: 11.85 |
7: iteration 46080/ 173500 | consumed samples: 11796480 | consumed tokens: 24159191040 | elapsed time per iteration (s): 0.08 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 4.575542E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.097 | TFLOPs: 11.90 |
7: iteration 46090/ 173500 | consumed samples: 11799040 | consumed tokens: 24164433920 | elapsed time per iteration (s): 0.10 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 4.575704E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2669.280 | TFLOPs: 9.93 |
7: iteration 46100/ 173500 | consumed samples: 11801600 | consumed tokens: 24169676800 | elapsed time per iteration (s): 0.09 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 4.564857E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2909.117 | TFLOPs: 10.82 |
7: iteration 46110/ 173500 | consumed samples: 11804160 | consumed tokens: 24174919680 | elapsed time per iteration (s): 0.12 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.569306E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2188.815 | TFLOPs: 8.14 |
7: iteration 46120/ 173500 | consumed samples: 11806720 | consumed tokens: 24180162560 | elapsed time per iteration (s): 0.09 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.575729E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.748 | TFLOPs: 10.33 |
7: iteration 46130/ 173500 | consumed samples: 11809280 | consumed tokens: 24185405440 | elapsed time per iteration (s): 0.09 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.566483E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2713.500 | TFLOPs: 10.09 |
7: iteration 46140/ 173500 | consumed samples: 11811840 | consumed tokens: 24190648320 | elapsed time per iteration (s): 0.08 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.579521E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.734 | TFLOPs: 12.01 |
7: iteration 46150/ 173500 | consumed samples: 11814400 | consumed tokens: 24195891200 | elapsed time per iteration (s): 0.08 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.571087E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.427 | TFLOPs: 11.96 |
7: iteration 46160/ 173500 | consumed samples: 11816960 | consumed tokens: 24201134080 | elapsed time per iteration (s): 0.09 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.570165E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2864.038 | TFLOPs: 10.65 |
7: iteration 46170/ 173500 | consumed samples: 11819520 | consumed tokens: 24206376960 | elapsed time per iteration (s): 0.08 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.563911E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.776 | TFLOPs: 11.89 |
7: iteration 46180/ 173500 | consumed samples: 11822080 | consumed tokens: 24211619840 | elapsed time per iteration (s): 0.09 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.560357E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2698.687 | TFLOPs: 10.04 |
7: iteration 46190/ 173500 | consumed samples: 11824640 | consumed tokens: 24216862720 | elapsed time per iteration (s): 0.13 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.560972E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2012.521 | TFLOPs: 7.49 |
7: iteration 46200/ 173500 | consumed samples: 11827200 | consumed tokens: 24222105600 | elapsed time per iteration (s): 0.10 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 4.560631E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.971 | TFLOPs: 9.41 |
7: iteration 46210/ 173500 | consumed samples: 11829760 | consumed tokens: 24227348480 | elapsed time per iteration (s): 0.12 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 4.567555E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2182.263 | TFLOPs: 8.12 |
7: iteration 46220/ 173500 | consumed samples: 11832320 | consumed tokens: 24232591360 | elapsed time per iteration (s): 0.10 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 4.569044E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2447.479 | TFLOPs: 9.10 |
7: iteration 46230/ 173500 | consumed samples: 11834880 | consumed tokens: 24237834240 | elapsed time per iteration (s): 0.08 | learning rate: 1.718E-04 | 
global batch size: 256 | lm loss: 4.564505E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.473 | TFLOPs: 11.56 | 7: iteration 46240/ 173500 | consumed samples: 11837440 | consumed tokens: 24243077120 | elapsed time per iteration (s): 0.10 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 4.559448E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2592.572 | TFLOPs: 9.64 | 7: iteration 46250/ 173500 | consumed samples: 11840000 | consumed tokens: 24248320000 | elapsed time per iteration (s): 0.11 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 4.564919E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2253.245 | TFLOPs: 8.38 | 7: iteration 46260/ 173500 | consumed samples: 11842560 | consumed tokens: 24253562880 | elapsed time per iteration (s): 0.08 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 4.557891E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.115 | TFLOPs: 11.37 | 7: iteration 46270/ 173500 | consumed samples: 11845120 | consumed tokens: 24258805760 | elapsed time per iteration (s): 0.08 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 4.573288E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.976 | TFLOPs: 12.02 | 7: iteration 46280/ 173500 | consumed samples: 11847680 | consumed tokens: 24264048640 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.558089E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.653 | TFLOPs: 11.96 | 7: iteration 46290/ 173500 | consumed samples: 
11850240 | consumed tokens: 24269291520 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.576763E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.053 | TFLOPs: 11.91 | 7: iteration 46300/ 173500 | consumed samples: 11852800 | consumed tokens: 24274534400 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.580928E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.391 | TFLOPs: 11.99 | 7: iteration 46310/ 173500 | consumed samples: 11855360 | consumed tokens: 24279777280 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.575788E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.957 | TFLOPs: 11.98 | 7: iteration 46320/ 173500 | consumed samples: 11857920 | consumed tokens: 24285020160 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.545228E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.506 | TFLOPs: 11.92 | 7: iteration 46330/ 173500 | consumed samples: 11860480 | consumed tokens: 24290263040 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.567073E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.585 | TFLOPs: 11.95 | 7: iteration 46340/ 173500 | consumed samples: 11863040 | consumed tokens: 24295505920 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.578643E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3207.003 | TFLOPs: 11.93 | 7: iteration 46350/ 173500 | consumed samples: 11865600 | consumed tokens: 24300748800 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.565888E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.520 | TFLOPs: 11.90 | 7: iteration 46360/ 173500 | consumed samples: 11868160 | consumed tokens: 24305991680 | elapsed time per iteration (s): 0.08 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 4.562355E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.913 | TFLOPs: 11.98 | 7: iteration 46370/ 173500 | consumed samples: 11870720 | consumed tokens: 24311234560 | elapsed time per iteration (s): 0.09 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 4.564175E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.252 | TFLOPs: 11.06 | 7: iteration 46380/ 173500 | consumed samples: 11873280 | consumed tokens: 24316477440 | elapsed time per iteration (s): 0.09 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 4.571331E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.757 | TFLOPs: 10.04 | 7: iteration 46390/ 173500 | consumed samples: 11875840 | consumed tokens: 24321720320 | elapsed time per iteration (s): 0.08 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 4.574930E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.484 | TFLOPs: 11.23 | 7: iteration 46400/ 173500 | consumed samples: 11878400 | consumed tokens: 24326963200 | elapsed time per iteration (s): 0.08 | learning rate: 1.716E-04 | global batch size: 256 | 
lm loss: 4.571326E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.360 | TFLOPs: 11.92 | 7: iteration 46410/ 173500 | consumed samples: 11880960 | consumed tokens: 24332206080 | elapsed time per iteration (s): 0.08 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 4.567282E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.734 | TFLOPs: 11.95 | 7: iteration 46420/ 173500 | consumed samples: 11883520 | consumed tokens: 24337448960 | elapsed time per iteration (s): 0.08 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 4.546940E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.163 | TFLOPs: 11.47 | 7: iteration 46430/ 173500 | consumed samples: 11886080 | consumed tokens: 24342691840 | elapsed time per iteration (s): 0.08 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 4.568872E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.819 | TFLOPs: 11.69 | 7: iteration 46440/ 173500 | consumed samples: 11888640 | consumed tokens: 24347934720 | elapsed time per iteration (s): 0.08 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 4.558158E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.601 | TFLOPs: 11.93 | 7: iteration 46450/ 173500 | consumed samples: 11891200 | consumed tokens: 24353177600 | elapsed time per iteration (s): 0.09 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 4.560512E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.482 | TFLOPs: 10.51 | 7: iteration 46460/ 173500 | consumed samples: 11893760 | consumed 
tokens: 24358420480 | elapsed time per iteration (s): 0.11 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 4.558426E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.759 | TFLOPs: 8.78 | 7: iteration 46470/ 173500 | consumed samples: 11896320 | consumed tokens: 24363663360 | elapsed time per iteration (s): 0.08 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 4.572772E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.754 | TFLOPs: 11.55 | 7: iteration 46480/ 173500 | consumed samples: 11898880 | consumed tokens: 24368906240 | elapsed time per iteration (s): 0.12 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 4.571961E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2119.950 | TFLOPs: 7.89 | 7: iteration 46490/ 173500 | consumed samples: 11901440 | consumed tokens: 24374149120 | elapsed time per iteration (s): 0.14 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 4.558072E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1859.502 | TFLOPs: 6.92 | 7: iteration 46500/ 173500 | consumed samples: 11904000 | consumed tokens: 24379392000 | elapsed time per iteration (s): 0.11 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 4.554849E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2427.388 | TFLOPs: 9.03 | 7: iteration 46510/ 173500 | consumed samples: 11906560 | consumed tokens: 24384634880 | elapsed time per iteration (s): 0.10 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 4.549195E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2570.296 | TFLOPs: 9.56 | 7: iteration 46520/ 173500 | consumed samples: 11909120 | consumed tokens: 24389877760 | elapsed time per iteration (s): 0.11 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 4.570778E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.059 | TFLOPs: 8.98 | 7: iteration 46530/ 173500 | consumed samples: 11911680 | consumed tokens: 24395120640 | elapsed time per iteration (s): 0.10 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 4.566177E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2670.067 | TFLOPs: 9.93 | 7: iteration 46540/ 173500 | consumed samples: 11914240 | consumed tokens: 24400363520 | elapsed time per iteration (s): 0.10 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 4.569970E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2546.253 | TFLOPs: 9.47 | 7: iteration 46550/ 173500 | consumed samples: 11916800 | consumed tokens: 24405606400 | elapsed time per iteration (s): 0.10 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 4.574123E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2560.531 | TFLOPs: 9.52 | 7: iteration 46560/ 173500 | consumed samples: 11919360 | consumed tokens: 24410849280 | elapsed time per iteration (s): 0.10 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 4.567520E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2643.422 | TFLOPs: 9.83 | 7: iteration 46570/ 173500 | consumed samples: 11921920 | consumed tokens: 24416092160 | elapsed time per iteration (s): 0.14 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 4.564613E+00 | grad norm: 
0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1833.819 | TFLOPs: 6.82 | 7: iteration 46580/ 173500 | consumed samples: 11924480 | consumed tokens: 24421335040 | elapsed time per iteration (s): 0.13 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 4.558050E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1968.052 | TFLOPs: 7.32 | 7: iteration 46590/ 173500 | consumed samples: 11927040 | consumed tokens: 24426577920 | elapsed time per iteration (s): 0.12 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 4.560207E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2081.515 | TFLOPs: 7.74 | 7: iteration 46600/ 173500 | consumed samples: 11929600 | consumed tokens: 24431820800 | elapsed time per iteration (s): 0.08 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 4.567653E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.653 | TFLOPs: 11.96 | 7: iteration 46610/ 173500 | consumed samples: 11932160 | consumed tokens: 24437063680 | elapsed time per iteration (s): 0.08 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.569968E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.445 | TFLOPs: 11.76 | 7: iteration 46620/ 173500 | consumed samples: 11934720 | consumed tokens: 24442306560 | elapsed time per iteration (s): 0.09 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.570594E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.704 | TFLOPs: 11.08 | 7: iteration 46630/ 173500 | consumed samples: 11937280 | consumed tokens: 24447549440 | elapsed time per 
iteration (s): 0.09 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.573241E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.719 | TFLOPs: 10.38 | 7: iteration 46640/ 173500 | consumed samples: 11939840 | consumed tokens: 24452792320 | elapsed time per iteration (s): 0.09 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.566595E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2938.244 | TFLOPs: 10.93 | 7: iteration 46650/ 173500 | consumed samples: 11942400 | consumed tokens: 24458035200 | elapsed time per iteration (s): 0.08 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.555190E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.525 | TFLOPs: 11.88 | 7: iteration 46660/ 173500 | consumed samples: 11944960 | consumed tokens: 24463278080 | elapsed time per iteration (s): 0.09 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.553537E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.500 | TFLOPs: 10.96 | 7: iteration 46670/ 173500 | consumed samples: 11947520 | consumed tokens: 24468520960 | elapsed time per iteration (s): 0.10 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.561700E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.000 | TFLOPs: 9.96 | 7: iteration 46680/ 173500 | consumed samples: 11950080 | consumed tokens: 24473763840 | elapsed time per iteration (s): 0.10 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.571402E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2621.807 | TFLOPs: 9.75 | 
7: iteration 46690/ 173500 | consumed samples: 11952640 | consumed tokens: 24479006720 | elapsed time per iteration (s): 0.10 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 4.559149E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2577.324 | TFLOPs: 9.59 | 7: iteration 46700/ 173500 | consumed samples: 11955200 | consumed tokens: 24484249600 | elapsed time per iteration (s): 0.08 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 4.558370E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.738 | TFLOPs: 11.62 | 7: iteration 46710/ 173500 | consumed samples: 11957760 | consumed tokens: 24489492480 | elapsed time per iteration (s): 0.09 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 4.567231E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.333 | TFLOPs: 11.19 | 7: iteration 46720/ 173500 | consumed samples: 11960320 | consumed tokens: 24494735360 | elapsed time per iteration (s): 0.08 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 4.559908E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.163 | TFLOPs: 11.36 | 7: iteration 46730/ 173500 | consumed samples: 11962880 | consumed tokens: 24499978240 | elapsed time per iteration (s): 0.09 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 4.553814E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2887.371 | TFLOPs: 10.74 | 7: iteration 46740/ 173500 | consumed samples: 11965440 | consumed tokens: 24505221120 | elapsed time per iteration (s): 0.09 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 4.577779E+00 | grad norm: 0.326 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3004.437 | TFLOPs: 11.18 | 7: iteration 46750/ 173500 | consumed samples: 11968000 | consumed tokens: 24510464000 | elapsed time per iteration (s): 0.08 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 4.563783E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.724 | TFLOPs: 11.96 | 7: iteration 46760/ 173500 | consumed samples: 11970560 | consumed tokens: 24515706880 | elapsed time per iteration (s): 0.08 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 4.555095E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.418 | TFLOPs: 11.94 | 7: iteration 46770/ 173500 | consumed samples: 11973120 | consumed tokens: 24520949760 | elapsed time per iteration (s): 0.08 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 4.572504E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.270 | TFLOPs: 11.99 | 7: iteration 46780/ 173500 | consumed samples: 11975680 | consumed tokens: 24526192640 | elapsed time per iteration (s): 0.09 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 4.561196E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2806.429 | TFLOPs: 10.44 | 7: iteration 46790/ 173500 | consumed samples: 11978240 | consumed tokens: 24531435520 | elapsed time per iteration (s): 0.08 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 4.558475E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.774 | TFLOPs: 11.33 | 7: iteration 46800/ 173500 | consumed samples: 11980800 | consumed tokens: 24536678400 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.711E-04 | global batch size: 256 | lm loss: 4.573902E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.314 | TFLOPs: 11.89 | 7: iteration 46810/ 173500 | consumed samples: 11983360 | consumed tokens: 24541921280 | elapsed time per iteration (s): 0.08 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 4.568344E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.199 | TFLOPs: 11.88 | 7: iteration 46820/ 173500 | consumed samples: 11985920 | consumed tokens: 24547164160 | elapsed time per iteration (s): 0.09 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 4.568990E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2719.201 | TFLOPs: 10.11 | 7: iteration 46830/ 173500 | consumed samples: 11988480 | consumed tokens: 24552407040 | elapsed time per iteration (s): 0.08 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 4.576235E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.410 | TFLOPs: 11.22 | 7: iteration 46840/ 173500 | consumed samples: 11991040 | consumed tokens: 24557649920 | elapsed time per iteration (s): 0.09 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 4.568995E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.760 | TFLOPs: 10.38 | 7: iteration 46850/ 173500 | consumed samples: 11993600 | consumed tokens: 24562892800 | elapsed time per iteration (s): 0.11 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 4.550739E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2333.225 | TFLOPs: 8.68 | 7: iteration 46860/ 
173500 | consumed samples: 11996160 | consumed tokens: 24568135680 | elapsed time per iteration (s): 0.12 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.555801E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2163.819 | TFLOPs: 8.05 | 7: iteration 46870/ 173500 | consumed samples: 11998720 | consumed tokens: 24573378560 | elapsed time per iteration (s): 0.09 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.565579E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2990.406 | TFLOPs: 11.12 | 7: iteration 46880/ 173500 | consumed samples: 12001280 | consumed tokens: 24578621440 | elapsed time per iteration (s): 0.08 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.572609E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.450 | TFLOPs: 11.98 | 7: iteration 46890/ 173500 | consumed samples: 12003840 | consumed tokens: 24583864320 | elapsed time per iteration (s): 0.08 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.569724E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.331 | TFLOPs: 11.95 | 7: iteration 46900/ 173500 | consumed samples: 12006400 | consumed tokens: 24589107200 | elapsed time per iteration (s): 0.08 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.568080E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.539 | TFLOPs: 12.03 | 7: iteration 46910/ 173500 | consumed samples: 12008960 | consumed tokens: 24594350080 | elapsed time per iteration (s): 0.09 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.569764E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2956.517 | TFLOPs: 11.00 | 7: iteration 46920/ 173500 | consumed samples: 12011520 | consumed tokens: 24599592960 | elapsed time per iteration (s): 0.09 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.551956E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2733.067 | TFLOPs: 10.17 | 7: iteration 46930/ 173500 | consumed samples: 12014080 | consumed tokens: 24604835840 | elapsed time per iteration (s): 0.10 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.563946E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.281 | TFLOPs: 10.00 | 7: iteration 46940/ 173500 | consumed samples: 12016640 | consumed tokens: 24610078720 | elapsed time per iteration (s): 0.08 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 4.568618E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.453 | TFLOPs: 11.66 | 7: iteration 46950/ 173500 | consumed samples: 12019200 | consumed tokens: 24615321600 | elapsed time per iteration (s): 0.08 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 4.584189E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.579 | TFLOPs: 11.66 | 7: iteration 46960/ 173500 | consumed samples: 12021760 | consumed tokens: 24620564480 | elapsed time per iteration (s): 0.09 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 4.543919E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.463 | TFLOPs: 10.39 | 7: iteration 46970/ 173500 | consumed samples: 12024320 | consumed tokens: 24625807360 | elapsed time per iteration (s): 0.09 | learning rate: 
1.709E-04 | global batch size: 256 | lm loss: 4.571826E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.715 | TFLOPs: 10.21 | 7: iteration 46980/ 173500 | consumed samples: 12026880 | consumed tokens: 24631050240 | elapsed time per iteration (s): 0.08 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 4.569145E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.387 | TFLOPs: 11.53 | 7: iteration 46990/ 173500 | consumed samples: 12029440 | consumed tokens: 24636293120 | elapsed time per iteration (s): 0.08 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 4.568391E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.656 | TFLOPs: 11.96 | 7: iteration 47000/ 173500 | consumed samples: 12032000 | consumed tokens: 24641536000 | elapsed time per iteration (s): 0.10 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 4.558507E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2486.048 | TFLOPs: 9.25 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 47000 | lm loss value: 4.441710E+00 | lm loss PPL: 8.492007E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 47000 to checkpoints_14m91b100m 0: [2023-03-17 01:24:36,719] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step47000 is begin to save! 0: [2023-03-17 01:24:36,722] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:24:36,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:24:36,749] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:24:36,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:24:36,752] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:24:36,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:24:36,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:24:36,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:24:36,758] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:24:36,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:24:36,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:24:36,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:24:36,762] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step47000/mp_rank_00_model_states.pt 0: [2023-03-17 01:24:36,762] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:24:36,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:24:36,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:24:36,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:24:36,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:24:36,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:24:36,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:24:36,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2023-03-17 01:24:36,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:24:36,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:24:36,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:24:36,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
0: [2023-03-17 01:24:36,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:24:36,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:24:36,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:24:36,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:24:36,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:24:36,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 0: [2023-03-17 01:24:36,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
6: [2023-03-17 01:24:36,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:24:36,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:24:36,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:24:36,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:24:36,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:24:36,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2023-03-17 01:24:36,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:24:36,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:24:36,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:24:36,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:24:36,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2023-03-17 01:24:36,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:24:36,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:24:36,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:24:36,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:24:36,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:24:36,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 2: [2023-03-17 01:24:36,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:24:36,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:24:36,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 3: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
1: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 01:24:36,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2023-03-17 01:24:36,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
4: [2023-03-17 01:24:36,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:24:36,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:24:36,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2023-03-17 01:24:36,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:24:36,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:24:36,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 3: [2023-03-17 01:24:36,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:24:36,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:24:36,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:24:36,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:24:36,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:24:36,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:24:36,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 4: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 0: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
6: [2023-03-17 01:24:36,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 1: [2023-03-17 01:24:36,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2023-03-17 01:24:36,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:24:36,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2023-03-17 01:24:36,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:24:36,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2023-03-17 01:24:36,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:24:36,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step47000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:24:36,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
0: successfully saved checkpoint at iteration 47000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.62 7: iteration 47010/ 173500 | consumed samples: 12034560 | consumed tokens: 24646778880 | elapsed time per iteration (s): 0.11 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 4.566867E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2422.759 | TFLOPs: 9.01 | 7: iteration 47020/ 173500 | consumed samples: 12037120 | consumed tokens: 24652021760 | elapsed time per iteration (s): 0.08 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 4.558620E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.630 | TFLOPs: 11.80 | 7: iteration 47030/ 173500 | consumed samples: 12039680 | consumed tokens: 24657264640 | elapsed time per iteration (s): 0.08 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 4.563566E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.332 | TFLOPs: 11.97 | 7: iteration 47040/ 173500 | consumed samples: 12042240 | consumed tokens: 24662507520 | elapsed time per iteration (s): 0.08 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 4.564069E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.450 | TFLOPs: 11.50 | 7: iteration 47050/ 173500 | consumed samples: 12044800 | consumed tokens: 24667750400 | elapsed time per iteration (s): 0.09 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 4.582672E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2726.962 | TFLOPs: 10.14 | 7: iteration 47060/ 173500 | consumed samples: 12047360 | consumed tokens: 24672993280 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.708E-04 | global batch size: 256 | lm loss: 4.562363E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2746.855 | TFLOPs: 10.22 | 7: iteration 47070/ 173500 | consumed samples: 12049920 | consumed tokens: 24678236160 | elapsed time per iteration (s): 0.09 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 4.559133E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2887.027 | TFLOPs: 10.74 | 7: iteration 47080/ 173500 | consumed samples: 12052480 | consumed tokens: 24683479040 | elapsed time per iteration (s): 0.08 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 4.562713E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.361 | TFLOPs: 11.84 | 7: iteration 47090/ 173500 | consumed samples: 12055040 | consumed tokens: 24688721920 | elapsed time per iteration (s): 0.09 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 4.571449E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.690 | TFLOPs: 11.16 | 7: iteration 47100/ 173500 | consumed samples: 12057600 | consumed tokens: 24693964800 | elapsed time per iteration (s): 0.08 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 4.556080E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.948 | TFLOPs: 11.88 | 7: iteration 47110/ 173500 | consumed samples: 12060160 | consumed tokens: 24699207680 | elapsed time per iteration (s): 0.09 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 4.548592E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.774 | TFLOPs: 11.20 | 7: iteration 47120/ 
173500 | consumed samples: 12062720 | consumed tokens: 24704450560 | elapsed time per iteration (s): 0.09 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 4.551855E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3004.028 | TFLOPs: 11.17 | 7: iteration 47130/ 173500 | consumed samples: 12065280 | consumed tokens: 24709693440 | elapsed time per iteration (s): 0.08 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 4.571640E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.696 | TFLOPs: 11.66 | 7: iteration 47140/ 173500 | consumed samples: 12067840 | consumed tokens: 24714936320 | elapsed time per iteration (s): 0.08 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 4.572691E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.936 | TFLOPs: 11.84 | 7: iteration 47150/ 173500 | consumed samples: 12070400 | consumed tokens: 24720179200 | elapsed time per iteration (s): 0.08 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 4.555708E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.835 | TFLOPs: 11.92 | 7: iteration 47160/ 173500 | consumed samples: 12072960 | consumed tokens: 24725422080 | elapsed time per iteration (s): 0.10 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 4.569328E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2475.454 | TFLOPs: 9.21 | 7: iteration 47170/ 173500 | consumed samples: 12075520 | consumed tokens: 24730664960 | elapsed time per iteration (s): 0.12 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 4.560983E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2200.993 | TFLOPs: 8.19 | 7: iteration 47180/ 173500 | consumed samples: 12078080 | consumed tokens: 24735907840 | elapsed time per iteration (s): 0.11 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 4.566280E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.943 | TFLOPs: 8.79 | 7: iteration 47190/ 173500 | consumed samples: 12080640 | consumed tokens: 24741150720 | elapsed time per iteration (s): 0.11 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.560097E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.578 | TFLOPs: 8.78 | 7: iteration 47200/ 173500 | consumed samples: 12083200 | consumed tokens: 24746393600 | elapsed time per iteration (s): 0.13 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.559145E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2013.560 | TFLOPs: 7.49 | 7: iteration 47210/ 173500 | consumed samples: 12085760 | consumed tokens: 24751636480 | elapsed time per iteration (s): 0.11 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.553119E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2270.836 | TFLOPs: 8.45 | 7: iteration 47220/ 173500 | consumed samples: 12088320 | consumed tokens: 24756879360 | elapsed time per iteration (s): 0.11 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.565529E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2394.015 | TFLOPs: 8.90 | 7: iteration 47230/ 173500 | consumed samples: 12090880 | consumed tokens: 24762122240 | elapsed time per iteration (s): 0.11 | learning rate: 1.706E-04 | 
global batch size: 256 | lm loss: 4.578355E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.544 | TFLOPs: 8.66 | 7: iteration 47240/ 173500 | consumed samples: 12093440 | consumed tokens: 24767365120 | elapsed time per iteration (s): 0.11 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.562640E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2436.866 | TFLOPs: 9.06 | 7: iteration 47250/ 173500 | consumed samples: 12096000 | consumed tokens: 24772608000 | elapsed time per iteration (s): 0.11 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.560947E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2307.401 | TFLOPs: 8.58 | 7: iteration 47260/ 173500 | consumed samples: 12098560 | consumed tokens: 24777850880 | elapsed time per iteration (s): 0.12 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.540136E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2209.370 | TFLOPs: 8.22 | 7: iteration 47270/ 173500 | consumed samples: 12101120 | consumed tokens: 24783093760 | elapsed time per iteration (s): 0.12 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 4.566415E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2123.188 | TFLOPs: 7.90 | 7: iteration 47280/ 173500 | consumed samples: 12103680 | consumed tokens: 24788336640 | elapsed time per iteration (s): 0.12 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 4.570210E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2121.281 | TFLOPs: 7.89 | 7: iteration 47290/ 173500 | consumed samples: 
12106240 | consumed tokens: 24793579520 | elapsed time per iteration (s): 0.12 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 4.556411E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2062.802 | TFLOPs: 7.67 | 7: iteration 47300/ 173500 | consumed samples: 12108800 | consumed tokens: 24798822400 | elapsed time per iteration (s): 0.11 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 4.564030E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2339.606 | TFLOPs: 8.70 | 7: iteration 47310/ 173500 | consumed samples: 12111360 | consumed tokens: 24804065280 | elapsed time per iteration (s): 0.11 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 4.571758E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2367.487 | TFLOPs: 8.81 | 7: iteration 47320/ 173500 | consumed samples: 12113920 | consumed tokens: 24809308160 | elapsed time per iteration (s): 0.13 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 4.553933E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2016.638 | TFLOPs: 7.50 | 7: iteration 47330/ 173500 | consumed samples: 12116480 | consumed tokens: 24814551040 | elapsed time per iteration (s): 0.13 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 4.567781E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.426 | TFLOPs: 7.35 | 7: iteration 47340/ 173500 | consumed samples: 12119040 | consumed tokens: 24819793920 | elapsed time per iteration (s): 0.12 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 4.566603E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2162.857 | TFLOPs: 8.04 | 7: iteration 47350/ 173500 | consumed samples: 12121600 | consumed tokens: 24825036800 | elapsed time per iteration (s): 0.12 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 4.576787E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2126.324 | TFLOPs: 7.91 | 7: iteration 47360/ 173500 | consumed samples: 12124160 | consumed tokens: 24830279680 | elapsed time per iteration (s): 0.08 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 4.569172E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.129 | TFLOPs: 11.97 | 7: iteration 47370/ 173500 | consumed samples: 12126720 | consumed tokens: 24835522560 | elapsed time per iteration (s): 0.08 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 4.558294E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.981 | TFLOPs: 11.84 | 7: iteration 47380/ 173500 | consumed samples: 12129280 | consumed tokens: 24840765440 | elapsed time per iteration (s): 0.08 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 4.561476E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.985 | TFLOPs: 11.94 | 7: iteration 47390/ 173500 | consumed samples: 12131840 | consumed tokens: 24846008320 | elapsed time per iteration (s): 0.09 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 4.558348E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2752.630 | TFLOPs: 10.24 | 7: iteration 47400/ 173500 | consumed samples: 12134400 | consumed tokens: 24851251200 | elapsed time per iteration (s): 0.08 | learning rate: 1.704E-04 | global batch size: 256 | lm 
loss: 4.554486E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.891 | TFLOPs: 11.98 | 7: iteration 47410/ 173500 | consumed samples: 12136960 | consumed tokens: 24856494080 | elapsed time per iteration (s): 0.08 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 4.564629E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.274 | TFLOPs: 11.90 | 7: iteration 47420/ 173500 | consumed samples: 12139520 | consumed tokens: 24861736960 | elapsed time per iteration (s): 0.08 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 4.573247E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.635 | TFLOPs: 11.91 | 7: iteration 47430/ 173500 | consumed samples: 12142080 | consumed tokens: 24866979840 | elapsed time per iteration (s): 0.08 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 4.562457E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.876 | TFLOPs: 11.82 | 7: iteration 47440/ 173500 | consumed samples: 12144640 | consumed tokens: 24872222720 | elapsed time per iteration (s): 0.08 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 4.537714E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.076 | TFLOPs: 11.99 | 7: iteration 47450/ 173500 | consumed samples: 12147200 | consumed tokens: 24877465600 | elapsed time per iteration (s): 0.08 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 4.553684E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.218 | TFLOPs: 11.93 | 7: iteration 47460/ 173500 | consumed samples: 12149760 | consumed tokens: 
24882708480 | elapsed time per iteration (s): 0.10 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 4.561242E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2653.384 | TFLOPs: 9.87 | 7: iteration 47470/ 173500 | consumed samples: 12152320 | consumed tokens: 24887951360 | elapsed time per iteration (s): 0.08 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 4.569688E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.410 | TFLOPs: 11.99 | 7: iteration 47480/ 173500 | consumed samples: 12154880 | consumed tokens: 24893194240 | elapsed time per iteration (s): 0.08 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 4.554706E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.552 | TFLOPs: 11.99 | 7: iteration 47490/ 173500 | consumed samples: 12157440 | consumed tokens: 24898437120 | elapsed time per iteration (s): 0.08 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 4.569230E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.397 | TFLOPs: 11.84 | 7: iteration 47500/ 173500 | consumed samples: 12160000 | consumed tokens: 24903680000 | elapsed time per iteration (s): 0.09 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 4.571170E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.082 | TFLOPs: 11.19 | 7: iteration 47510/ 173500 | consumed samples: 12162560 | consumed tokens: 24908922880 | elapsed time per iteration (s): 0.08 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 4.557942E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3223.302 | TFLOPs: 11.99 | 7: iteration 47520/ 173500 | consumed samples: 12165120 | consumed tokens: 24914165760 | elapsed time per iteration (s): 0.09 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 4.562066E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2783.030 | TFLOPs: 10.35 | 7: iteration 47530/ 173500 | consumed samples: 12167680 | consumed tokens: 24919408640 | elapsed time per iteration (s): 0.08 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 4.572994E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.761 | TFLOPs: 11.80 | 7: iteration 47540/ 173500 | consumed samples: 12170240 | consumed tokens: 24924651520 | elapsed time per iteration (s): 0.08 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 4.555923E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.301 | TFLOPs: 11.87 | 7: iteration 47550/ 173500 | consumed samples: 12172800 | consumed tokens: 24929894400 | elapsed time per iteration (s): 0.09 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 4.554856E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2923.547 | TFLOPs: 10.87 | 7: iteration 47560/ 173500 | consumed samples: 12175360 | consumed tokens: 24935137280 | elapsed time per iteration (s): 0.10 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 4.562755E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2479.821 | TFLOPs: 9.22 | 7: iteration 47570/ 173500 | consumed samples: 12177920 | consumed tokens: 24940380160 | elapsed time per iteration (s): 0.08 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 4.560966E+00 | grad 
norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.452 | TFLOPs: 11.86 | 7: iteration 47580/ 173500 | consumed samples: 12180480 | consumed tokens: 24945623040 | elapsed time per iteration (s): 0.08 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 4.559768E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.094 | TFLOPs: 11.82 | 7: iteration 47590/ 173500 | consumed samples: 12183040 | consumed tokens: 24950865920 | elapsed time per iteration (s): 0.08 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 4.554554E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.236 | TFLOPs: 11.83 | 7: iteration 47600/ 173500 | consumed samples: 12185600 | consumed tokens: 24956108800 | elapsed time per iteration (s): 0.08 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 4.578611E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.269 | TFLOPs: 11.81 | 7: iteration 47610/ 173500 | consumed samples: 12188160 | consumed tokens: 24961351680 | elapsed time per iteration (s): 0.08 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 4.561038E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.709 | TFLOPs: 11.83 | 7: iteration 47620/ 173500 | consumed samples: 12190720 | consumed tokens: 24966594560 | elapsed time per iteration (s): 0.08 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 4.557201E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.678 | TFLOPs: 11.72 | 7: iteration 47630/ 173500 | consumed samples: 12193280 | consumed tokens: 24971837440 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 4.572926E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.942 | TFLOPs: 11.81 | 7: iteration 47640/ 173500 | consumed samples: 12195840 | consumed tokens: 24977080320 | elapsed time per iteration (s): 0.08 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 4.561636E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.421 | TFLOPs: 11.78 | 7: iteration 47650/ 173500 | consumed samples: 12198400 | consumed tokens: 24982323200 | elapsed time per iteration (s): 0.08 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 4.563657E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.785 | TFLOPs: 11.85 | 7: iteration 47660/ 173500 | consumed samples: 12200960 | consumed tokens: 24987566080 | elapsed time per iteration (s): 0.08 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 4.563107E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.725 | TFLOPs: 11.82 | 7: iteration 47670/ 173500 | consumed samples: 12203520 | consumed tokens: 24992808960 | elapsed time per iteration (s): 0.09 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 4.558300E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3006.217 | TFLOPs: 11.18 | 7: iteration 47680/ 173500 | consumed samples: 12206080 | consumed tokens: 24998051840 | elapsed time per iteration (s): 0.08 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.559846E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.933 | TFLOPs: 
11.51 | 7: iteration 47690/ 173500 | consumed samples: 12208640 | consumed tokens: 25003294720 | elapsed time per iteration (s): 0.08 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.563259E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.054 | TFLOPs: 11.84 | 7: iteration 47700/ 173500 | consumed samples: 12211200 | consumed tokens: 25008537600 | elapsed time per iteration (s): 0.09 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.562733E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.229 | TFLOPs: 10.04 | 7: iteration 47710/ 173500 | consumed samples: 12213760 | consumed tokens: 25013780480 | elapsed time per iteration (s): 0.13 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.563549E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2012.794 | TFLOPs: 7.49 | 7: iteration 47720/ 173500 | consumed samples: 12216320 | consumed tokens: 25019023360 | elapsed time per iteration (s): 0.13 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.559130E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.573 | TFLOPs: 7.53 | 7: iteration 47730/ 173500 | consumed samples: 12218880 | consumed tokens: 25024266240 | elapsed time per iteration (s): 0.13 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.551584E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.650 | TFLOPs: 7.61 | 7: iteration 47740/ 173500 | consumed samples: 12221440 | consumed tokens: 25029509120 | elapsed time per iteration (s): 0.12 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.569216E+00 | grad norm: 0.293 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2098.599 | TFLOPs: 7.81 | 7: iteration 47750/ 173500 | consumed samples: 12224000 | consumed tokens: 25034752000 | elapsed time per iteration (s): 0.13 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.573435E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.197 | TFLOPs: 7.24 | 7: iteration 47760/ 173500 | consumed samples: 12226560 | consumed tokens: 25039994880 | elapsed time per iteration (s): 0.13 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 4.559201E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.572 | TFLOPs: 7.39 | 7: iteration 47770/ 173500 | consumed samples: 12229120 | consumed tokens: 25045237760 | elapsed time per iteration (s): 0.12 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 4.560396E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2225.387 | TFLOPs: 8.28 | 7: iteration 47780/ 173500 | consumed samples: 12231680 | consumed tokens: 25050480640 | elapsed time per iteration (s): 0.10 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 4.563891E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.034 | TFLOPs: 9.12 | 7: iteration 47790/ 173500 | consumed samples: 12234240 | consumed tokens: 25055723520 | elapsed time per iteration (s): 0.11 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 4.563155E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.884 | TFLOPs: 8.53 | 7: iteration 47800/ 173500 | consumed samples: 12236800 | consumed tokens: 25060966400 | elapsed time per iteration (s): 0.11 | learning 
rate: 1.699E-04 | global batch size: 256 | lm loss: 4.564297E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.196 | TFLOPs: 8.88 | 7: iteration 47810/ 173500 | consumed samples: 12239360 | consumed tokens: 25066209280 | elapsed time per iteration (s): 0.11 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 4.565501E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.035 | TFLOPs: 8.95 | 7: iteration 47820/ 173500 | consumed samples: 12241920 | consumed tokens: 25071452160 | elapsed time per iteration (s): 0.11 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 4.564365E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2230.023 | TFLOPs: 8.29 | 7: iteration 47830/ 173500 | consumed samples: 12244480 | consumed tokens: 25076695040 | elapsed time per iteration (s): 0.11 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 4.551552E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2362.908 | TFLOPs: 8.79 | 7: iteration 47840/ 173500 | consumed samples: 12247040 | consumed tokens: 25081937920 | elapsed time per iteration (s): 0.11 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 4.574648E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.242 | TFLOPs: 8.91 | 7: iteration 47850/ 173500 | consumed samples: 12249600 | consumed tokens: 25087180800 | elapsed time per iteration (s): 0.11 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 4.562899E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.343 | TFLOPs: 8.82 | 7: iteration 47860/ 173500 | 
consumed samples: 12252160 | consumed tokens: 25092423680 | elapsed time per iteration (s): 0.11 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 4.551458E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2349.580 | TFLOPs: 8.74 | 7: iteration 47870/ 173500 | consumed samples: 12254720 | consumed tokens: 25097666560 | elapsed time per iteration (s): 0.09 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 4.576133E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2951.165 | TFLOPs: 10.98 | 7: iteration 47880/ 173500 | consumed samples: 12257280 | consumed tokens: 25102909440 | elapsed time per iteration (s): 0.08 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 4.559468E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.239 | TFLOPs: 11.77 | 7: iteration 47890/ 173500 | consumed samples: 12259840 | consumed tokens: 25108152320 | elapsed time per iteration (s): 0.08 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 4.564644E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.404 | TFLOPs: 11.74 | 7: iteration 47900/ 173500 | consumed samples: 12262400 | consumed tokens: 25113395200 | elapsed time per iteration (s): 0.08 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 4.553041E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.606 | TFLOPs: 11.73 | 7: iteration 47910/ 173500 | consumed samples: 12264960 | consumed tokens: 25118638080 | elapsed time per iteration (s): 0.08 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 4.562826E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3186.177 | TFLOPs: 11.85 | 7: iteration 47920/ 173500 | consumed samples: 12267520 | consumed tokens: 25123880960 | elapsed time per iteration (s): 0.08 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 4.549101E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.775 | TFLOPs: 11.90 | 7: iteration 47930/ 173500 | consumed samples: 12270080 | consumed tokens: 25129123840 | elapsed time per iteration (s): 0.08 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 4.562229E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.686 | TFLOPs: 11.88 | 7: iteration 47940/ 173500 | consumed samples: 12272640 | consumed tokens: 25134366720 | elapsed time per iteration (s): 0.08 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 4.555828E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.053 | TFLOPs: 11.83 | 7: iteration 47950/ 173500 | consumed samples: 12275200 | consumed tokens: 25139609600 | elapsed time per iteration (s): 0.08 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 4.570173E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.085 | TFLOPs: 11.82 | 7: iteration 47960/ 173500 | consumed samples: 12277760 | consumed tokens: 25144852480 | elapsed time per iteration (s): 0.08 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 4.558211E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.469 | TFLOPs: 11.86 | 7: iteration 47970/ 173500 | consumed samples: 12280320 | consumed tokens: 25150095360 | elapsed time per iteration (s): 0.08 | learning rate: 1.697E-04 | global batch 
size: 256 | lm loss: 4.566167E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.328 | TFLOPs: 11.89 | 7: iteration 47980/ 173500 | consumed samples: 12282880 | consumed tokens: 25155338240 | elapsed time per iteration (s): 0.11 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 4.548082E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.272 | TFLOPs: 8.98 | 7: iteration 47990/ 173500 | consumed samples: 12285440 | consumed tokens: 25160581120 | elapsed time per iteration (s): 0.10 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 4.553212E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2610.367 | TFLOPs: 9.71 | 0: [2023-03-17 01:26:11,883] [INFO] [logging.py:68:log_dist] [Rank 0] step=48000, skipped=0, lr=[0.00016965587057872074, 0.00016965587057872074, 0.00016965587057872074], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 48000/ 173500 | consumed samples: 12288000 | consumed tokens: 25165824000 | elapsed time per iteration (s): 0.08 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 4.549684E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.619 | TFLOPs: 11.88 | 0: steps: 48000 loss: 4.5276 iter time (s): 0.092 samples/sec: 2792.208 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 48000 | lm loss value: 4.483655E+00 | lm loss PPL: 8.855776E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 48000 to checkpoints_14m91b100m 0: [2023-03-17 01:26:11,940] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step48000 is 
begin to save! 0: [2023-03-17 01:26:11,944] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:26:11,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:26:11,970] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:26:11,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:26:11,973] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:26:11,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:26:11,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:26:11,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:26:11,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:26:11,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:26:11,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:26:11,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:26:11,983] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step48000/mp_rank_00_model_states.pt 0: [2023-03-17 01:26:11,983] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:26:11,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:26:12,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:26:12,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:26:12,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:26:12,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:26:12,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2023-03-17 01:26:12,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:26:12,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 01:26:12,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:26:12,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2023-03-17 01:26:12,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:26:12,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
0: [2023-03-17 01:26:12,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:26:12,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:26:12,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2023-03-17 01:26:12,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:26:12,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:26:12,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2023-03-17 01:26:12,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:26:12,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:26:12,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 01:26:12,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:26:12,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 01:26:12,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:26:12,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2023-03-17 01:26:12,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:26:12,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:26:12,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:26:12,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2023-03-17 01:26:12,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:26:12,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:26:12,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:26:12,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:26:12,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:26:12,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:26:12,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:26:12,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:26:12,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:26:12,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2023-03-17 01:26:12,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 01:26:12,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
3: [2023-03-17 01:26:12,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:26:12,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2023-03-17 01:26:12,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:26:12,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:26:12,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 01:26:12,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 01:26:12,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2023-03-17 01:26:12,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:26:12,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:26:12,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:26:12,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2023-03-17 01:26:12,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:26:12,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:26:12,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
5: [2023-03-17 01:26:12,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:26:12,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:26:12,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:26:12,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:26:12,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:26:12,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:26:12,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:26:12,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:26:12,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 01:26:12,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2023-03-17 01:26:12,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2023-03-17 01:26:12,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:26:12,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 01:26:12,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
0: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:26:12,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 01:26:12,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:26:12,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:26:12,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2023-03-17 01:26:12,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
2: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2023-03-17 01:26:12,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step48000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:26:12,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
0: successfully saved checkpoint at iteration 48000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.53 7: iteration 48010/ 173500 | consumed samples: 12290560 | consumed tokens: 25171066880 | elapsed time per iteration (s): 0.11 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 4.553007E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2301.347 | TFLOPs: 8.56 | 7: iteration 48020/ 173500 | consumed samples: 12293120 | consumed tokens: 25176309760 | elapsed time per iteration (s): 0.09 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 4.556687E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.444 | TFLOPs: 11.20 | 7: iteration 48030/ 173500 | consumed samples: 12295680 | consumed tokens: 25181552640 | elapsed time per iteration (s): 0.08 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 4.550931E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.668 | TFLOPs: 11.65 | 7: iteration 48040/ 173500 | consumed samples: 12298240 | consumed tokens: 25186795520 | elapsed time per iteration (s): 0.08 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 4.547283E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.178 | TFLOPs: 11.80 | 7: iteration 48050/ 173500 | consumed samples: 12300800 | consumed tokens: 25192038400 | elapsed time per iteration (s): 0.08 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 4.571388E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.648 | TFLOPs: 11.85 | 7: iteration 48060/ 173500 | consumed samples: 12303360 | consumed tokens: 25197281280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.696E-04 | global batch size: 256 | lm loss: 4.553677E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.412 | TFLOPs: 11.90 | 7: iteration 48070/ 173500 | consumed samples: 12305920 | consumed tokens: 25202524160 | elapsed time per iteration (s): 0.08 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 4.543344E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.100 | TFLOPs: 11.90 | 7: iteration 48080/ 173500 | consumed samples: 12308480 | consumed tokens: 25207767040 | elapsed time per iteration (s): 0.08 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 4.570140E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.685 | TFLOPs: 11.83 | 7: iteration 48090/ 173500 | consumed samples: 12311040 | consumed tokens: 25213009920 | elapsed time per iteration (s): 0.08 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.555218E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.115 | TFLOPs: 11.88 | 7: iteration 48100/ 173500 | consumed samples: 12313600 | consumed tokens: 25218252800 | elapsed time per iteration (s): 0.08 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.578308E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.586 | TFLOPs: 11.84 | 7: iteration 48110/ 173500 | consumed samples: 12316160 | consumed tokens: 25223495680 | elapsed time per iteration (s): 0.08 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.563088E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.478 | TFLOPs: 11.85 | 7: iteration 48120/ 
173500 | consumed samples: 12318720 | consumed tokens: 25228738560 | elapsed time per iteration (s): 0.08 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.569199E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.350 | TFLOPs: 11.85 | 7: iteration 48130/ 173500 | consumed samples: 12321280 | consumed tokens: 25233981440 | elapsed time per iteration (s): 0.08 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.554806E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.354 | TFLOPs: 11.84 | 7: iteration 48140/ 173500 | consumed samples: 12323840 | consumed tokens: 25239224320 | elapsed time per iteration (s): 0.08 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.560595E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.550 | TFLOPs: 11.87 | 7: iteration 48150/ 173500 | consumed samples: 12326400 | consumed tokens: 25244467200 | elapsed time per iteration (s): 0.08 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.560091E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.459 | TFLOPs: 11.79 | 7: iteration 48160/ 173500 | consumed samples: 12328960 | consumed tokens: 25249710080 | elapsed time per iteration (s): 0.08 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 4.560768E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.542 | TFLOPs: 11.85 | 7: iteration 48170/ 173500 | consumed samples: 12331520 | consumed tokens: 25254952960 | elapsed time per iteration (s): 0.08 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 4.550426E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3194.568 | TFLOPs: 11.88 | 7: iteration 48180/ 173500 | consumed samples: 12334080 | consumed tokens: 25260195840 | elapsed time per iteration (s): 0.08 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 4.567459E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.570 | TFLOPs: 11.87 | 7: iteration 48190/ 173500 | consumed samples: 12336640 | consumed tokens: 25265438720 | elapsed time per iteration (s): 0.08 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 4.565545E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.000 | TFLOPs: 11.87 | 7: iteration 48200/ 173500 | consumed samples: 12339200 | consumed tokens: 25270681600 | elapsed time per iteration (s): 0.08 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 4.567966E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.552 | TFLOPs: 11.88 | 7: iteration 48210/ 173500 | consumed samples: 12341760 | consumed tokens: 25275924480 | elapsed time per iteration (s): 0.08 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 4.557899E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.011 | TFLOPs: 11.87 | 7: iteration 48220/ 173500 | consumed samples: 12344320 | consumed tokens: 25281167360 | elapsed time per iteration (s): 0.08 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 4.579075E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.468 | TFLOPs: 11.82 | 7: iteration 48230/ 173500 | consumed samples: 12346880 | consumed tokens: 25286410240 | elapsed time per iteration (s): 0.09 | learning rate: 
1.694E-04 | global batch size: 256 | lm loss: 4.566244E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.978 | TFLOPs: 10.32 | 7: iteration 48240/ 173500 | consumed samples: 12349440 | consumed tokens: 25291653120 | elapsed time per iteration (s): 0.08 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 4.563023E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.189 | TFLOPs: 11.83 | 7: iteration 48250/ 173500 | consumed samples: 12352000 | consumed tokens: 25296896000 | elapsed time per iteration (s): 0.08 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 4.567438E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.905 | TFLOPs: 11.90 | 7: iteration 48260/ 173500 | consumed samples: 12354560 | consumed tokens: 25302138880 | elapsed time per iteration (s): 0.08 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 4.553763E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.265 | TFLOPs: 11.91 | 7: iteration 48270/ 173500 | consumed samples: 12357120 | consumed tokens: 25307381760 | elapsed time per iteration (s): 0.08 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 4.561433E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.129 | TFLOPs: 11.89 | 7: iteration 48280/ 173500 | consumed samples: 12359680 | consumed tokens: 25312624640 | elapsed time per iteration (s): 0.08 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 4.546228E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.968 | TFLOPs: 11.85 | 7: iteration 48290/ 173500 | 
consumed samples: 12362240 | consumed tokens: 25317867520 | elapsed time per iteration (s): 0.08 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 4.562346E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.012 | TFLOPs: 11.90 | 7: iteration 48300/ 173500 | consumed samples: 12364800 | consumed tokens: 25323110400 | elapsed time per iteration (s): 0.12 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 4.555792E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2210.469 | TFLOPs: 8.22 | 7: iteration 48310/ 173500 | consumed samples: 12367360 | consumed tokens: 25328353280 | elapsed time per iteration (s): 0.10 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 4.569435E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2673.358 | TFLOPs: 9.94 | 7: iteration 48320/ 173500 | consumed samples: 12369920 | consumed tokens: 25333596160 | elapsed time per iteration (s): 0.08 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 4.564969E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.378 | TFLOPs: 11.73 | 7: iteration 48330/ 173500 | consumed samples: 12372480 | consumed tokens: 25338839040 | elapsed time per iteration (s): 0.08 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 4.541014E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.621 | TFLOPs: 11.89 | 7: iteration 48340/ 173500 | consumed samples: 12375040 | consumed tokens: 25344081920 | elapsed time per iteration (s): 0.08 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 4.566937E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3199.146 | TFLOPs: 11.90 | 7: iteration 48350/ 173500 | consumed samples: 12377600 | consumed tokens: 25349324800 | elapsed time per iteration (s): 0.08 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 4.572470E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.012 | TFLOPs: 11.83 | 7: iteration 48360/ 173500 | consumed samples: 12380160 | consumed tokens: 25354567680 | elapsed time per iteration (s): 0.08 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 4.572581E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.973 | TFLOPs: 11.78 | 7: iteration 48370/ 173500 | consumed samples: 12382720 | consumed tokens: 25359810560 | elapsed time per iteration (s): 0.08 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 4.564822E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.637 | TFLOPs: 11.86 | 7: iteration 48380/ 173500 | consumed samples: 12385280 | consumed tokens: 25365053440 | elapsed time per iteration (s): 0.08 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 4.570007E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.333 | TFLOPs: 11.89 | 7: iteration 48390/ 173500 | consumed samples: 12387840 | consumed tokens: 25370296320 | elapsed time per iteration (s): 0.08 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 4.548899E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.617 | TFLOPs: 11.80 | 7: iteration 48400/ 173500 | consumed samples: 12390400 | consumed tokens: 25375539200 | elapsed time per iteration (s): 0.08 | learning rate: 1.692E-04 | global batch 
size: 256 | lm loss: 4.557099E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.625 | TFLOPs: 11.87 | 7: iteration 48410/ 173500 | consumed samples: 12392960 | consumed tokens: 25380782080 | elapsed time per iteration (s): 0.10 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 4.551857E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2582.062 | TFLOPs: 9.60 | 7: iteration 48420/ 173500 | consumed samples: 12395520 | consumed tokens: 25386024960 | elapsed time per iteration (s): 0.10 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 4.556492E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2653.630 | TFLOPs: 9.87 | 7: iteration 48430/ 173500 | consumed samples: 12398080 | consumed tokens: 25391267840 | elapsed time per iteration (s): 0.08 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 4.557651E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.225 | TFLOPs: 11.88 | 7: iteration 48440/ 173500 | consumed samples: 12400640 | consumed tokens: 25396510720 | elapsed time per iteration (s): 0.08 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 4.567428E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.504 | TFLOPs: 11.90 | 7: iteration 48450/ 173500 | consumed samples: 12403200 | consumed tokens: 25401753600 | elapsed time per iteration (s): 0.08 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 4.568465E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.318 | TFLOPs: 11.84 | 7: iteration 48460/ 173500 | consumed samples: 12405760 | 
consumed tokens: 25406996480 | elapsed time per iteration (s): 0.08 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 4.566638E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.108 | TFLOPs: 11.86 | 7: iteration 48470/ 173500 | consumed samples: 12408320 | consumed tokens: 25412239360 | elapsed time per iteration (s): 0.08 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 4.558129E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.750 | TFLOPs: 11.88 | 7: iteration 48480/ 173500 | consumed samples: 12410880 | consumed tokens: 25417482240 | elapsed time per iteration (s): 0.08 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 4.561245E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.665 | TFLOPs: 11.90 | 7: iteration 48490/ 173500 | consumed samples: 12413440 | consumed tokens: 25422725120 | elapsed time per iteration (s): 0.08 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.552448E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.610 | TFLOPs: 11.89 | 7: iteration 48500/ 173500 | consumed samples: 12416000 | consumed tokens: 25427968000 | elapsed time per iteration (s): 0.13 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.563968E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1983.656 | TFLOPs: 7.38 | 7: iteration 48510/ 173500 | consumed samples: 12418560 | consumed tokens: 25433210880 | elapsed time per iteration (s): 0.13 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.571149E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 1992.486 | TFLOPs: 7.41 | 7: iteration 48520/ 173500 | consumed samples: 12421120 | consumed tokens: 25438453760 | elapsed time per iteration (s): 0.08 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.562749E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.815 | TFLOPs: 11.44 | 7: iteration 48530/ 173500 | consumed samples: 12423680 | consumed tokens: 25443696640 | elapsed time per iteration (s): 0.08 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.571571E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.732 | TFLOPs: 11.73 | 7: iteration 48540/ 173500 | consumed samples: 12426240 | consumed tokens: 25448939520 | elapsed time per iteration (s): 0.11 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.562535E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2355.940 | TFLOPs: 8.76 | 7: iteration 48550/ 173500 | consumed samples: 12428800 | consumed tokens: 25454182400 | elapsed time per iteration (s): 0.13 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.556497E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.619 | TFLOPs: 7.61 | 7: iteration 48560/ 173500 | consumed samples: 12431360 | consumed tokens: 25459425280 | elapsed time per iteration (s): 0.13 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.557908E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.957 | TFLOPs: 7.42 | 7: iteration 48570/ 173500 | consumed samples: 12433920 | consumed tokens: 25464668160 | elapsed time per iteration (s): 0.13 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 4.562497E+00 
| grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1946.584 | TFLOPs: 7.24 | 7: iteration 48580/ 173500 | consumed samples: 12436480 | consumed tokens: 25469911040 | elapsed time per iteration (s): 0.11 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 4.554635E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2240.663 | TFLOPs: 8.33 | 7: iteration 48590/ 173500 | consumed samples: 12439040 | consumed tokens: 25475153920 | elapsed time per iteration (s): 0.13 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 4.553001E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2026.198 | TFLOPs: 7.54 | 7: iteration 48600/ 173500 | consumed samples: 12441600 | consumed tokens: 25480396800 | elapsed time per iteration (s): 0.12 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 4.572445E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2102.221 | TFLOPs: 7.82 | 7: iteration 48610/ 173500 | consumed samples: 12444160 | consumed tokens: 25485639680 | elapsed time per iteration (s): 0.13 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 4.558281E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1904.611 | TFLOPs: 7.08 | 7: iteration 48620/ 173500 | consumed samples: 12446720 | consumed tokens: 25490882560 | elapsed time per iteration (s): 0.12 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 4.561800E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2206.966 | TFLOPs: 8.21 | 7: iteration 48630/ 173500 | consumed samples: 12449280 | consumed tokens: 25496125440 | elapsed time 
per iteration (s): 0.12 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 4.554518E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2207.781 | TFLOPs: 8.21 | 7: iteration 48640/ 173500 | consumed samples: 12451840 | consumed tokens: 25501368320 | elapsed time per iteration (s): 0.10 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 4.558558E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.190 | TFLOPs: 9.12 | 7: iteration 48650/ 173500 | consumed samples: 12454400 | consumed tokens: 25506611200 | elapsed time per iteration (s): 0.12 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 4.568570E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2147.746 | TFLOPs: 7.99 | 7: iteration 48660/ 173500 | consumed samples: 12456960 | consumed tokens: 25511854080 | elapsed time per iteration (s): 0.13 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 4.551321E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1944.804 | TFLOPs: 7.23 | 7: iteration 48670/ 173500 | consumed samples: 12459520 | consumed tokens: 25517096960 | elapsed time per iteration (s): 0.12 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 4.574731E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2147.877 | TFLOPs: 7.99 | 7: iteration 48680/ 173500 | consumed samples: 12462080 | consumed tokens: 25522339840 | elapsed time per iteration (s): 0.09 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 4.551246E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2752.176 | TFLOPs: 10.24 | 
7: iteration 48690/ 173500 | consumed samples: 12464640 | consumed tokens: 25527582720 | elapsed time per iteration (s): 0.08 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 4.571938E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.316 | TFLOPs: 11.87 | 7: iteration 48700/ 173500 | consumed samples: 12467200 | consumed tokens: 25532825600 | elapsed time per iteration (s): 0.08 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 4.562395E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.424 | TFLOPs: 11.91 | 7: iteration 48710/ 173500 | consumed samples: 12469760 | consumed tokens: 25538068480 | elapsed time per iteration (s): 0.08 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 4.560152E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.504 | TFLOPs: 11.87 | 7: iteration 48720/ 173500 | consumed samples: 12472320 | consumed tokens: 25543311360 | elapsed time per iteration (s): 0.08 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 4.574126E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.669 | TFLOPs: 11.77 | 7: iteration 48730/ 173500 | consumed samples: 12474880 | consumed tokens: 25548554240 | elapsed time per iteration (s): 0.08 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 4.561498E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.029 | TFLOPs: 11.88 | 7: iteration 48740/ 173500 | consumed samples: 12477440 | consumed tokens: 25553797120 | elapsed time per iteration (s): 0.08 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 4.559974E+00 | grad norm: 0.311 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.570 | TFLOPs: 11.91 | 7: iteration 48750/ 173500 | consumed samples: 12480000 | consumed tokens: 25559040000 | elapsed time per iteration (s): 0.08 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 4.556159E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.033 | TFLOPs: 11.96 | 7: iteration 48760/ 173500 | consumed samples: 12482560 | consumed tokens: 25564282880 | elapsed time per iteration (s): 0.08 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 4.570383E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.361 | TFLOPs: 11.95 | 7: iteration 48770/ 173500 | consumed samples: 12485120 | consumed tokens: 25569525760 | elapsed time per iteration (s): 0.08 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 4.561381E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.970 | TFLOPs: 11.88 | 7: iteration 48780/ 173500 | consumed samples: 12487680 | consumed tokens: 25574768640 | elapsed time per iteration (s): 0.08 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 4.565717E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.500 | TFLOPs: 11.67 | 7: iteration 48790/ 173500 | consumed samples: 12490240 | consumed tokens: 25580011520 | elapsed time per iteration (s): 0.08 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 4.559063E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.956 | TFLOPs: 11.93 | 7: iteration 48800/ 173500 | consumed samples: 12492800 | consumed tokens: 25585254400 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.687E-04 | global batch size: 256 | lm loss: 4.553432E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.364 | TFLOPs: 11.56 | 7: iteration 48810/ 173500 | consumed samples: 12495360 | consumed tokens: 25590497280 | elapsed time per iteration (s): 0.08 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 4.562185E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.031 | TFLOPs: 11.93 | 7: iteration 48820/ 173500 | consumed samples: 12497920 | consumed tokens: 25595740160 | elapsed time per iteration (s): 0.08 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 4.556840E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.000 | TFLOPs: 11.95 | 7: iteration 48830/ 173500 | consumed samples: 12500480 | consumed tokens: 25600983040 | elapsed time per iteration (s): 0.08 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 4.567299E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.980 | TFLOPs: 11.88 | 7: iteration 48840/ 173500 | consumed samples: 12503040 | consumed tokens: 25606225920 | elapsed time per iteration (s): 0.08 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 4.562335E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.379 | TFLOPs: 11.90 | 7: iteration 48850/ 173500 | consumed samples: 12505600 | consumed tokens: 25611468800 | elapsed time per iteration (s): 0.08 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 4.557179E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.505 | TFLOPs: 11.88 | 7: iteration 48860/ 
173500 | consumed samples: 12508160 | consumed tokens: 25616711680 | elapsed time per iteration (s): 0.08 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 4.564981E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.859 | TFLOPs: 11.75 | 7: iteration 48870/ 173500 | consumed samples: 12510720 | consumed tokens: 25621954560 | elapsed time per iteration (s): 0.11 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 4.558616E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2323.853 | TFLOPs: 8.64 | 7: iteration 48880/ 173500 | consumed samples: 12513280 | consumed tokens: 25627197440 | elapsed time per iteration (s): 0.12 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 4.547635E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2155.014 | TFLOPs: 8.02 | 7: iteration 48890/ 173500 | consumed samples: 12515840 | consumed tokens: 25632440320 | elapsed time per iteration (s): 0.08 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 4.565216E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.008 | TFLOPs: 11.89 | 7: iteration 48900/ 173500 | consumed samples: 12518400 | consumed tokens: 25637683200 | elapsed time per iteration (s): 0.08 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 4.567817E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.893 | TFLOPs: 11.84 | 7: iteration 48910/ 173500 | consumed samples: 12520960 | consumed tokens: 25642926080 | elapsed time per iteration (s): 0.08 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 4.568955E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3181.538 | TFLOPs: 11.83 | 7: iteration 48920/ 173500 | consumed samples: 12523520 | consumed tokens: 25648168960 | elapsed time per iteration (s): 0.08 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 4.553889E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.808 | TFLOPs: 11.84 | 7: iteration 48930/ 173500 | consumed samples: 12526080 | consumed tokens: 25653411840 | elapsed time per iteration (s): 0.08 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 4.568320E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.224 | TFLOPs: 11.86 | 7: iteration 48940/ 173500 | consumed samples: 12528640 | consumed tokens: 25658654720 | elapsed time per iteration (s): 0.08 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 4.560676E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.679 | TFLOPs: 11.88 | 7: iteration 48950/ 173500 | consumed samples: 12531200 | consumed tokens: 25663897600 | elapsed time per iteration (s): 0.08 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 4.568924E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.227 | TFLOPs: 11.88 | 7: iteration 48960/ 173500 | consumed samples: 12533760 | consumed tokens: 25669140480 | elapsed time per iteration (s): 0.09 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 4.566369E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2856.605 | TFLOPs: 10.63 | 7: iteration 48970/ 173500 | consumed samples: 12536320 | consumed tokens: 25674383360 | elapsed time per iteration (s): 0.13 | learning rate: 
1.685E-04 | global batch size: 256 | lm loss: 4.558048E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.627 | TFLOPs: 7.39 | 7: iteration 48980/ 173500 | consumed samples: 12538880 | consumed tokens: 25679626240 | elapsed time per iteration (s): 0.13 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.562664E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.233 | TFLOPs: 7.37 | 7: iteration 48990/ 173500 | consumed samples: 12541440 | consumed tokens: 25684869120 | elapsed time per iteration (s): 0.13 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.558113E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.473 | TFLOPs: 7.30 | 7: iteration 49000/ 173500 | consumed samples: 12544000 | consumed tokens: 25690112000 | elapsed time per iteration (s): 0.10 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.564514E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2584.972 | TFLOPs: 9.61 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 49000 | lm loss value: 4.418526E+00 | lm loss PPL: 8.297387E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 49000 to checkpoints_14m91b100m 0: [2023-03-17 01:27:42,875] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step49000 is begin to save! 0: [2023-03-17 01:27:42,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:27:42,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:27:42,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:27:42,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:27:42,908] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:27:42,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:27:42,911] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:27:42,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:27:42,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:27:42,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:27:42,916] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:27:42,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:27:42,918] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step49000/mp_rank_00_model_states.pt 0: [2023-03-17 01:27:42,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:27:42,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:27:42,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:27:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:27:42,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2023-03-17 01:27:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:27:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2023-03-17 01:27:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:27:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2023-03-17 01:27:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:27:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:27:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:27:42,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:27:42,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:27:42,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:27:42,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2023-03-17 01:27:42,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:27:42,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:27:42,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:27:42,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2023-03-17 01:27:42,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:27:42,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
5: [2023-03-17 01:27:42,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:27:42,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 01:27:42,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:27:42,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2023-03-17 01:27:42,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:27:42,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2023-03-17 01:27:42,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:27:42,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2023-03-17 01:27:42,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:27:42,946] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:27:42,946] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2023-03-17 01:27:42,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:27:42,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:27:42,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2023-03-17 01:27:42,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:27:42,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2023-03-17 01:27:42,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2023-03-17 01:27:42,947] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,947] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2023-03-17 01:27:42,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:27:42,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:27:42,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2023-03-17 01:27:42,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:27:42,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:27:42,948] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:27:42,948] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:27:42,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:27:42,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 01:27:42,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:27:42,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:27:42,949] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,949] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:27:42,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:27:42,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:27:42,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:27:42,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2023-03-17 01:27:42,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:27:42,950] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:27:42,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:27:42,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:27:42,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 0: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:27:42,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:27:42,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
6: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
3: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:27:42,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step49000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:27:42,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: successfully saved checkpoint at iteration 49000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.12 7: iteration 49010/ 173500 | consumed samples: 12546560 | consumed tokens: 25695354880 | elapsed time per iteration (s): 0.09 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.561860E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2793.341 | TFLOPs: 10.39 | 7: iteration 49020/ 173500 | consumed samples: 12549120 | consumed tokens: 25700597760 | elapsed time per iteration (s): 0.08 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.567801E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.956 | TFLOPs: 11.90 | 7: iteration 49030/ 173500 | consumed samples: 12551680 | consumed tokens: 25705840640 | elapsed time per iteration (s): 0.08 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.546120E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.329 | TFLOPs: 11.50 | 7: iteration 49040/ 
173500 | consumed samples: 12554240 | consumed tokens: 25711083520 | elapsed time per iteration (s): 0.10 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.566648E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.851 | TFLOPs: 9.96 | 7: iteration 49050/ 173500 | consumed samples: 12556800 | consumed tokens: 25716326400 | elapsed time per iteration (s): 0.08 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.577123E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.883 | TFLOPs: 11.63 | 7: iteration 49060/ 173500 | consumed samples: 12559360 | consumed tokens: 25721569280 | elapsed time per iteration (s): 0.08 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.563311E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.819 | TFLOPs: 11.88 | 7: iteration 49070/ 173500 | consumed samples: 12561920 | consumed tokens: 25726812160 | elapsed time per iteration (s): 0.09 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.565877E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2776.964 | TFLOPs: 10.33 | 7: iteration 49080/ 173500 | consumed samples: 12564480 | consumed tokens: 25732055040 | elapsed time per iteration (s): 0.08 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.555560E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.082 | TFLOPs: 11.93 | 7: iteration 49090/ 173500 | consumed samples: 12567040 | consumed tokens: 25737297920 | elapsed time per iteration (s): 0.08 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.560833E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3200.526 | TFLOPs: 11.90 | 7: iteration 49100/ 173500 | consumed samples: 12569600 | consumed tokens: 25742540800 | elapsed time per iteration (s): 0.08 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.572318E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.357 | TFLOPs: 11.94 | 7: iteration 49110/ 173500 | consumed samples: 12572160 | consumed tokens: 25747783680 | elapsed time per iteration (s): 0.08 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.553449E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.899 | TFLOPs: 11.92 | 7: iteration 49120/ 173500 | consumed samples: 12574720 | consumed tokens: 25753026560 | elapsed time per iteration (s): 0.08 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.568931E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.422 | TFLOPs: 11.90 | 7: iteration 49130/ 173500 | consumed samples: 12577280 | consumed tokens: 25758269440 | elapsed time per iteration (s): 0.09 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 4.562227E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.827 | TFLOPs: 10.84 | 7: iteration 49140/ 173500 | consumed samples: 12579840 | consumed tokens: 25763512320 | elapsed time per iteration (s): 0.08 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 4.547203E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.908 | TFLOPs: 11.84 | 7: iteration 49150/ 173500 | consumed samples: 12582400 | consumed tokens: 25768755200 | elapsed time per iteration (s): 0.08 | learning rate: 
1.682E-04 | global batch size: 256 | lm loss: 4.556412E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.800 | TFLOPs: 11.63 | 7: iteration 49160/ 173500 | consumed samples: 12584960 | consumed tokens: 25773998080 | elapsed time per iteration (s): 0.12 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 4.561933E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2210.952 | TFLOPs: 8.22 | 7: iteration 49170/ 173500 | consumed samples: 12587520 | consumed tokens: 25779240960 | elapsed time per iteration (s): 0.12 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 4.571879E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2100.696 | TFLOPs: 7.81 | 7: iteration 49180/ 173500 | consumed samples: 12590080 | consumed tokens: 25784483840 | elapsed time per iteration (s): 0.13 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 4.569054E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2005.621 | TFLOPs: 7.46 | 7: iteration 49190/ 173500 | consumed samples: 12592640 | consumed tokens: 25789726720 | elapsed time per iteration (s): 0.14 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 4.549324E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1881.755 | TFLOPs: 7.00 | 7: iteration 49200/ 173500 | consumed samples: 12595200 | consumed tokens: 25794969600 | elapsed time per iteration (s): 0.13 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 4.555114E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1902.326 | TFLOPs: 7.08 | 7: iteration 49210/ 173500 | consumed 
samples: 12597760 | consumed tokens: 25800212480 | elapsed time per iteration (s): 0.13 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 4.547168E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1930.935 | TFLOPs: 7.18 | 7: iteration 49220/ 173500 | consumed samples: 12600320 | consumed tokens: 25805455360 | elapsed time per iteration (s): 0.14 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 4.570151E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1887.938 | TFLOPs: 7.02 | 7: iteration 49230/ 173500 | consumed samples: 12602880 | consumed tokens: 25810698240 | elapsed time per iteration (s): 0.12 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 4.567390E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2063.956 | TFLOPs: 7.68 | 7: iteration 49240/ 173500 | consumed samples: 12605440 | consumed tokens: 25815941120 | elapsed time per iteration (s): 0.12 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 4.561773E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2155.111 | TFLOPs: 8.02 | 7: iteration 49250/ 173500 | consumed samples: 12608000 | consumed tokens: 25821184000 | elapsed time per iteration (s): 0.11 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 4.553297E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2313.578 | TFLOPs: 8.61 | 7: iteration 49260/ 173500 | consumed samples: 12610560 | consumed tokens: 25826426880 | elapsed time per iteration (s): 0.08 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 4.568023E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3220.248 | TFLOPs: 11.98 | 7: iteration 49270/ 173500 | consumed samples: 12613120 | consumed tokens: 25831669760 | elapsed time per iteration (s): 0.08 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 4.543378E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.426 | TFLOPs: 11.96 | 7: iteration 49280/ 173500 | consumed samples: 12615680 | consumed tokens: 25836912640 | elapsed time per iteration (s): 0.08 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 4.550946E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.745 | TFLOPs: 11.94 | 7: iteration 49290/ 173500 | consumed samples: 12618240 | consumed tokens: 25842155520 | elapsed time per iteration (s): 0.08 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 4.564037E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.970 | TFLOPs: 11.95 | 7: iteration 49300/ 173500 | consumed samples: 12620800 | consumed tokens: 25847398400 | elapsed time per iteration (s): 0.08 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 4.563290E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.894 | TFLOPs: 12.00 | 7: iteration 49310/ 173500 | consumed samples: 12623360 | consumed tokens: 25852641280 | elapsed time per iteration (s): 0.08 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 4.561159E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.455 | TFLOPs: 12.01 | 7: iteration 49320/ 173500 | consumed samples: 12625920 | consumed tokens: 25857884160 | elapsed time per iteration (s): 0.08 | learning rate: 1.680E-04 | global batch size: 256 | 
lm loss: 4.556149E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.667 | TFLOPs: 12.04 | 7: iteration 49330/ 173500 | consumed samples: 12628480 | consumed tokens: 25863127040 | elapsed time per iteration (s): 0.08 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 4.570250E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.602 | TFLOPs: 11.34 | 7: iteration 49340/ 173500 | consumed samples: 12631040 | consumed tokens: 25868369920 | elapsed time per iteration (s): 0.08 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 4.561262E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.015 | TFLOPs: 11.93 | 7: iteration 49350/ 173500 | consumed samples: 12633600 | consumed tokens: 25873612800 | elapsed time per iteration (s): 0.08 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 4.557554E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.678 | TFLOPs: 12.02 | 7: iteration 49360/ 173500 | consumed samples: 12636160 | consumed tokens: 25878855680 | elapsed time per iteration (s): 0.08 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 4.563589E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.879 | TFLOPs: 11.99 | 7: iteration 49370/ 173500 | consumed samples: 12638720 | consumed tokens: 25884098560 | elapsed time per iteration (s): 0.08 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 4.568664E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.279 | TFLOPs: 12.03 | 7: iteration 49380/ 173500 | consumed samples: 12641280 | consumed 
tokens: 25889341440 | elapsed time per iteration (s): 0.08 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 4.561699E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3257.297 | TFLOPs: 12.12 | 7: iteration 49390/ 173500 | consumed samples: 12643840 | consumed tokens: 25894584320 | elapsed time per iteration (s): 0.08 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 4.549048E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3254.259 | TFLOPs: 12.10 | 7: iteration 49400/ 173500 | consumed samples: 12646400 | consumed tokens: 25899827200 | elapsed time per iteration (s): 0.08 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 4.557642E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3249.881 | TFLOPs: 12.09 | 7: iteration 49410/ 173500 | consumed samples: 12648960 | consumed tokens: 25905070080 | elapsed time per iteration (s): 0.08 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 4.554910E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3263.371 | TFLOPs: 12.14 | 7: iteration 49420/ 173500 | consumed samples: 12651520 | consumed tokens: 25910312960 | elapsed time per iteration (s): 0.08 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 4.559940E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.711 | TFLOPs: 12.05 | 7: iteration 49430/ 173500 | consumed samples: 12654080 | consumed tokens: 25915555840 | elapsed time per iteration (s): 0.08 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 4.551146E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3123.543 | TFLOPs: 11.62 | 7: iteration 49440/ 173500 | consumed samples: 12656640 | consumed tokens: 25920798720 | elapsed time per iteration (s): 0.11 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 4.565622E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.573 | TFLOPs: 8.98 | 7: iteration 49450/ 173500 | consumed samples: 12659200 | consumed tokens: 25926041600 | elapsed time per iteration (s): 0.09 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 4.568776E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2748.582 | TFLOPs: 10.22 | 7: iteration 49460/ 173500 | consumed samples: 12661760 | consumed tokens: 25931284480 | elapsed time per iteration (s): 0.08 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 4.561315E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.525 | TFLOPs: 11.86 | 7: iteration 49470/ 173500 | consumed samples: 12664320 | consumed tokens: 25936527360 | elapsed time per iteration (s): 0.08 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 4.569752E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.661 | TFLOPs: 12.04 | 7: iteration 49480/ 173500 | consumed samples: 12666880 | consumed tokens: 25941770240 | elapsed time per iteration (s): 0.08 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 4.555441E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.862 | TFLOPs: 11.99 | 7: iteration 49490/ 173500 | consumed samples: 12669440 | consumed tokens: 25947013120 | elapsed time per iteration (s): 0.08 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 4.552491E+00 | 
grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.955 | TFLOPs: 12.02 | 7: iteration 49500/ 173500 | consumed samples: 12672000 | consumed tokens: 25952256000 | elapsed time per iteration (s): 0.08 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 4.569769E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.011 | TFLOPs: 12.03 | 7: iteration 49510/ 173500 | consumed samples: 12674560 | consumed tokens: 25957498880 | elapsed time per iteration (s): 0.08 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 4.564843E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.348 | TFLOPs: 11.83 | 7: iteration 49520/ 173500 | consumed samples: 12677120 | consumed tokens: 25962741760 | elapsed time per iteration (s): 0.08 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 4.565938E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.733 | TFLOPs: 12.00 | 7: iteration 49530/ 173500 | consumed samples: 12679680 | consumed tokens: 25967984640 | elapsed time per iteration (s): 0.08 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 4.538420E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.084 | TFLOPs: 12.04 | 7: iteration 49540/ 173500 | consumed samples: 12682240 | consumed tokens: 25973227520 | elapsed time per iteration (s): 0.08 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 4.561268E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.241 | TFLOPs: 12.01 | 7: iteration 49550/ 173500 | consumed samples: 12684800 | consumed tokens: 25978470400 | elapsed 
time per iteration (s): 0.08 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 4.569822E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.959 | TFLOPs: 12.03 | 7: iteration 49560/ 173500 | consumed samples: 12687360 | consumed tokens: 25983713280 | elapsed time per iteration (s): 0.08 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 4.559903E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.869 | TFLOPs: 12.00 | 7: iteration 49570/ 173500 | consumed samples: 12689920 | consumed tokens: 25988956160 | elapsed time per iteration (s): 0.08 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 4.565059E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.845 | TFLOPs: 12.03 | 7: iteration 49580/ 173500 | consumed samples: 12692480 | consumed tokens: 25994199040 | elapsed time per iteration (s): 0.08 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 4.558326E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.723 | TFLOPs: 12.03 | 7: iteration 49590/ 173500 | consumed samples: 12695040 | consumed tokens: 25999441920 | elapsed time per iteration (s): 0.08 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 4.577248E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.088 | TFLOPs: 12.02 | 7: iteration 49600/ 173500 | consumed samples: 12697600 | consumed tokens: 26004684800 | elapsed time per iteration (s): 0.08 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 4.569308E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.136 | 
TFLOPs: 12.00 | 7: iteration 49610/ 173500 | consumed samples: 12700160 | consumed tokens: 26009927680 | elapsed time per iteration (s): 0.08 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 4.546022E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.045 | TFLOPs: 12.06 | 7: iteration 49620/ 173500 | consumed samples: 12702720 | consumed tokens: 26015170560 | elapsed time per iteration (s): 0.08 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 4.559381E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.455 | TFLOPs: 12.02 | 7: iteration 49630/ 173500 | consumed samples: 12705280 | consumed tokens: 26020413440 | elapsed time per iteration (s): 0.08 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 4.560531E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.495 | TFLOPs: 11.87 | 7: iteration 49640/ 173500 | consumed samples: 12707840 | consumed tokens: 26025656320 | elapsed time per iteration (s): 0.08 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 4.567299E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3252.839 | TFLOPs: 12.10 | 7: iteration 49650/ 173500 | consumed samples: 12710400 | consumed tokens: 26030899200 | elapsed time per iteration (s): 0.08 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 4.570745E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.190 | TFLOPs: 12.01 | 7: iteration 49660/ 173500 | consumed samples: 12712960 | consumed tokens: 26036142080 | elapsed time per iteration (s): 0.08 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 4.554075E+00 | grad norm: 0.311 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.110 | TFLOPs: 12.03 | 7: iteration 49670/ 173500 | consumed samples: 12715520 | consumed tokens: 26041384960 | elapsed time per iteration (s): 0.08 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 4.568412E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.126 | TFLOPs: 12.07 | 7: iteration 49680/ 173500 | consumed samples: 12718080 | consumed tokens: 26046627840 | elapsed time per iteration (s): 0.09 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 4.567754E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2738.963 | TFLOPs: 10.19 | 7: iteration 49690/ 173500 | consumed samples: 12720640 | consumed tokens: 26051870720 | elapsed time per iteration (s): 0.08 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 4.560272E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3263.059 | TFLOPs: 12.14 | 7: iteration 49700/ 173500 | consumed samples: 12723200 | consumed tokens: 26057113600 | elapsed time per iteration (s): 0.08 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 4.550087E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.274 | TFLOPs: 12.00 | 7: iteration 49710/ 173500 | consumed samples: 12725760 | consumed tokens: 26062356480 | elapsed time per iteration (s): 0.09 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 4.565411E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2928.619 | TFLOPs: 10.89 | 7: iteration 49720/ 173500 | consumed samples: 12728320 | consumed tokens: 26067599360 | elapsed time per iteration (s): 
0.11 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 4.557389E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.465 | TFLOPs: 8.59 | 7: iteration 49730/ 173500 | consumed samples: 12730880 | consumed tokens: 26072842240 | elapsed time per iteration (s): 0.11 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 4.571433E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.444 | TFLOPs: 8.72 | 7: iteration 49740/ 173500 | consumed samples: 12733440 | consumed tokens: 26078085120 | elapsed time per iteration (s): 0.11 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 4.570963E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.423 | TFLOPs: 8.82 | 7: iteration 49750/ 173500 | consumed samples: 12736000 | consumed tokens: 26083328000 | elapsed time per iteration (s): 0.11 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 4.564037E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.742 | TFLOPs: 8.62 | 7: iteration 49760/ 173500 | consumed samples: 12738560 | consumed tokens: 26088570880 | elapsed time per iteration (s): 0.12 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 4.561624E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2139.355 | TFLOPs: 7.96 | 7: iteration 49770/ 173500 | consumed samples: 12741120 | consumed tokens: 26093813760 | elapsed time per iteration (s): 0.13 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 4.556381E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.412 | TFLOPs: 7.53 | 7: iteration 49780/ 
173500 | consumed samples: 12743680 | consumed tokens: 26099056640 | elapsed time per iteration (s): 0.11 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 4.556763E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.113 | TFLOPs: 8.66 | 7: iteration 49790/ 173500 | consumed samples: 12746240 | consumed tokens: 26104299520 | elapsed time per iteration (s): 0.10 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 4.558478E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2488.539 | TFLOPs: 9.26 | 7: iteration 49800/ 173500 | consumed samples: 12748800 | consumed tokens: 26109542400 | elapsed time per iteration (s): 0.10 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 4.560991E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2466.328 | TFLOPs: 9.17 | 7: iteration 49810/ 173500 | consumed samples: 12751360 | consumed tokens: 26114785280 | elapsed time per iteration (s): 0.11 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 4.561087E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2430.192 | TFLOPs: 9.04 | 7: iteration 49820/ 173500 | consumed samples: 12753920 | consumed tokens: 26120028160 | elapsed time per iteration (s): 0.11 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 4.561714E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.529 | TFLOPs: 8.95 | 7: iteration 49830/ 173500 | consumed samples: 12756480 | consumed tokens: 26125271040 | elapsed time per iteration (s): 0.11 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 4.567530E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 2433.534 | TFLOPs: 9.05 | 7: iteration 49840/ 173500 | consumed samples: 12759040 | consumed tokens: 26130513920 | elapsed time per iteration (s): 0.11 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 4.563818E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2414.937 | TFLOPs: 8.98 | 7: iteration 49850/ 173500 | consumed samples: 12761600 | consumed tokens: 26135756800 | elapsed time per iteration (s): 0.12 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 4.548823E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2170.940 | TFLOPs: 8.07 | 7: iteration 49860/ 173500 | consumed samples: 12764160 | consumed tokens: 26140999680 | elapsed time per iteration (s): 0.11 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 4.547465E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2352.596 | TFLOPs: 8.75 | 7: iteration 49870/ 173500 | consumed samples: 12766720 | consumed tokens: 26146242560 | elapsed time per iteration (s): 0.11 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 4.554420E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.350 | TFLOPs: 8.88 | 7: iteration 49880/ 173500 | consumed samples: 12769280 | consumed tokens: 26151485440 | elapsed time per iteration (s): 0.11 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 4.556157E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.086 | TFLOPs: 8.88 | 7: iteration 49890/ 173500 | consumed samples: 12771840 | consumed tokens: 26156728320 | elapsed time per iteration (s): 0.11 | learning rate: 1.673E-04 | global batch 
size: 256 | lm loss: 4.555187E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.481 | TFLOPs: 8.56 | 7: iteration 49900/ 173500 | consumed samples: 12774400 | consumed tokens: 26161971200 | elapsed time per iteration (s): 0.11 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 4.568394E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.781 | TFLOPs: 8.91 | 7: iteration 49910/ 173500 | consumed samples: 12776960 | consumed tokens: 26167214080 | elapsed time per iteration (s): 0.11 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 4.559033E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2323.150 | TFLOPs: 8.64 | 7: iteration 49920/ 173500 | consumed samples: 12779520 | consumed tokens: 26172456960 | elapsed time per iteration (s): 0.11 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 4.544769E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2331.336 | TFLOPs: 8.67 | 7: iteration 49930/ 173500 | consumed samples: 12782080 | consumed tokens: 26177699840 | elapsed time per iteration (s): 0.11 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 4.574289E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.634 | TFLOPs: 8.59 | 7: iteration 49940/ 173500 | consumed samples: 12784640 | consumed tokens: 26182942720 | elapsed time per iteration (s): 0.11 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 4.562896E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.854 | TFLOPs: 8.63 | 7: iteration 49950/ 173500 | consumed samples: 12787200 | consumed 
tokens: 26188185600 | elapsed time per iteration (s): 0.11 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 4.550404E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2381.983 | TFLOPs: 8.86 | 7: iteration 49960/ 173500 | consumed samples: 12789760 | consumed tokens: 26193428480 | elapsed time per iteration (s): 0.11 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 4.553285E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2324.039 | TFLOPs: 8.64 | 7: iteration 49970/ 173500 | consumed samples: 12792320 | consumed tokens: 26198671360 | elapsed time per iteration (s): 0.11 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 4.545370E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.841 | TFLOPs: 8.85 | 7: iteration 49980/ 173500 | consumed samples: 12794880 | consumed tokens: 26203914240 | elapsed time per iteration (s): 0.12 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 4.565441E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2140.822 | TFLOPs: 7.96 | 7: iteration 49990/ 173500 | consumed samples: 12797440 | consumed tokens: 26209157120 | elapsed time per iteration (s): 0.11 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 4.553904E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2278.755 | TFLOPs: 8.48 | 0: [2023-03-17 01:29:16,995] [INFO] [logging.py:68:log_dist] [Rank 0] step=50000, skipped=0, lr=[0.00016715144913462704, 0.00016715144913462704, 0.00016715144913462704], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 50000/ 173500 | consumed samples: 12800000 | consumed tokens: 26214400000 | elapsed 
time per iteration (s): 0.11 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 4.557697E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2368.804 | TFLOPs: 8.81 |
0: steps: 50000 loss: 4.5758 iter time (s): 0.092 samples/sec: 2796.763
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 50000 | lm loss value: 4.428394E+00 | lm loss PPL: 8.379676E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 50000 to checkpoints_14m91b100m
0: [2023-03-17 01:29:17,070] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step50000 is begin to save!
0: [2023-03-17 01:29:17,074] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:29:17,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:29:17,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/layer_03-model_00-model_states.pt...
0: [2023-03-17 01:29:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/layer_03-model_00-model_states.pt.
0: [2023-03-17 01:29:17,103] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/layer_04-model_00-model_states.pt...
0: [2023-03-17 01:29:17,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/layer_04-model_00-model_states.pt.
0: [2023-03-17 01:29:17,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/layer_05-model_00-model_states.pt...
0: [2023-03-17 01:29:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/layer_05-model_00-model_states.pt.
0: [2023-03-17 01:29:17,109] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/layer_06-model_00-model_states.pt...
0: [2023-03-17 01:29:17,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/layer_06-model_00-model_states.pt.
0: [2023-03-17 01:29:17,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/layer_08-model_00-model_states.pt...
0: [2023-03-17 01:29:17,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/layer_08-model_00-model_states.pt.
0: [2023-03-17 01:29:17,113] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step50000/mp_rank_00_model_states.pt
0: [2023-03-17 01:29:17,113] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/mp_rank_00_model_states.pt...
0: [2023-03-17 01:29:17,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/mp_rank_00_model_states.pt.
0: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:29:17,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:29:17,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:29:17,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:29:17,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 01:29:17,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:29:17,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:29:17,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:29:17,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:29:17,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:29:17,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 01:29:17,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 0: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 01:29:17,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:29:17,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:29:17,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:29:17,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2023-03-17 01:29:17,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:29:17,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 01:29:17,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:29:17,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:29:17,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 01:29:17,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 01:29:17,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:29:17,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 01:29:17,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:29:17,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:29:17,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 01:29:17,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:29:17,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2023-03-17 01:29:17,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:29:17,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:29:17,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:29:17,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:29:17,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:29:17,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:29:17,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:29:17,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 01:29:17,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:29:17,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:29:17,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:29:17,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:29:17,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:29:17,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:29:17,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:29:17,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
0: [2023-03-17 01:29:17,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:29:17,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:29:17,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:29:17,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:29:17,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 01:29:17,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:29:17,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2023-03-17 01:29:17,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 1: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
0: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:29:17,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:29:17,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2023-03-17 01:29:17,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:29:17,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:29:17,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: successfully saved checkpoint at iteration 50000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.41 7: iteration 50010/ 173500 | consumed samples: 12802560 | consumed tokens: 26219642880 | elapsed time per iteration (s): 0.12 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 4.542437E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2118.787 | TFLOPs: 7.88 | 7: iteration 50020/ 173500 | consumed samples: 12805120 | consumed tokens: 26224885760 | elapsed time per iteration (s): 0.10 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 4.556404E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2525.454 | TFLOPs: 9.39 | 7: iteration 50030/ 173500 | consumed samples: 12807680 | consumed tokens: 26230128640 | elapsed time per iteration (s): 0.08 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 4.558246E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.130 | TFLOPs: 11.82 | 7: iteration 50040/ 173500 
| consumed samples: 12810240 | consumed tokens: 26235371520 | elapsed time per iteration (s): 0.08 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 4.555978E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.987 | TFLOPs: 11.55 | 7: iteration 50050/ 173500 | consumed samples: 12812800 | consumed tokens: 26240614400 | elapsed time per iteration (s): 0.08 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 4.565369E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.372 | TFLOPs: 11.82 | 7: iteration 50060/ 173500 | consumed samples: 12815360 | consumed tokens: 26245857280 | elapsed time per iteration (s): 0.08 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 4.559483E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.819 | TFLOPs: 11.79 | 7: iteration 50070/ 173500 | consumed samples: 12817920 | consumed tokens: 26251100160 | elapsed time per iteration (s): 0.08 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 4.551255E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.183 | TFLOPs: 11.87 | 7: iteration 50080/ 173500 | consumed samples: 12820480 | consumed tokens: 26256343040 | elapsed time per iteration (s): 0.08 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 4.563697E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.497 | TFLOPs: 11.86 | 7: iteration 50090/ 173500 | consumed samples: 12823040 | consumed tokens: 26261585920 | elapsed time per iteration (s): 0.08 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 4.558776E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 3198.950 | TFLOPs: 11.90 | 7: iteration 50100/ 173500 | consumed samples: 12825600 | consumed tokens: 26266828800 | elapsed time per iteration (s): 0.08 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 4.560314E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.606 | TFLOPs: 11.89 | 7: iteration 50110/ 173500 | consumed samples: 12828160 | consumed tokens: 26272071680 | elapsed time per iteration (s): 0.08 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 4.554498E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.026 | TFLOPs: 11.79 | 7: iteration 50120/ 173500 | consumed samples: 12830720 | consumed tokens: 26277314560 | elapsed time per iteration (s): 0.08 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 4.548546E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.178 | TFLOPs: 11.93 | 7: iteration 50130/ 173500 | consumed samples: 12833280 | consumed tokens: 26282557440 | elapsed time per iteration (s): 0.08 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 4.552458E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.382 | TFLOPs: 11.83 | 7: iteration 50140/ 173500 | consumed samples: 12835840 | consumed tokens: 26287800320 | elapsed time per iteration (s): 0.08 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 4.564914E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.188 | TFLOPs: 11.88 | 7: iteration 50150/ 173500 | consumed samples: 12838400 | consumed tokens: 26293043200 | elapsed time per iteration (s): 0.08 | learning rate: 1.670E-04 | global 
batch size: 256 | lm loss: 4.550842E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.966 | TFLOPs: 11.91 | 7: iteration 50160/ 173500 | consumed samples: 12840960 | consumed tokens: 26298286080 | elapsed time per iteration (s): 0.08 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 4.565359E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.981 | TFLOPs: 11.89 | 7: iteration 50170/ 173500 | consumed samples: 12843520 | consumed tokens: 26303528960 | elapsed time per iteration (s): 0.08 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 4.559340E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.567 | TFLOPs: 11.99 | 7: iteration 50180/ 173500 | consumed samples: 12846080 | consumed tokens: 26308771840 | elapsed time per iteration (s): 0.08 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 4.559116E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.718 | TFLOPs: 11.93 | 7: iteration 50190/ 173500 | consumed samples: 12848640 | consumed tokens: 26314014720 | elapsed time per iteration (s): 0.08 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 4.561034E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.462 | TFLOPs: 11.95 | 7: iteration 50200/ 173500 | consumed samples: 12851200 | consumed tokens: 26319257600 | elapsed time per iteration (s): 0.08 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 4.552445E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.664 | TFLOPs: 11.88 | 7: iteration 50210/ 173500 | consumed samples: 12853760 
| consumed tokens: 26324500480 | elapsed time per iteration (s): 0.08 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 4.545670E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.854 | TFLOPs: 11.97 | 7: iteration 50220/ 173500 | consumed samples: 12856320 | consumed tokens: 26329743360 | elapsed time per iteration (s): 0.08 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 4.555161E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.022 | TFLOPs: 11.89 | 7: iteration 50230/ 173500 | consumed samples: 12858880 | consumed tokens: 26334986240 | elapsed time per iteration (s): 0.08 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 4.559320E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.947 | TFLOPs: 11.60 | 7: iteration 50240/ 173500 | consumed samples: 12861440 | consumed tokens: 26340229120 | elapsed time per iteration (s): 0.08 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 4.552930E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.849 | TFLOPs: 11.88 | 7: iteration 50250/ 173500 | consumed samples: 12864000 | consumed tokens: 26345472000 | elapsed time per iteration (s): 0.10 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 4.558028E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2579.549 | TFLOPs: 9.59 | 7: iteration 50260/ 173500 | consumed samples: 12866560 | consumed tokens: 26350714880 | elapsed time per iteration (s): 0.11 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 4.556448E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 2272.733 | TFLOPs: 8.45 | 7: iteration 50270/ 173500 | consumed samples: 12869120 | consumed tokens: 26355957760 | elapsed time per iteration (s): 0.11 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 4.548180E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2250.869 | TFLOPs: 8.37 | 7: iteration 50280/ 173500 | consumed samples: 12871680 | consumed tokens: 26361200640 | elapsed time per iteration (s): 0.11 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 4.555427E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2240.188 | TFLOPs: 8.33 | 7: iteration 50290/ 173500 | consumed samples: 12874240 | consumed tokens: 26366443520 | elapsed time per iteration (s): 0.11 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 4.562364E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.146 | TFLOPs: 8.98 | 7: iteration 50300/ 173500 | consumed samples: 12876800 | consumed tokens: 26371686400 | elapsed time per iteration (s): 0.11 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 4.563163E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.322 | TFLOPs: 8.75 | 7: iteration 50310/ 173500 | consumed samples: 12879360 | consumed tokens: 26376929280 | elapsed time per iteration (s): 0.11 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 4.553145E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.923 | TFLOPs: 8.88 | 7: iteration 50320/ 173500 | consumed samples: 12881920 | consumed tokens: 26382172160 | elapsed time per iteration (s): 0.11 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.553517E+00 
| grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2378.975 | TFLOPs: 8.85 | 7: iteration 50330/ 173500 | consumed samples: 12884480 | consumed tokens: 26387415040 | elapsed time per iteration (s): 0.13 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.546819E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1943.279 | TFLOPs: 7.23 | 7: iteration 50340/ 173500 | consumed samples: 12887040 | consumed tokens: 26392657920 | elapsed time per iteration (s): 0.12 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.566081E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2107.730 | TFLOPs: 7.84 | 7: iteration 50350/ 173500 | consumed samples: 12889600 | consumed tokens: 26397900800 | elapsed time per iteration (s): 0.12 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.559190E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2189.676 | TFLOPs: 8.14 | 7: iteration 50360/ 173500 | consumed samples: 12892160 | consumed tokens: 26403143680 | elapsed time per iteration (s): 0.11 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.559409E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2384.824 | TFLOPs: 8.87 | 7: iteration 50370/ 173500 | consumed samples: 12894720 | consumed tokens: 26408386560 | elapsed time per iteration (s): 0.10 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.563900E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2440.292 | TFLOPs: 9.08 | 7: iteration 50380/ 173500 | consumed samples: 12897280 | consumed tokens: 26413629440 | elapsed time 
per iteration (s): 0.12 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.561814E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2125.381 | TFLOPs: 7.91 | 7: iteration 50390/ 173500 | consumed samples: 12899840 | consumed tokens: 26418872320 | elapsed time per iteration (s): 0.12 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.566838E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2218.249 | TFLOPs: 8.25 | 7: iteration 50400/ 173500 | consumed samples: 12902400 | consumed tokens: 26424115200 | elapsed time per iteration (s): 0.12 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 4.563155E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2218.336 | TFLOPs: 8.25 | 7: iteration 50410/ 173500 | consumed samples: 12904960 | consumed tokens: 26429358080 | elapsed time per iteration (s): 0.12 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 4.570032E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2191.972 | TFLOPs: 8.15 | 7: iteration 50420/ 173500 | consumed samples: 12907520 | consumed tokens: 26434600960 | elapsed time per iteration (s): 0.12 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 4.555214E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2170.469 | TFLOPs: 8.07 | 7: iteration 50430/ 173500 | consumed samples: 12910080 | consumed tokens: 26439843840 | elapsed time per iteration (s): 0.10 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 4.560026E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2443.129 | TFLOPs: 9.09 | 
7: iteration 50440/ 173500 | consumed samples: 12912640 | consumed tokens: 26445086720 | elapsed time per iteration (s): 0.10 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 4.569001E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.752 | TFLOPs: 9.34 | 7: iteration 50450/ 173500 | consumed samples: 12915200 | consumed tokens: 26450329600 | elapsed time per iteration (s): 0.10 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 4.563160E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2554.921 | TFLOPs: 9.50 | 7: iteration 50460/ 173500 | consumed samples: 12917760 | consumed tokens: 26455572480 | elapsed time per iteration (s): 0.10 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 4.550788E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.111 | TFLOPs: 9.12 | 7: iteration 50470/ 173500 | consumed samples: 12920320 | consumed tokens: 26460815360 | elapsed time per iteration (s): 0.10 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 4.563031E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.540 | TFLOPs: 9.25 | 7: iteration 50480/ 173500 | consumed samples: 12922880 | consumed tokens: 26466058240 | elapsed time per iteration (s): 0.10 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 4.573971E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.693 | TFLOPs: 9.16 | 7: iteration 50490/ 173500 | consumed samples: 12925440 | consumed tokens: 26471301120 | elapsed time per iteration (s): 0.10 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 4.565286E+00 | grad norm: 0.292 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.528 | TFLOPs: 9.30 | 7: iteration 50500/ 173500 | consumed samples: 12928000 | consumed tokens: 26476544000 | elapsed time per iteration (s): 0.11 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 4.569926E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.673 | TFLOPs: 8.91 | 7: iteration 50510/ 173500 | consumed samples: 12930560 | consumed tokens: 26481786880 | elapsed time per iteration (s): 0.11 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 4.562871E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.887 | TFLOPs: 8.79 | 7: iteration 50520/ 173500 | consumed samples: 12933120 | consumed tokens: 26487029760 | elapsed time per iteration (s): 0.10 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 4.555503E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2462.255 | TFLOPs: 9.16 | 7: iteration 50530/ 173500 | consumed samples: 12935680 | consumed tokens: 26492272640 | elapsed time per iteration (s): 0.10 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 4.560654E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.668 | TFLOPs: 9.45 | 7: iteration 50540/ 173500 | consumed samples: 12938240 | consumed tokens: 26497515520 | elapsed time per iteration (s): 0.11 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 4.567096E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.352 | TFLOPs: 8.85 | 7: iteration 50550/ 173500 | consumed samples: 12940800 | consumed tokens: 26502758400 | elapsed time per iteration (s): 0.10 | learning rate: 
1.664E-04 | global batch size: 256 | lm loss: 4.551444E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.557 | TFLOPs: 9.25 | 7: iteration 50560/ 173500 | consumed samples: 12943360 | consumed tokens: 26508001280 | elapsed time per iteration (s): 0.11 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 4.555829E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2348.389 | TFLOPs: 8.73 | 7: iteration 50570/ 173500 | consumed samples: 12945920 | consumed tokens: 26513244160 | elapsed time per iteration (s): 0.10 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 4.568620E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2554.760 | TFLOPs: 9.50 | 7: iteration 50580/ 173500 | consumed samples: 12948480 | consumed tokens: 26518487040 | elapsed time per iteration (s): 0.10 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 4.561504E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2438.433 | TFLOPs: 9.07 | 7: iteration 50590/ 173500 | consumed samples: 12951040 | consumed tokens: 26523729920 | elapsed time per iteration (s): 0.11 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 4.542282E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.211 | TFLOPs: 8.88 | 7: iteration 50600/ 173500 | consumed samples: 12953600 | consumed tokens: 26528972800 | elapsed time per iteration (s): 0.10 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 4.563353E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2480.378 | TFLOPs: 9.23 | 7: iteration 50610/ 173500 | consumed 
samples: 12956160 | consumed tokens: 26534215680 | elapsed time per iteration (s): 0.10 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 4.565355E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.692 | TFLOPs: 9.16 | 7: iteration 50620/ 173500 | consumed samples: 12958720 | consumed tokens: 26539458560 | elapsed time per iteration (s): 0.11 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 4.553079E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2227.814 | TFLOPs: 8.29 | 7: iteration 50630/ 173500 | consumed samples: 12961280 | consumed tokens: 26544701440 | elapsed time per iteration (s): 0.13 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 4.554352E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.813 | TFLOPs: 7.28 | 7: iteration 50640/ 173500 | consumed samples: 12963840 | consumed tokens: 26549944320 | elapsed time per iteration (s): 0.14 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 4.557351E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1879.790 | TFLOPs: 6.99 | 7: iteration 50650/ 173500 | consumed samples: 12966400 | consumed tokens: 26555187200 | elapsed time per iteration (s): 0.11 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 4.540765E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2266.764 | TFLOPs: 8.43 | 7: iteration 50660/ 173500 | consumed samples: 12968960 | consumed tokens: 26560430080 | elapsed time per iteration (s): 0.11 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 4.553897E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2231.030 | TFLOPs: 8.30 | 7: iteration 50670/ 173500 | consumed samples: 12971520 | consumed tokens: 26565672960 | elapsed time per iteration (s): 0.12 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 4.551770E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2087.497 | TFLOPs: 7.76 | 7: iteration 50680/ 173500 | consumed samples: 12974080 | consumed tokens: 26570915840 | elapsed time per iteration (s): 0.11 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 4.540380E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2297.188 | TFLOPs: 8.54 | 7: iteration 50690/ 173500 | consumed samples: 12976640 | consumed tokens: 26576158720 | elapsed time per iteration (s): 0.10 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 4.553630E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.850 | TFLOPs: 9.60 | 7: iteration 50700/ 173500 | consumed samples: 12979200 | consumed tokens: 26581401600 | elapsed time per iteration (s): 0.10 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 4.542625E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.592 | TFLOPs: 9.34 | 7: iteration 50710/ 173500 | consumed samples: 12981760 | consumed tokens: 26586644480 | elapsed time per iteration (s): 0.10 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 4.563622E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2492.279 | TFLOPs: 9.27 | 7: iteration 50720/ 173500 | consumed samples: 12984320 | consumed tokens: 26591887360 | elapsed time per iteration (s): 0.10 | learning rate: 1.662E-04 | global batch size: 256 | lm 
loss: 4.560593E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2507.713 | TFLOPs: 9.33 | 7: iteration 50730/ 173500 | consumed samples: 12986880 | consumed tokens: 26597130240 | elapsed time per iteration (s): 0.10 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 4.553952E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2512.261 | TFLOPs: 9.34 | 7: iteration 50740/ 173500 | consumed samples: 12989440 | consumed tokens: 26602373120 | elapsed time per iteration (s): 0.10 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 4.557445E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2592.845 | TFLOPs: 9.64 | 7: iteration 50750/ 173500 | consumed samples: 12992000 | consumed tokens: 26607616000 | elapsed time per iteration (s): 0.10 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 4.547599E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.022 | TFLOPs: 9.30 | 7: iteration 50760/ 173500 | consumed samples: 12994560 | consumed tokens: 26612858880 | elapsed time per iteration (s): 0.11 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 4.556345E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2374.290 | TFLOPs: 8.83 | 7: iteration 50770/ 173500 | consumed samples: 12997120 | consumed tokens: 26618101760 | elapsed time per iteration (s): 0.10 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 4.551372E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2480.588 | TFLOPs: 9.23 | 7: iteration 50780/ 173500 | consumed samples: 12999680 | consumed tokens: 
26623344640 | elapsed time per iteration (s): 0.10 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 4.558140E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.789 | TFLOPs: 9.48 | 7: iteration 50790/ 173500 | consumed samples: 13002240 | consumed tokens: 26628587520 | elapsed time per iteration (s): 0.11 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 4.552628E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2363.852 | TFLOPs: 8.79 | 7: iteration 50800/ 173500 | consumed samples: 13004800 | consumed tokens: 26633830400 | elapsed time per iteration (s): 0.11 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 4.553354E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2251.552 | TFLOPs: 8.37 | 7: iteration 50810/ 173500 | consumed samples: 13007360 | consumed tokens: 26639073280 | elapsed time per iteration (s): 0.11 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 4.549971E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2389.222 | TFLOPs: 8.89 | 7: iteration 50820/ 173500 | consumed samples: 13009920 | consumed tokens: 26644316160 | elapsed time per iteration (s): 0.10 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 4.575669E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.666 | TFLOPs: 9.80 | 7: iteration 50830/ 173500 | consumed samples: 13012480 | consumed tokens: 26649559040 | elapsed time per iteration (s): 0.10 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 4.565609E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
2549.905 | TFLOPs: 9.48 | 7: iteration 50840/ 173500 | consumed samples: 13015040 | consumed tokens: 26654801920 | elapsed time per iteration (s): 0.10 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 4.553996E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2516.093 | TFLOPs: 9.36 | 7: iteration 50850/ 173500 | consumed samples: 13017600 | consumed tokens: 26660044800 | elapsed time per iteration (s): 0.10 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 4.562897E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2466.547 | TFLOPs: 9.17 | 7: iteration 50860/ 173500 | consumed samples: 13020160 | consumed tokens: 26665287680 | elapsed time per iteration (s): 0.10 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 4.557440E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2518.565 | TFLOPs: 9.37 | 7: iteration 50870/ 173500 | consumed samples: 13022720 | consumed tokens: 26670530560 | elapsed time per iteration (s): 0.11 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 4.567839E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2375.059 | TFLOPs: 8.83 | 7: iteration 50880/ 173500 | consumed samples: 13025280 | consumed tokens: 26675773440 | elapsed time per iteration (s): 0.10 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 4.554805E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2446.860 | TFLOPs: 9.10 | 7: iteration 50890/ 173500 | consumed samples: 13027840 | consumed tokens: 26681016320 | elapsed time per iteration (s): 0.11 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 4.556315E+00 | grad norm: 0.305 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2362.201 | TFLOPs: 8.79 | 7: iteration 50900/ 173500 | consumed samples: 13030400 | consumed tokens: 26686259200 | elapsed time per iteration (s): 0.10 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 4.561630E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.451 | TFLOPs: 9.41 | 7: iteration 50910/ 173500 | consumed samples: 13032960 | consumed tokens: 26691502080 | elapsed time per iteration (s): 0.11 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 4.556973E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2381.002 | TFLOPs: 8.86 | 7: iteration 50920/ 173500 | consumed samples: 13035520 | consumed tokens: 26696744960 | elapsed time per iteration (s): 0.11 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 4.570851E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2292.447 | TFLOPs: 8.53 | 7: iteration 50930/ 173500 | consumed samples: 13038080 | consumed tokens: 26701987840 | elapsed time per iteration (s): 0.10 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 4.565532E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2470.357 | TFLOPs: 9.19 | 7: iteration 50940/ 173500 | consumed samples: 13040640 | consumed tokens: 26707230720 | elapsed time per iteration (s): 0.09 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 4.552619E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2764.022 | TFLOPs: 10.28 | 7: iteration 50950/ 173500 | consumed samples: 13043200 | consumed tokens: 26712473600 | elapsed time per iteration (s): 
0.08 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 4.555902E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.442 | TFLOPs: 11.93 | 7: iteration 50960/ 173500 | consumed samples: 13045760 | consumed tokens: 26717716480 | elapsed time per iteration (s): 0.08 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 4.558797E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.287 | TFLOPs: 12.05 | 7: iteration 50970/ 173500 | consumed samples: 13048320 | consumed tokens: 26722959360 | elapsed time per iteration (s): 0.08 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 4.564902E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.128 | TFLOPs: 12.02 | 7: iteration 50980/ 173500 | consumed samples: 13050880 | consumed tokens: 26728202240 | elapsed time per iteration (s): 0.08 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 4.563937E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.951 | TFLOPs: 11.88 | 7: iteration 50990/ 173500 | consumed samples: 13053440 | consumed tokens: 26733445120 | elapsed time per iteration (s): 0.08 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 4.563793E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.377 | TFLOPs: 11.83 | 7: iteration 51000/ 173500 | consumed samples: 13056000 | consumed tokens: 26738688000 | elapsed time per iteration (s): 0.08 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 4.552736E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.702 | TFLOPs: 11.93 | 7: 
------------------------------------------------------------------------------------------------ 7: validation loss at iteration 51000 | lm loss value: 4.439434E+00 | lm loss PPL: 8.472698E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 51000 to checkpoints_14m91b100m 0: [2023-03-17 01:30:57,262] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step51000 is begin to save! 0: [2023-03-17 01:30:57,265] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:30:57,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:30:57,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:30:57,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:30:57,294] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:30:57,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:30:57,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:30:57,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:30:57,300] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 01:30:57,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:30:57,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:30:57,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:30:57,304] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step51000/mp_rank_00_model_states.pt 0: [2023-03-17 01:30:57,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:30:57,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:30:57,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:30:57,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:30:57,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:30:57,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:30:57,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2023-03-17 01:30:57,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:30:57,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:30:57,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
2: [2023-03-17 01:30:57,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:30:57,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 7: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 5: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
3: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 4: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2023-03-17 01:30:57,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:30:57,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 5: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:30:57,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 4: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:30:57,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 7: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:30:57,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 01:30:57,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 7: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:30:57,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 3: [2023-03-17 01:30:57,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:30:57,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2023-03-17 01:30:57,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 5: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:30:57,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
1: [2023-03-17 01:30:57,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 4: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:30:57,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:30:57,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:30:57,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:30:57,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 3: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:30:57,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:30:57,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:30:57,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 4: [2023-03-17 01:30:57,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:30:57,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:30:57,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 3: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:30:57,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 7: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:30:57,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:30:57,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 01:30:57,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2023-03-17 01:30:57,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 5: [2023-03-17 01:30:57,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:30:57,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:30:57,333] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:30:57,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,333] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 4: [2023-03-17 01:30:57,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:30:57,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:30:57,333] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2023-03-17 01:30:57,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:30:57,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:30:57,333] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 7: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:30:57,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:30:57,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:30:57,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 3: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:30:57,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 5: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:30:57,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 01:30:57,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 5: [2023-03-17 01:30:57,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 4: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:30:57,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 7: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:30:57,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:30:57,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 3: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:30:57,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:30:57,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 3: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 3: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
2: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 7: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 7: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 6: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 5: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:30:57,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 5: [2023-03-17 01:30:57,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2023-03-17 01:30:57,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:30:57,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step51000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:30:57,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
0: successfully saved checkpoint at iteration 51000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.77 7: iteration 51010/ 173500 | consumed samples: 13058560 | consumed tokens: 26743930880 | elapsed time per iteration (s): 0.09 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 4.558247E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2804.769 | TFLOPs: 10.43 | 7: iteration 51020/ 173500 | consumed samples: 13061120 | consumed tokens: 26749173760 | elapsed time per iteration (s): 0.08 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 4.567015E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.252 | TFLOPs: 11.91 | 7: iteration 51030/ 173500 | consumed samples: 13063680 | consumed tokens: 26754416640 | elapsed time per iteration (s): 0.08 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 4.555558E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.601 | TFLOPs: 11.90 | 7: iteration 51040/ 173500 | consumed samples: 13066240 | consumed tokens: 26759659520 | elapsed time per iteration (s): 0.08 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 4.547696E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.345 | TFLOPs: 11.96 | 7: iteration 51050/ 173500 | consumed samples: 13068800 | consumed tokens: 26764902400 | elapsed time per iteration (s): 0.08 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 4.555024E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.595 | TFLOPs: 11.98 | 7: iteration 51060/ 173500 | consumed samples: 13071360 | consumed tokens: 26770145280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.658E-04 | global batch size: 256 | lm loss: 4.555429E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.733 | TFLOPs: 12.02 | 7: iteration 51070/ 173500 | consumed samples: 13073920 | consumed tokens: 26775388160 | elapsed time per iteration (s): 0.08 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 4.568056E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.006 | TFLOPs: 11.94 | 7: iteration 51080/ 173500 | consumed samples: 13076480 | consumed tokens: 26780631040 | elapsed time per iteration (s): 0.08 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 4.568008E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.320 | TFLOPs: 11.89 | 7: iteration 51090/ 173500 | consumed samples: 13079040 | consumed tokens: 26785873920 | elapsed time per iteration (s): 0.08 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 4.562681E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.509 | TFLOPs: 11.92 | 7: iteration 51100/ 173500 | consumed samples: 13081600 | consumed tokens: 26791116800 | elapsed time per iteration (s): 0.08 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 4.555655E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.371 | TFLOPs: 11.90 | 7: iteration 51110/ 173500 | consumed samples: 13084160 | consumed tokens: 26796359680 | elapsed time per iteration (s): 0.08 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 4.570213E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.373 | TFLOPs: 11.93 | 7: iteration 51120/ 
173500 | consumed samples: 13086720 | consumed tokens: 26801602560 | elapsed time per iteration (s): 0.08 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 4.554522E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.748 | TFLOPs: 11.92 | 7: iteration 51130/ 173500 | consumed samples: 13089280 | consumed tokens: 26806845440 | elapsed time per iteration (s): 0.08 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 4.559311E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.148 | TFLOPs: 11.90 | 7: iteration 51140/ 173500 | consumed samples: 13091840 | consumed tokens: 26812088320 | elapsed time per iteration (s): 0.08 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 4.558818E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.497 | TFLOPs: 11.89 | 7: iteration 51150/ 173500 | consumed samples: 13094400 | consumed tokens: 26817331200 | elapsed time per iteration (s): 0.08 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 4.548396E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.217 | TFLOPs: 11.88 | 7: iteration 51160/ 173500 | consumed samples: 13096960 | consumed tokens: 26822574080 | elapsed time per iteration (s): 0.08 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 4.567359E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.447 | TFLOPs: 11.93 | 7: iteration 51170/ 173500 | consumed samples: 13099520 | consumed tokens: 26827816960 | elapsed time per iteration (s): 0.08 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 4.565571E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3196.475 | TFLOPs: 11.89 | 7: iteration 51180/ 173500 | consumed samples: 13102080 | consumed tokens: 26833059840 | elapsed time per iteration (s): 0.08 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 4.549107E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.257 | TFLOPs: 11.91 | 7: iteration 51190/ 173500 | consumed samples: 13104640 | consumed tokens: 26838302720 | elapsed time per iteration (s): 0.08 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 4.568722E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.234 | TFLOPs: 11.93 | 7: iteration 51200/ 173500 | consumed samples: 13107200 | consumed tokens: 26843545600 | elapsed time per iteration (s): 0.08 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 4.554213E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.941 | TFLOPs: 11.84 | 7: iteration 51210/ 173500 | consumed samples: 13109760 | consumed tokens: 26848788480 | elapsed time per iteration (s): 0.08 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 4.549302E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.467 | TFLOPs: 11.99 | 7: iteration 51220/ 173500 | consumed samples: 13112320 | consumed tokens: 26854031360 | elapsed time per iteration (s): 0.08 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 4.538751E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.784 | TFLOPs: 11.78 | 7: iteration 51230/ 173500 | consumed samples: 13114880 | consumed tokens: 26859274240 | elapsed time per iteration (s): 0.10 | learning rate: 
1.656E-04 | global batch size: 256 | lm loss: 4.563174E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2542.871 | TFLOPs: 9.46 | 7: iteration 51240/ 173500 | consumed samples: 13117440 | consumed tokens: 26864517120 | elapsed time per iteration (s): 0.08 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 4.559424E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.579 | TFLOPs: 12.02 | 7: iteration 51250/ 173500 | consumed samples: 13120000 | consumed tokens: 26869760000 | elapsed time per iteration (s): 0.08 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 4.567905E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.226 | TFLOPs: 11.87 | 7: iteration 51260/ 173500 | consumed samples: 13122560 | consumed tokens: 26875002880 | elapsed time per iteration (s): 0.08 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 4.553639E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.412 | TFLOPs: 12.01 | 7: iteration 51270/ 173500 | consumed samples: 13125120 | consumed tokens: 26880245760 | elapsed time per iteration (s): 0.08 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 4.562344E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.500 | TFLOPs: 11.96 | 7: iteration 51280/ 173500 | consumed samples: 13127680 | consumed tokens: 26885488640 | elapsed time per iteration (s): 0.08 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 4.562447E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.605 | TFLOPs: 11.22 | 7: iteration 51290/ 173500 | consumed 
samples: 13130240 | consumed tokens: 26890731520 | elapsed time per iteration (s): 0.09 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 4.562264E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2732.764 | TFLOPs: 10.16 | 7: iteration 51300/ 173500 | consumed samples: 13132800 | consumed tokens: 26895974400 | elapsed time per iteration (s): 0.09 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 4.572258E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2718.340 | TFLOPs: 10.11 | 7: iteration 51310/ 173500 | consumed samples: 13135360 | consumed tokens: 26901217280 | elapsed time per iteration (s): 0.10 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 4.550348E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2684.706 | TFLOPs: 9.99 | 7: iteration 51320/ 173500 | consumed samples: 13137920 | consumed tokens: 26906460160 | elapsed time per iteration (s): 0.09 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 4.559977E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.528 | TFLOPs: 10.32 | 7: iteration 51330/ 173500 | consumed samples: 13140480 | consumed tokens: 26911703040 | elapsed time per iteration (s): 0.10 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 4.558084E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2550.167 | TFLOPs: 9.49 | 7: iteration 51340/ 173500 | consumed samples: 13143040 | consumed tokens: 26916945920 | elapsed time per iteration (s): 0.10 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 4.564647E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 2499.788 | TFLOPs: 9.30 | 7: iteration 51350/ 173500 | consumed samples: 13145600 | consumed tokens: 26922188800 | elapsed time per iteration (s): 0.10 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 4.546777E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2628.703 | TFLOPs: 9.78 | 7: iteration 51360/ 173500 | consumed samples: 13148160 | consumed tokens: 26927431680 | elapsed time per iteration (s): 0.08 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 4.558873E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.990 | TFLOPs: 11.78 | 7: iteration 51370/ 173500 | consumed samples: 13150720 | consumed tokens: 26932674560 | elapsed time per iteration (s): 0.08 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 4.561646E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.807 | TFLOPs: 11.93 | 7: iteration 51380/ 173500 | consumed samples: 13153280 | consumed tokens: 26937917440 | elapsed time per iteration (s): 0.08 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 4.542786E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.773 | TFLOPs: 11.96 | 7: iteration 51390/ 173500 | consumed samples: 13155840 | consumed tokens: 26943160320 | elapsed time per iteration (s): 0.08 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 4.555590E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.228 | TFLOPs: 11.91 | 7: iteration 51400/ 173500 | consumed samples: 13158400 | consumed tokens: 26948403200 | elapsed time per iteration (s): 0.08 | learning rate: 1.654E-04 | global batch size: 256 | 
lm loss: 4.564346E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.191 | TFLOPs: 12.01 | 7: iteration 51410/ 173500 | consumed samples: 13160960 | consumed tokens: 26953646080 | elapsed time per iteration (s): 0.08 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 4.544511E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.412 | TFLOPs: 11.98 | 7: iteration 51420/ 173500 | consumed samples: 13163520 | consumed tokens: 26958888960 | elapsed time per iteration (s): 0.08 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 4.552840E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.990 | TFLOPs: 12.01 | 7: iteration 51430/ 173500 | consumed samples: 13166080 | consumed tokens: 26964131840 | elapsed time per iteration (s): 0.08 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 4.554844E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.871 | TFLOPs: 11.96 | 7: iteration 51440/ 173500 | consumed samples: 13168640 | consumed tokens: 26969374720 | elapsed time per iteration (s): 0.08 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 4.563268E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.594 | TFLOPs: 11.90 | 7: iteration 51450/ 173500 | consumed samples: 13171200 | consumed tokens: 26974617600 | elapsed time per iteration (s): 0.08 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 4.573550E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.976 | TFLOPs: 11.70 | 7: iteration 51460/ 173500 | consumed samples: 13173760 | consumed 
tokens: 26979860480 | elapsed time per iteration (s): 0.08 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 4.557431E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.847 | TFLOPs: 12.00 | 7: iteration 51470/ 173500 | consumed samples: 13176320 | consumed tokens: 26985103360 | elapsed time per iteration (s): 0.08 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 4.545867E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.589 | TFLOPs: 12.02 | 7: iteration 51480/ 173500 | consumed samples: 13178880 | consumed tokens: 26990346240 | elapsed time per iteration (s): 0.08 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 4.564090E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.329 | TFLOPs: 11.42 | 7: iteration 51490/ 173500 | consumed samples: 13181440 | consumed tokens: 26995589120 | elapsed time per iteration (s): 0.08 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 4.562510E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.858 | TFLOPs: 12.02 | 7: iteration 51500/ 173500 | consumed samples: 13184000 | consumed tokens: 27000832000 | elapsed time per iteration (s): 0.08 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 4.558214E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.642 | TFLOPs: 11.94 | 7: iteration 51510/ 173500 | consumed samples: 13186560 | consumed tokens: 27006074880 | elapsed time per iteration (s): 0.08 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 4.558927E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3218.939 | TFLOPs: 11.97 | 7: iteration 51520/ 173500 | consumed samples: 13189120 | consumed tokens: 27011317760 | elapsed time per iteration (s): 0.08 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 4.568345E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.097 | TFLOPs: 11.98 | 7: iteration 51530/ 173500 | consumed samples: 13191680 | consumed tokens: 27016560640 | elapsed time per iteration (s): 0.08 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 4.554502E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.486 | TFLOPs: 11.97 | 7: iteration 51540/ 173500 | consumed samples: 13194240 | consumed tokens: 27021803520 | elapsed time per iteration (s): 0.08 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 4.544937E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.492 | TFLOPs: 11.99 | 7: iteration 51550/ 173500 | consumed samples: 13196800 | consumed tokens: 27027046400 | elapsed time per iteration (s): 0.08 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 4.552229E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.011 | TFLOPs: 12.00 | 7: iteration 51560/ 173500 | consumed samples: 13199360 | consumed tokens: 27032289280 | elapsed time per iteration (s): 0.08 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 4.554203E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.246 | TFLOPs: 11.84 | 7: iteration 51570/ 173500 | consumed samples: 13201920 | consumed tokens: 27037532160 | elapsed time per iteration (s): 0.08 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 4.569230E+00 | 
grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.074 | TFLOPs: 11.87 | 7: iteration 51580/ 173500 | consumed samples: 13204480 | consumed tokens: 27042775040 | elapsed time per iteration (s): 0.08 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 4.554674E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.616 | TFLOPs: 11.92 | 7: iteration 51590/ 173500 | consumed samples: 13207040 | consumed tokens: 27048017920 | elapsed time per iteration (s): 0.08 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 4.546513E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.998 | TFLOPs: 11.61 | 7: iteration 51600/ 173500 | consumed samples: 13209600 | consumed tokens: 27053260800 | elapsed time per iteration (s): 0.08 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 4.549604E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.586 | TFLOPs: 11.93 | 7: iteration 51610/ 173500 | consumed samples: 13212160 | consumed tokens: 27058503680 | elapsed time per iteration (s): 0.08 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 4.563071E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.743 | TFLOPs: 11.66 | 7: iteration 51620/ 173500 | consumed samples: 13214720 | consumed tokens: 27063746560 | elapsed time per iteration (s): 0.08 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 4.553982E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.108 | TFLOPs: 12.00 | 7: iteration 51630/ 173500 | consumed samples: 13217280 | consumed tokens: 27068989440 | elapsed 
time per iteration (s): 0.08 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 4.563247E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.455 | TFLOPs: 12.00 | 7: iteration 51640/ 173500 | consumed samples: 13219840 | consumed tokens: 27074232320 | elapsed time per iteration (s): 0.08 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 4.566435E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.861 | TFLOPs: 12.01 | 7: iteration 51650/ 173500 | consumed samples: 13222400 | consumed tokens: 27079475200 | elapsed time per iteration (s): 0.08 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 4.556738E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.952 | TFLOPs: 11.43 | 7: iteration 51660/ 173500 | consumed samples: 13224960 | consumed tokens: 27084718080 | elapsed time per iteration (s): 0.08 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 4.562681E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.625 | TFLOPs: 11.98 | 7: iteration 51670/ 173500 | consumed samples: 13227520 | consumed tokens: 27089960960 | elapsed time per iteration (s): 0.08 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 4.562357E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.348 | TFLOPs: 11.74 | 7: iteration 51680/ 173500 | consumed samples: 13230080 | consumed tokens: 27095203840 | elapsed time per iteration (s): 0.09 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 4.568902E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2967.302 | 
TFLOPs: 11.04 | 7: iteration 51690/ 173500 | consumed samples: 13232640 | consumed tokens: 27100446720 | elapsed time per iteration (s): 0.08 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 4.554951E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.105 | TFLOPs: 11.34 | 7: iteration 51700/ 173500 | consumed samples: 13235200 | consumed tokens: 27105689600 | elapsed time per iteration (s): 0.08 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 4.562743E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.115 | TFLOPs: 11.23 | 7: iteration 51710/ 173500 | consumed samples: 13237760 | consumed tokens: 27110932480 | elapsed time per iteration (s): 0.11 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.554145E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2261.345 | TFLOPs: 8.41 | 7: iteration 51720/ 173500 | consumed samples: 13240320 | consumed tokens: 27116175360 | elapsed time per iteration (s): 0.11 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.561044E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.691 | TFLOPs: 9.05 | 7: iteration 51730/ 173500 | consumed samples: 13242880 | consumed tokens: 27121418240 | elapsed time per iteration (s): 0.11 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.568063E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.293 | TFLOPs: 9.05 | 7: iteration 51740/ 173500 | consumed samples: 13245440 | consumed tokens: 27126661120 | elapsed time per iteration (s): 0.11 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.552370E+00 | grad norm: 0.273 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.192 | TFLOPs: 8.88 | 7: iteration 51750/ 173500 | consumed samples: 13248000 | consumed tokens: 27131904000 | elapsed time per iteration (s): 0.11 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.554816E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2425.012 | TFLOPs: 9.02 | 7: iteration 51760/ 173500 | consumed samples: 13250560 | consumed tokens: 27137146880 | elapsed time per iteration (s): 0.11 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.550200E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.164 | TFLOPs: 8.88 | 7: iteration 51770/ 173500 | consumed samples: 13253120 | consumed tokens: 27142389760 | elapsed time per iteration (s): 0.11 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.547102E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2294.166 | TFLOPs: 8.53 | 7: iteration 51780/ 173500 | consumed samples: 13255680 | consumed tokens: 27147632640 | elapsed time per iteration (s): 0.11 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.555151E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.880 | TFLOPs: 8.88 | 7: iteration 51790/ 173500 | consumed samples: 13258240 | consumed tokens: 27152875520 | elapsed time per iteration (s): 0.11 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.554182E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2358.587 | TFLOPs: 8.77 | 7: iteration 51800/ 173500 | consumed samples: 13260800 | consumed tokens: 27158118400 | elapsed time per iteration (s): 0.11 
| learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.563322E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2288.863 | TFLOPs: 8.51 | 7: iteration 51810/ 173500 | consumed samples: 13263360 | consumed tokens: 27163361280 | elapsed time per iteration (s): 0.11 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.558718E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.494 | TFLOPs: 8.56 | 7: iteration 51820/ 173500 | consumed samples: 13265920 | consumed tokens: 27168604160 | elapsed time per iteration (s): 0.11 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.556812E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2295.284 | TFLOPs: 8.54 | 7: iteration 51830/ 173500 | consumed samples: 13268480 | consumed tokens: 27173847040 | elapsed time per iteration (s): 0.11 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.568060E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2342.588 | TFLOPs: 8.71 | 7: iteration 51840/ 173500 | consumed samples: 13271040 | consumed tokens: 27179089920 | elapsed time per iteration (s): 0.09 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.573206E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2714.041 | TFLOPs: 10.10 | 7: iteration 51850/ 173500 | consumed samples: 13273600 | consumed tokens: 27184332800 | elapsed time per iteration (s): 0.08 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.553793E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.787 | TFLOPs: 11.85 | 7: iteration 51860/ 
173500 | consumed samples: 13276160 | consumed tokens: 27189575680 | elapsed time per iteration (s): 0.08 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 4.569317E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.974 | TFLOPs: 11.86 | 7: iteration 51870/ 173500 | consumed samples: 13278720 | consumed tokens: 27194818560 | elapsed time per iteration (s): 0.08 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 4.561784E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.974 | TFLOPs: 11.31 | 7: iteration 51880/ 173500 | consumed samples: 13281280 | consumed tokens: 27200061440 | elapsed time per iteration (s): 0.08 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 4.559470E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.101 | TFLOPs: 11.90 | 7: iteration 51890/ 173500 | consumed samples: 13283840 | consumed tokens: 27205304320 | elapsed time per iteration (s): 0.08 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 4.555807E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.581 | TFLOPs: 11.94 | 7: iteration 51900/ 173500 | consumed samples: 13286400 | consumed tokens: 27210547200 | elapsed time per iteration (s): 0.08 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 4.555019E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.673 | TFLOPs: 11.86 | 7: iteration 51910/ 173500 | consumed samples: 13288960 | consumed tokens: 27215790080 | elapsed time per iteration (s): 0.08 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 4.567308E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3134.870 | TFLOPs: 11.66 | 7: iteration 51920/ 173500 | consumed samples: 13291520 | consumed tokens: 27221032960 | elapsed time per iteration (s): 0.08 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 4.562831E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.086 | TFLOPs: 11.89 | 7: iteration 51930/ 173500 | consumed samples: 13294080 | consumed tokens: 27226275840 | elapsed time per iteration (s): 0.08 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 4.556566E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.616 | TFLOPs: 11.92 | 7: iteration 51940/ 173500 | consumed samples: 13296640 | consumed tokens: 27231518720 | elapsed time per iteration (s): 0.08 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 4.562318E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.925 | TFLOPs: 11.84 | 7: iteration 51950/ 173500 | consumed samples: 13299200 | consumed tokens: 27236761600 | elapsed time per iteration (s): 0.08 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 4.556024E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.178 | TFLOPs: 11.90 | 7: iteration 51960/ 173500 | consumed samples: 13301760 | consumed tokens: 27242004480 | elapsed time per iteration (s): 0.08 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 4.562072E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.772 | TFLOPs: 11.63 | 7: iteration 51970/ 173500 | consumed samples: 13304320 | consumed tokens: 27247247360 | elapsed time per iteration (s): 0.08 | learning rate: 
1.646E-04 | global batch size: 256 | lm loss: 4.554415E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.605 | TFLOPs: 11.40 | 7: iteration 51980/ 173500 | consumed samples: 13306880 | consumed tokens: 27252490240 | elapsed time per iteration (s): 0.08 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 4.572168E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.789 | TFLOPs: 11.95 | 7: iteration 51990/ 173500 | consumed samples: 13309440 | consumed tokens: 27257733120 | elapsed time per iteration (s): 0.08 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 4.550378E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.831 | TFLOPs: 11.93 | 0: [2023-03-17 01:32:22,917] [INFO] [logging.py:68:log_dist] [Rank 0] step=52000, skipped=0, lr=[0.00016457056203724818, 0.00016457056203724818, 0.00016457056203724818], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 52000/ 173500 | consumed samples: 13312000 | consumed tokens: 27262976000 | elapsed time per iteration (s): 0.08 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 4.553554E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.608 | TFLOPs: 11.87 | 0: steps: 52000 loss: 4.5562 iter time (s): 0.092 samples/sec: 2775.633 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 52000 | lm loss value: 4.437653E+00 | lm loss PPL: 8.457617E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 52000 to checkpoints_14m91b100m 0: [2023-03-17 01:32:22,974] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] 
Checkpoint global_step52000 is begin to save! 0: [2023-03-17 01:32:22,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:32:23,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:32:23,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:32:23,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:32:23,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:32:23,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:32:23,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:32:23,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:32:23,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:32:23,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:32:23,015] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 01:32:23,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:32:23,016] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step52000/mp_rank_00_model_states.pt 0: [2023-03-17 01:32:23,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:32:23,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:32:23,034] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:32:23,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:32:23,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:32:23,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:32:23,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 4: [2023-03-17 01:32:23,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2023-03-17 01:32:23,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:32:23,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2023-03-17 01:32:23,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:32:23,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:32:23,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:32:23,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2023-03-17 01:32:23,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:32:23,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:32:23,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:32:23,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:32:23,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:32:23,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2023-03-17 01:32:23,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2023-03-17 01:32:23,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:32:23,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:32:23,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:32:23,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2023-03-17 01:32:23,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2023-03-17 01:32:23,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:32:23,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2023-03-17 01:32:23,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:32:23,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2023-03-17 01:32:23,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2023-03-17 01:32:23,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:32:23,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:32:23,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:32:23,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:32:23,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:32:23,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:32:23,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:32:23,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:32:23,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:32:23,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 6: [2023-03-17 01:32:23,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:32:23,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:32:23,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:32:23,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:32:23,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:32:23,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:32:23,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 01:32:23,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:32:23,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:32:23,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:32:23,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:32:23,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:32:23,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:32:23,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:32:23,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:32:23,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
7: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
2: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
4: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 3: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2023-03-17 01:32:23,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2023-03-17 01:32:23,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:32:23,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2023-03-17 01:32:23,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2023-03-17 01:32:23,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2023-03-17 01:32:23,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:32:23,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step52000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:32:23,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
0: successfully saved checkpoint at iteration 52000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.68 7: iteration 52010/ 173500 | consumed samples: 13314560 | consumed tokens: 27268218880 | elapsed time per iteration (s): 0.10 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 4.562242E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2682.557 | TFLOPs: 9.98 | 7: iteration 52020/ 173500 | consumed samples: 13317120 | consumed tokens: 27273461760 | elapsed time per iteration (s): 0.08 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 4.563782E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.031 | TFLOPs: 11.98 | 7: iteration 52030/ 173500 | consumed samples: 13319680 | consumed tokens: 27278704640 | elapsed time per iteration (s): 0.08 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 4.547806E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.597 | TFLOPs: 11.77 | 7: iteration 52040/ 173500 | consumed samples: 13322240 | consumed tokens: 27283947520 | elapsed time per iteration (s): 0.08 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 4.551093E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.712 | TFLOPs: 11.76 | 7: iteration 52050/ 173500 | consumed samples: 13324800 | consumed tokens: 27289190400 | elapsed time per iteration (s): 0.08 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 4.563217E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.826 | TFLOPs: 11.99 | 7: iteration 52060/ 173500 | consumed samples: 13327360 | consumed tokens: 27294433280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.645E-04 | global batch size: 256 | lm loss: 4.545248E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.479 | TFLOPs: 12.04 | 7: iteration 52070/ 173500 | consumed samples: 13329920 | consumed tokens: 27299676160 | elapsed time per iteration (s): 0.08 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 4.570236E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.260 | TFLOPs: 12.07 | 7: iteration 52080/ 173500 | consumed samples: 13332480 | consumed tokens: 27304919040 | elapsed time per iteration (s): 0.08 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 4.567641E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.319 | TFLOPs: 12.05 | 7: iteration 52090/ 173500 | consumed samples: 13335040 | consumed tokens: 27310161920 | elapsed time per iteration (s): 0.08 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 4.575124E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.744 | TFLOPs: 12.08 | 7: iteration 52100/ 173500 | consumed samples: 13337600 | consumed tokens: 27315404800 | elapsed time per iteration (s): 0.08 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 4.546700E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3255.122 | TFLOPs: 12.11 | 7: iteration 52110/ 173500 | consumed samples: 13340160 | consumed tokens: 27320647680 | elapsed time per iteration (s): 0.08 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 4.542854E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.097 | TFLOPs: 12.03 | 7: iteration 52120/ 
173500 | consumed samples: 13342720 | consumed tokens: 27325890560 | elapsed time per iteration (s): 0.08 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 4.551095E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3255.487 | TFLOPs: 12.11 | 7: iteration 52130/ 173500 | consumed samples: 13345280 | consumed tokens: 27331133440 | elapsed time per iteration (s): 0.08 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 4.564212E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.510 | TFLOPs: 11.67 | 7: iteration 52140/ 173500 | consumed samples: 13347840 | consumed tokens: 27336376320 | elapsed time per iteration (s): 0.09 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 4.551452E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2937.445 | TFLOPs: 10.93 | 7: iteration 52150/ 173500 | consumed samples: 13350400 | consumed tokens: 27341619200 | elapsed time per iteration (s): 0.08 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 4.561197E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3253.562 | TFLOPs: 12.10 | 7: iteration 52160/ 173500 | consumed samples: 13352960 | consumed tokens: 27346862080 | elapsed time per iteration (s): 0.08 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 4.549495E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3253.761 | TFLOPs: 12.10 | 7: iteration 52170/ 173500 | consumed samples: 13355520 | consumed tokens: 27352104960 | elapsed time per iteration (s): 0.08 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 4.551876E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3082.412 | TFLOPs: 11.47 | 7: iteration 52180/ 173500 | consumed samples: 13358080 | consumed tokens: 27357347840 | elapsed time per iteration (s): 0.08 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 4.533621E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.487 | TFLOPs: 12.01 | 7: iteration 52190/ 173500 | consumed samples: 13360640 | consumed tokens: 27362590720 | elapsed time per iteration (s): 0.08 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 4.566918E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.074 | TFLOPs: 11.94 | 7: iteration 52200/ 173500 | consumed samples: 13363200 | consumed tokens: 27367833600 | elapsed time per iteration (s): 0.08 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 4.563933E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.681 | TFLOPs: 11.98 | 7: iteration 52210/ 173500 | consumed samples: 13365760 | consumed tokens: 27373076480 | elapsed time per iteration (s): 0.08 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 4.556291E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.484 | TFLOPs: 11.89 | 7: iteration 52220/ 173500 | consumed samples: 13368320 | consumed tokens: 27378319360 | elapsed time per iteration (s): 0.08 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 4.563851E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.876 | TFLOPs: 11.96 | 7: iteration 52230/ 173500 | consumed samples: 13370880 | consumed tokens: 27383562240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.643E-04 | global batch size: 256 | lm loss: 4.546982E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.747 | TFLOPs: 11.96 | 7: iteration 52240/ 173500 | consumed samples: 13373440 | consumed tokens: 27388805120 | elapsed time per iteration (s): 0.08 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 4.545931E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.763 | TFLOPs: 11.89 | 7: iteration 52250/ 173500 | consumed samples: 13376000 | consumed tokens: 27394048000 | elapsed time per iteration (s): 0.08 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 4.549518E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.635 | TFLOPs: 11.95 | 7: iteration 52260/ 173500 | consumed samples: 13378560 | consumed tokens: 27399290880 | elapsed time per iteration (s): 0.08 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 4.564906E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.069 | TFLOPs: 11.58 | 7: iteration 52270/ 173500 | consumed samples: 13381120 | consumed tokens: 27404533760 | elapsed time per iteration (s): 0.08 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 4.552293E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.069 | TFLOPs: 11.79 | 7: iteration 52280/ 173500 | consumed samples: 13383680 | consumed tokens: 27409776640 | elapsed time per iteration (s): 0.09 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 4.554765E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2842.352 | TFLOPs: 10.57 | 7: iteration 52290/ 173500 | 
consumed samples: 13386240 | consumed tokens: 27415019520 | elapsed time per iteration (s): 0.08 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 4.571022E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.576 | TFLOPs: 11.62 | 7: iteration 52300/ 173500 | consumed samples: 13388800 | consumed tokens: 27420262400 | elapsed time per iteration (s): 0.08 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 4.557924E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.122 | TFLOPs: 11.92 | 7: iteration 52310/ 173500 | consumed samples: 13391360 | consumed tokens: 27425505280 | elapsed time per iteration (s): 0.09 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 4.555325E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2793.986 | TFLOPs: 10.39 | 7: iteration 52320/ 173500 | consumed samples: 13393920 | consumed tokens: 27430748160 | elapsed time per iteration (s): 0.08 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 4.541849E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.246 | TFLOPs: 11.91 | 7: iteration 52330/ 173500 | consumed samples: 13396480 | consumed tokens: 27435991040 | elapsed time per iteration (s): 0.13 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 4.565442E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1966.681 | TFLOPs: 7.32 | 7: iteration 52340/ 173500 | consumed samples: 13399040 | consumed tokens: 27441233920 | elapsed time per iteration (s): 0.10 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 4.568158E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2456.808 | TFLOPs: 9.14 | 7: iteration 52350/ 173500 | consumed samples: 13401600 | consumed tokens: 27446476800 | elapsed time per iteration (s): 0.12 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 4.557360E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2150.402 | TFLOPs: 8.00 | 7: iteration 52360/ 173500 | consumed samples: 13404160 | consumed tokens: 27451719680 | elapsed time per iteration (s): 0.13 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 4.555436E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1961.155 | TFLOPs: 7.29 | 7: iteration 52370/ 173500 | consumed samples: 13406720 | consumed tokens: 27456962560 | elapsed time per iteration (s): 0.11 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 4.548625E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2230.734 | TFLOPs: 8.30 | 7: iteration 52380/ 173500 | consumed samples: 13409280 | consumed tokens: 27462205440 | elapsed time per iteration (s): 0.13 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 4.568347E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2034.150 | TFLOPs: 7.57 | 7: iteration 52390/ 173500 | consumed samples: 13411840 | consumed tokens: 27467448320 | elapsed time per iteration (s): 0.12 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 4.551464E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2098.528 | TFLOPs: 7.81 | 7: iteration 52400/ 173500 | consumed samples: 13414400 | consumed tokens: 27472691200 | elapsed time per iteration (s): 0.12 | learning rate: 1.640E-04 | global batch size: 
256 | lm loss: 4.553330E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2142.870 | TFLOPs: 7.97 | 7: iteration 52410/ 173500 | consumed samples: 13416960 | consumed tokens: 27477934080 | elapsed time per iteration (s): 0.10 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 4.554058E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2498.453 | TFLOPs: 9.29 | 7: iteration 52420/ 173500 | consumed samples: 13419520 | consumed tokens: 27483176960 | elapsed time per iteration (s): 0.08 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 4.564357E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.039 | TFLOPs: 12.00 | 7: iteration 52430/ 173500 | consumed samples: 13422080 | consumed tokens: 27488419840 | elapsed time per iteration (s): 0.08 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 4.569442E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3249.824 | TFLOPs: 12.09 | 7: iteration 52440/ 173500 | consumed samples: 13424640 | consumed tokens: 27493662720 | elapsed time per iteration (s): 0.08 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 4.556540E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3253.017 | TFLOPs: 12.10 | 7: iteration 52450/ 173500 | consumed samples: 13427200 | consumed tokens: 27498905600 | elapsed time per iteration (s): 0.08 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 4.556776E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3248.451 | TFLOPs: 12.08 | 7: iteration 52460/ 173500 | consumed samples: 13429760 | consumed 
tokens: 27504148480 | elapsed time per iteration (s): 0.08 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 4.560825E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.515 | TFLOPs: 12.12 | 7: iteration 52470/ 173500 | consumed samples: 13432320 | consumed tokens: 27509391360 | elapsed time per iteration (s): 0.08 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 4.563336E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.731 | TFLOPs: 12.01 | 7: iteration 52480/ 173500 | consumed samples: 13434880 | consumed tokens: 27514634240 | elapsed time per iteration (s): 0.08 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 4.563480E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.443 | TFLOPs: 12.08 | 7: iteration 52490/ 173500 | consumed samples: 13437440 | consumed tokens: 27519877120 | elapsed time per iteration (s): 0.08 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 4.564559E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3256.728 | TFLOPs: 12.11 | 7: iteration 52500/ 173500 | consumed samples: 13440000 | consumed tokens: 27525120000 | elapsed time per iteration (s): 0.08 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 4.554000E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.015 | TFLOPs: 12.06 | 7: iteration 52510/ 173500 | consumed samples: 13442560 | consumed tokens: 27530362880 | elapsed time per iteration (s): 0.12 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 4.563864E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 2199.769 | TFLOPs: 8.18 | 7: iteration 52520/ 173500 | consumed samples: 13445120 | consumed tokens: 27535605760 | elapsed time per iteration (s): 0.15 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 4.567242E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1756.786 | TFLOPs: 6.53 | 7: iteration 52530/ 173500 | consumed samples: 13447680 | consumed tokens: 27540848640 | elapsed time per iteration (s): 0.13 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 4.565714E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2046.056 | TFLOPs: 7.61 | 7: iteration 52540/ 173500 | consumed samples: 13450240 | consumed tokens: 27546091520 | elapsed time per iteration (s): 0.08 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 4.548863E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.572 | TFLOPs: 12.08 | 7: iteration 52550/ 173500 | consumed samples: 13452800 | consumed tokens: 27551334400 | elapsed time per iteration (s): 0.08 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 4.556591E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.591 | TFLOPs: 12.05 | 7: iteration 52560/ 173500 | consumed samples: 13455360 | consumed tokens: 27556577280 | elapsed time per iteration (s): 0.08 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 4.560440E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3256.649 | TFLOPs: 12.11 | 7: iteration 52570/ 173500 | consumed samples: 13457920 | consumed tokens: 27561820160 | elapsed time per iteration (s): 0.08 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 4.558742E+00 | grad 
norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.479 | TFLOPs: 11.99 | 7: iteration 52580/ 173500 | consumed samples: 13460480 | consumed tokens: 27567063040 | elapsed time per iteration (s): 0.08 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 4.564512E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.185 | TFLOPs: 11.96 | 7: iteration 52590/ 173500 | consumed samples: 13463040 | consumed tokens: 27572305920 | elapsed time per iteration (s): 0.10 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 4.550067E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2657.327 | TFLOPs: 9.88 | 7: iteration 52600/ 173500 | consumed samples: 13465600 | consumed tokens: 27577548800 | elapsed time per iteration (s): 0.08 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 4.544009E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.292 | TFLOPs: 11.47 | 7: iteration 52610/ 173500 | consumed samples: 13468160 | consumed tokens: 27582791680 | elapsed time per iteration (s): 0.10 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 4.565398E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2563.290 | TFLOPs: 9.53 | 7: iteration 52620/ 173500 | consumed samples: 13470720 | consumed tokens: 27588034560 | elapsed time per iteration (s): 0.11 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 4.558957E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2317.675 | TFLOPs: 8.62 | 7: iteration 52630/ 173500 | consumed samples: 13473280 | consumed tokens: 27593277440 | elapsed time per 
iteration (s): 0.11 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 4.545386E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2329.075 | TFLOPs: 8.66 | 7: iteration 52640/ 173500 | consumed samples: 13475840 | consumed tokens: 27598520320 | elapsed time per iteration (s): 0.11 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 4.556161E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2268.837 | TFLOPs: 8.44 | 7: iteration 52650/ 173500 | consumed samples: 13478400 | consumed tokens: 27603763200 | elapsed time per iteration (s): 0.11 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 4.566009E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.671 | TFLOPs: 8.62 | 7: iteration 52660/ 173500 | consumed samples: 13480960 | consumed tokens: 27609006080 | elapsed time per iteration (s): 0.11 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 4.552924E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.933 | TFLOPs: 8.50 | 7: iteration 52670/ 173500 | consumed samples: 13483520 | consumed tokens: 27614248960 | elapsed time per iteration (s): 0.11 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 4.554454E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.150 | TFLOPs: 8.56 | 7: iteration 52680/ 173500 | consumed samples: 13486080 | consumed tokens: 27619491840 | elapsed time per iteration (s): 0.12 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 4.554351E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2175.574 | TFLOPs: 8.09 | 7: 
iteration 52690/ 173500 | consumed samples: 13488640 | consumed tokens: 27624734720 | elapsed time per iteration (s): 0.09 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 4.549925E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2739.418 | TFLOPs: 10.19 | 7: iteration 52700/ 173500 | consumed samples: 13491200 | consumed tokens: 27629977600 | elapsed time per iteration (s): 0.09 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 4.551909E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2834.973 | TFLOPs: 10.54 | 7: iteration 52710/ 173500 | consumed samples: 13493760 | consumed tokens: 27635220480 | elapsed time per iteration (s): 0.08 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 4.565796E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.109 | TFLOPs: 11.47 | 7: iteration 52720/ 173500 | consumed samples: 13496320 | consumed tokens: 27640463360 | elapsed time per iteration (s): 0.08 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 4.544526E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.862 | TFLOPs: 11.88 | 7: iteration 52730/ 173500 | consumed samples: 13498880 | consumed tokens: 27645706240 | elapsed time per iteration (s): 0.09 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 4.555033E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.246 | TFLOPs: 10.39 | 7: iteration 52740/ 173500 | consumed samples: 13501440 | consumed tokens: 27650949120 | elapsed time per iteration (s): 0.09 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 4.552427E+00 | grad norm: 0.353 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2768.444 | TFLOPs: 10.30 | 7: iteration 52750/ 173500 | consumed samples: 13504000 | consumed tokens: 27656192000 | elapsed time per iteration (s): 0.09 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 4.552880E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2784.385 | TFLOPs: 10.36 | 7: iteration 52760/ 173500 | consumed samples: 13506560 | consumed tokens: 27661434880 | elapsed time per iteration (s): 0.09 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 4.544786E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.281 | TFLOPs: 10.40 | 7: iteration 52770/ 173500 | consumed samples: 13509120 | consumed tokens: 27666677760 | elapsed time per iteration (s): 0.09 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 4.560713E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.183 | TFLOPs: 10.13 | 7: iteration 52780/ 173500 | consumed samples: 13511680 | consumed tokens: 27671920640 | elapsed time per iteration (s): 0.09 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 4.562029E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.564 | TFLOPs: 10.39 | 7: iteration 52790/ 173500 | consumed samples: 13514240 | consumed tokens: 27677163520 | elapsed time per iteration (s): 0.09 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 4.551767E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2748.812 | TFLOPs: 10.22 | 7: iteration 52800/ 173500 | consumed samples: 13516800 | consumed tokens: 27682406400 | elapsed time per iteration (s): 0.08 | learning 
rate: 1.635E-04 | global batch size: 256 | lm loss: 4.559479E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.303 | TFLOPs: 11.51 | 7: iteration 52810/ 173500 | consumed samples: 13519360 | consumed tokens: 27687649280 | elapsed time per iteration (s): 0.08 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 4.546919E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.363 | TFLOPs: 11.94 | 7: iteration 52820/ 173500 | consumed samples: 13521920 | consumed tokens: 27692892160 | elapsed time per iteration (s): 0.08 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 4.569343E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.000 | TFLOPs: 11.93 | 7: iteration 52830/ 173500 | consumed samples: 13524480 | consumed tokens: 27698135040 | elapsed time per iteration (s): 0.08 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 4.554812E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.686 | TFLOPs: 11.95 | 7: iteration 52840/ 173500 | consumed samples: 13527040 | consumed tokens: 27703377920 | elapsed time per iteration (s): 0.08 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 4.553730E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.710 | TFLOPs: 11.93 | 7: iteration 52850/ 173500 | consumed samples: 13529600 | consumed tokens: 27708620800 | elapsed time per iteration (s): 0.08 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 4.551748E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.330 | TFLOPs: 11.87 | 7: iteration 52860/ 173500 | 
consumed samples: 13532160 | consumed tokens: 27713863680 | elapsed time per iteration (s): 0.08 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 4.559299E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.754 | TFLOPs: 11.94 | 7: iteration 52870/ 173500 | consumed samples: 13534720 | consumed tokens: 27719106560 | elapsed time per iteration (s): 0.08 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 4.555641E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.034 | TFLOPs: 11.95 | 7: iteration 52880/ 173500 | consumed samples: 13537280 | consumed tokens: 27724349440 | elapsed time per iteration (s): 0.08 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 4.552908E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.746 | TFLOPs: 11.91 | 7: iteration 52890/ 173500 | consumed samples: 13539840 | consumed tokens: 27729592320 | elapsed time per iteration (s): 0.08 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 4.563298E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.557 | TFLOPs: 11.94 | 7: iteration 52900/ 173500 | consumed samples: 13542400 | consumed tokens: 27734835200 | elapsed time per iteration (s): 0.09 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 4.561261E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2889.525 | TFLOPs: 10.75 | 7: iteration 52910/ 173500 | consumed samples: 13544960 | consumed tokens: 27740078080 | elapsed time per iteration (s): 0.10 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 4.556372E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2669.754 | TFLOPs: 9.93 | 7: iteration 52920/ 173500 | consumed samples: 13547520 | consumed tokens: 27745320960 | elapsed time per iteration (s): 0.08 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 4.560254E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.162 | TFLOPs: 11.91 | 7: iteration 52930/ 173500 | consumed samples: 13550080 | consumed tokens: 27750563840 | elapsed time per iteration (s): 0.08 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 4.558825E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.945 | TFLOPs: 11.70 | 7: iteration 52940/ 173500 | consumed samples: 13552640 | consumed tokens: 27755806720 | elapsed time per iteration (s): 0.08 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 4.550883E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.472 | TFLOPs: 11.91 | 7: iteration 52950/ 173500 | consumed samples: 13555200 | consumed tokens: 27761049600 | elapsed time per iteration (s): 0.08 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 4.556023E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.650 | TFLOPs: 11.95 | 7: iteration 52960/ 173500 | consumed samples: 13557760 | consumed tokens: 27766292480 | elapsed time per iteration (s): 0.08 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 4.542289E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.916 | TFLOPs: 12.07 | 7: iteration 52970/ 173500 | consumed samples: 13560320 | consumed tokens: 27771535360 | elapsed time per iteration (s): 0.08 | learning rate: 1.633E-04 | global batch 
size: 256 | lm loss: 4.555463E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.273 | TFLOPs: 12.05 | 7: iteration 52980/ 173500 | consumed samples: 13562880 | consumed tokens: 27776778240 | elapsed time per iteration (s): 0.08 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 4.566141E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.882 | TFLOPs: 11.96 | 7: iteration 52990/ 173500 | consumed samples: 13565440 | consumed tokens: 27782021120 | elapsed time per iteration (s): 0.08 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 4.558719E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.987 | TFLOPs: 11.95 | 7: iteration 53000/ 173500 | consumed samples: 13568000 | consumed tokens: 27787264000 | elapsed time per iteration (s): 0.08 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 4.553661E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.019 | TFLOPs: 11.98 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 53000 | lm loss value: 4.449680E+00 | lm loss PPL: 8.559958E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 53000 to checkpoints_14m91b100m 0: [2023-03-17 01:33:52,188] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step53000 is begin to save! 0: [2023-03-17 01:33:52,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:33:52,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:33:52,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:33:52,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:33:52,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:33:52,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:33:52,223] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:33:52,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:33:52,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:33:52,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:33:52,228] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:33:52,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:33:52,229] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step53000/mp_rank_00_model_states.pt 0: [2023-03-17 01:33:52,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:33:52,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:33:52,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:33:52,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:33:52,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:33:52,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:33:52,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:33:52,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:33:52,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:33:52,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2023-03-17 01:33:52,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
1: [2023-03-17 01:33:52,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:33:52,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:33:52,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2023-03-17 01:33:52,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 4: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:33:52,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
2: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2023-03-17 01:33:52,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:33:52,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:33:52,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 5: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:33:52,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:33:52,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:33:52,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 01:33:52,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:33:52,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:33:52,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2023-03-17 01:33:52,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:33:52,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:33:52,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 01:33:52,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
4: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 01:33:52,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:33:52,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:33:52,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 5: [2023-03-17 01:33:52,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2023-03-17 01:33:52,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:33:52,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:33:52,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2023-03-17 01:33:52,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:33:52,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2023-03-17 01:33:52,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:33:52,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:33:52,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:33:52,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:33:52,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:33:52,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:33:52,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:33:52,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:33:52,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:33:52,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:33:52,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2023-03-17 01:33:52,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:33:52,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:33:52,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:33:52,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:33:52,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:33:52,261] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:33:52,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:33:52,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:33:52,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:33:52,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:33:52,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
0: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 7: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2023-03-17 01:33:52,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:33:52,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2023-03-17 01:33:52,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:33:52,264] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step53000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:33:52,264] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
0: successfully saved checkpoint at iteration 53000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.74 7: iteration 53010/ 173500 | consumed samples: 13570560 | consumed tokens: 27792506880 | elapsed time per iteration (s): 0.09 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 4.558655E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.677 | TFLOPs: 10.48 | 7: iteration 53020/ 173500 | consumed samples: 13573120 | consumed tokens: 27797749760 | elapsed time per iteration (s): 0.08 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 4.560023E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.951 | TFLOPs: 12.01 | 7: iteration 53030/ 173500 | consumed samples: 13575680 | consumed tokens: 27802992640 | elapsed time per iteration (s): 0.08 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 4.556094E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.432 | TFLOPs: 11.93 | 7: iteration 53040/ 173500 | consumed samples: 13578240 | consumed tokens: 27808235520 | elapsed time per iteration (s): 0.08 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 4.561905E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.567 | TFLOPs: 12.01 | 7: iteration 53050/ 173500 | consumed samples: 13580800 | consumed tokens: 27813478400 | elapsed time per iteration (s): 0.08 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 4.559681E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.028 | TFLOPs: 11.94 | 7: iteration 53060/ 173500 | consumed samples: 13583360 | consumed tokens: 27818721280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.632E-04 | global batch size: 256 | lm loss: 4.539276E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.950 | TFLOPs: 12.02 | 7: iteration 53070/ 173500 | consumed samples: 13585920 | consumed tokens: 27823964160 | elapsed time per iteration (s): 0.08 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 4.549766E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.602 | TFLOPs: 12.00 | 7: iteration 53080/ 173500 | consumed samples: 13588480 | consumed tokens: 27829207040 | elapsed time per iteration (s): 0.08 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 4.555211E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3022.065 | TFLOPs: 11.24 | 7: iteration 53090/ 173500 | consumed samples: 13591040 | consumed tokens: 27834449920 | elapsed time per iteration (s): 0.10 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 4.555677E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.462 | TFLOPs: 9.16 | 7: iteration 53100/ 173500 | consumed samples: 13593600 | consumed tokens: 27839692800 | elapsed time per iteration (s): 0.11 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 4.551958E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.933 | TFLOPs: 8.53 | 7: iteration 53110/ 173500 | consumed samples: 13596160 | consumed tokens: 27844935680 | elapsed time per iteration (s): 0.11 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 4.552271E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.205 | TFLOPs: 8.98 | 7: iteration 53120/ 
173500 | consumed samples: 13598720 | consumed tokens: 27850178560 | elapsed time per iteration (s): 0.11 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 4.558286E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2430.923 | TFLOPs: 9.04 | 7: iteration 53130/ 173500 | consumed samples: 13601280 | consumed tokens: 27855421440 | elapsed time per iteration (s): 0.08 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 4.570369E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.412 | TFLOPs: 12.02 | 7: iteration 53140/ 173500 | consumed samples: 13603840 | consumed tokens: 27860664320 | elapsed time per iteration (s): 0.08 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 4.571097E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.854 | TFLOPs: 11.82 | 7: iteration 53150/ 173500 | consumed samples: 13606400 | consumed tokens: 27865907200 | elapsed time per iteration (s): 0.09 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 4.558514E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2858.773 | TFLOPs: 10.63 | 7: iteration 53160/ 173500 | consumed samples: 13608960 | consumed tokens: 27871150080 | elapsed time per iteration (s): 0.08 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 4.559667E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.139 | TFLOPs: 11.67 | 7: iteration 53170/ 173500 | consumed samples: 13611520 | consumed tokens: 27876392960 | elapsed time per iteration (s): 0.08 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 4.551534E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3084.059 | TFLOPs: 11.47 | 7: iteration 53180/ 173500 | consumed samples: 13614080 | consumed tokens: 27881635840 | elapsed time per iteration (s): 0.09 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 4.563056E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2775.370 | TFLOPs: 10.32 | 7: iteration 53190/ 173500 | consumed samples: 13616640 | consumed tokens: 27886878720 | elapsed time per iteration (s): 0.13 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 4.549087E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2032.371 | TFLOPs: 7.56 | 7: iteration 53200/ 173500 | consumed samples: 13619200 | consumed tokens: 27892121600 | elapsed time per iteration (s): 0.09 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 4.557785E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2736.150 | TFLOPs: 10.18 | 7: iteration 53210/ 173500 | consumed samples: 13621760 | consumed tokens: 27897364480 | elapsed time per iteration (s): 0.08 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 4.564460E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.950 | TFLOPs: 11.90 | 7: iteration 53220/ 173500 | consumed samples: 13624320 | consumed tokens: 27902607360 | elapsed time per iteration (s): 0.08 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 4.555857E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.402 | TFLOPs: 11.93 | 7: iteration 53230/ 173500 | consumed samples: 13626880 | consumed tokens: 27907850240 | elapsed time per iteration (s): 0.08 | learning rate: 1.629E-04 
| global batch size: 256 | lm loss: 4.565238E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.217 | TFLOPs: 11.95 | 7: iteration 53240/ 173500 | consumed samples: 13629440 | consumed tokens: 27913093120 | elapsed time per iteration (s): 0.08 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 4.558335E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.654 | TFLOPs: 11.41 | 7: iteration 53250/ 173500 | consumed samples: 13632000 | consumed tokens: 27918336000 | elapsed time per iteration (s): 0.08 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 4.553880E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.309 | TFLOPs: 11.97 | 7: iteration 53260/ 173500 | consumed samples: 13634560 | consumed tokens: 27923578880 | elapsed time per iteration (s): 0.08 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 4.553778E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.781 | TFLOPs: 11.70 | 7: iteration 53270/ 173500 | consumed samples: 13637120 | consumed tokens: 27928821760 | elapsed time per iteration (s): 0.08 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 4.560447E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.870 | TFLOPs: 11.28 | 7: iteration 53280/ 173500 | consumed samples: 13639680 | consumed tokens: 27934064640 | elapsed time per iteration (s): 0.08 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 4.554000E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3065.756 | TFLOPs: 11.40 | 7: iteration 53290/ 173500 | consumed samples: 
13642240 | consumed tokens: 27939307520 | elapsed time per iteration (s): 0.08 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 4.568478E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.370 | TFLOPs: 11.83 | 7: iteration 53300/ 173500 | consumed samples: 13644800 | consumed tokens: 27944550400 | elapsed time per iteration (s): 0.08 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 4.545902E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.719 | TFLOPs: 11.92 | 7: iteration 53310/ 173500 | consumed samples: 13647360 | consumed tokens: 27949793280 | elapsed time per iteration (s): 0.09 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 4.557830E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.830 | TFLOPs: 11.16 | 7: iteration 53320/ 173500 | consumed samples: 13649920 | consumed tokens: 27955036160 | elapsed time per iteration (s): 0.13 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 4.555097E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.336 | TFLOPs: 7.46 | 7: iteration 53330/ 173500 | consumed samples: 13652480 | consumed tokens: 27960279040 | elapsed time per iteration (s): 0.13 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 4.552816E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2035.699 | TFLOPs: 7.57 | 7: iteration 53340/ 173500 | consumed samples: 13655040 | consumed tokens: 27965521920 | elapsed time per iteration (s): 0.08 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 4.559930E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3156.794 | TFLOPs: 11.74 | 7: iteration 53350/ 173500 | consumed samples: 13657600 | consumed tokens: 27970764800 | elapsed time per iteration (s): 0.08 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 4.546793E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.170 | TFLOPs: 12.01 | 7: iteration 53360/ 173500 | consumed samples: 13660160 | consumed tokens: 27976007680 | elapsed time per iteration (s): 0.08 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 4.541285E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3253.344 | TFLOPs: 12.10 | 7: iteration 53370/ 173500 | consumed samples: 13662720 | consumed tokens: 27981250560 | elapsed time per iteration (s): 0.08 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 4.546630E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.663 | TFLOPs: 11.99 | 7: iteration 53380/ 173500 | consumed samples: 13665280 | consumed tokens: 27986493440 | elapsed time per iteration (s): 0.08 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 4.554177E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.862 | TFLOPs: 12.06 | 7: iteration 53390/ 173500 | consumed samples: 13667840 | consumed tokens: 27991736320 | elapsed time per iteration (s): 0.08 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 4.557312E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.144 | TFLOPs: 12.07 | 7: iteration 53400/ 173500 | consumed samples: 13670400 | consumed tokens: 27996979200 | elapsed time per iteration (s): 0.09 | learning rate: 1.627E-04 | global batch size: 256 | 
lm loss: 4.559732E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2971.747 | TFLOPs: 11.05 | 7: iteration 53410/ 173500 | consumed samples: 13672960 | consumed tokens: 28002222080 | elapsed time per iteration (s): 0.10 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 4.561377E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2455.278 | TFLOPs: 9.13 | 7: iteration 53420/ 173500 | consumed samples: 13675520 | consumed tokens: 28007464960 | elapsed time per iteration (s): 0.08 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 4.559082E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.385 | TFLOPs: 11.56 | 7: iteration 53430/ 173500 | consumed samples: 13678080 | consumed tokens: 28012707840 | elapsed time per iteration (s): 0.08 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 4.549988E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.511 | TFLOPs: 11.26 | 7: iteration 53440/ 173500 | consumed samples: 13680640 | consumed tokens: 28017950720 | elapsed time per iteration (s): 0.08 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 4.561589E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.836 | TFLOPs: 12.05 | 7: iteration 53450/ 173500 | consumed samples: 13683200 | consumed tokens: 28023193600 | elapsed time per iteration (s): 0.08 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 4.563486E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.115 | TFLOPs: 12.08 | 7: iteration 53460/ 173500 | consumed samples: 13685760 | consumed tokens: 
28028436480 | elapsed time per iteration (s): 0.08 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 4.555859E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.921 | TFLOPs: 11.93 | 7: iteration 53470/ 173500 | consumed samples: 13688320 | consumed tokens: 28033679360 | elapsed time per iteration (s): 0.08 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 4.551576E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.407 | TFLOPs: 12.00 | 7: iteration 53480/ 173500 | consumed samples: 13690880 | consumed tokens: 28038922240 | elapsed time per iteration (s): 0.11 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 4.560834E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2349.523 | TFLOPs: 8.74 | 7: iteration 53490/ 173500 | consumed samples: 13693440 | consumed tokens: 28044165120 | elapsed time per iteration (s): 0.08 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 4.543266E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3249.956 | TFLOPs: 12.09 | 7: iteration 53500/ 173500 | consumed samples: 13696000 | consumed tokens: 28049408000 | elapsed time per iteration (s): 0.09 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 4.551093E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.985 | TFLOPs: 10.38 | 7: iteration 53510/ 173500 | consumed samples: 13698560 | consumed tokens: 28054650880 | elapsed time per iteration (s): 0.10 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 4.553947E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2537.458 | TFLOPs: 9.44 | 7: iteration 53520/ 173500 | consumed samples: 13701120 | consumed tokens: 28059893760 | elapsed time per iteration (s): 0.08 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 4.557764E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.024 | TFLOPs: 11.76 | 7: iteration 53530/ 173500 | consumed samples: 13703680 | consumed tokens: 28065136640 | elapsed time per iteration (s): 0.08 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.568769E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.144 | TFLOPs: 12.06 | 7: iteration 53540/ 173500 | consumed samples: 13706240 | consumed tokens: 28070379520 | elapsed time per iteration (s): 0.08 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.551861E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.531 | TFLOPs: 11.80 | 7: iteration 53550/ 173500 | consumed samples: 13708800 | consumed tokens: 28075622400 | elapsed time per iteration (s): 0.08 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.559730E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.531 | TFLOPs: 11.71 | 7: iteration 53560/ 173500 | consumed samples: 13711360 | consumed tokens: 28080865280 | elapsed time per iteration (s): 0.08 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.549037E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3255.630 | TFLOPs: 12.11 | 7: iteration 53570/ 173500 | consumed samples: 13713920 | consumed tokens: 28086108160 | elapsed time per iteration (s): 0.08 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.552021E+00 | grad 
norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3255.696 | TFLOPs: 12.11 | 7: iteration 53580/ 173500 | consumed samples: 13716480 | consumed tokens: 28091351040 | elapsed time per iteration (s): 0.08 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.545807E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.266 | TFLOPs: 11.92 | 7: iteration 53590/ 173500 | consumed samples: 13719040 | consumed tokens: 28096593920 | elapsed time per iteration (s): 0.08 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.546108E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.480 | TFLOPs: 11.76 | 7: iteration 53600/ 173500 | consumed samples: 13721600 | consumed tokens: 28101836800 | elapsed time per iteration (s): 0.08 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 4.560130E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.085 | TFLOPs: 11.99 | 7: iteration 53610/ 173500 | consumed samples: 13724160 | consumed tokens: 28107079680 | elapsed time per iteration (s): 0.08 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 4.556787E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.023 | TFLOPs: 12.00 | 7: iteration 53620/ 173500 | consumed samples: 13726720 | consumed tokens: 28112322560 | elapsed time per iteration (s): 0.08 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 4.547629E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.321 | TFLOPs: 12.05 | 7: iteration 53630/ 173500 | consumed samples: 13729280 | consumed tokens: 28117565440 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 4.553168E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.888 | TFLOPs: 11.79 | 7: iteration 53640/ 173500 | consumed samples: 13731840 | consumed tokens: 28122808320 | elapsed time per iteration (s): 0.08 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 4.564790E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.547 | TFLOPs: 11.42 | 7: iteration 53650/ 173500 | consumed samples: 13734400 | consumed tokens: 28128051200 | elapsed time per iteration (s): 0.08 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 4.545075E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.517 | TFLOPs: 12.02 | 7: iteration 53660/ 173500 | consumed samples: 13736960 | consumed tokens: 28133294080 | elapsed time per iteration (s): 0.08 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 4.544960E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.848 | TFLOPs: 11.74 | 7: iteration 53670/ 173500 | consumed samples: 13739520 | consumed tokens: 28138536960 | elapsed time per iteration (s): 0.08 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 4.543041E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.449 | TFLOPs: 12.04 | 7: iteration 53680/ 173500 | consumed samples: 13742080 | consumed tokens: 28143779840 | elapsed time per iteration (s): 0.08 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 4.555014E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.114 | TFLOPs: 
11.74 | 7: iteration 53690/ 173500 | consumed samples: 13744640 | consumed tokens: 28149022720 | elapsed time per iteration (s): 0.08 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 4.546592E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.017 | TFLOPs: 11.77 | 7: iteration 53700/ 173500 | consumed samples: 13747200 | consumed tokens: 28154265600 | elapsed time per iteration (s): 0.09 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 4.561266E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2960.261 | TFLOPs: 11.01 | 7: iteration 53710/ 173500 | consumed samples: 13749760 | consumed tokens: 28159508480 | elapsed time per iteration (s): 0.08 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 4.557053E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.973 | TFLOPs: 11.75 | 7: iteration 53720/ 173500 | consumed samples: 13752320 | consumed tokens: 28164751360 | elapsed time per iteration (s): 0.08 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 4.550423E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.196 | TFLOPs: 12.04 | 7: iteration 53730/ 173500 | consumed samples: 13754880 | consumed tokens: 28169994240 | elapsed time per iteration (s): 0.09 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 4.544063E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2698.423 | TFLOPs: 10.04 | 7: iteration 53740/ 173500 | consumed samples: 13757440 | consumed tokens: 28175237120 | elapsed time per iteration (s): 0.13 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 4.549831E+00 | grad norm: 0.269 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1999.969 | TFLOPs: 7.44 | 7: iteration 53750/ 173500 | consumed samples: 13760000 | consumed tokens: 28180480000 | elapsed time per iteration (s): 0.13 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 4.561724E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.283 | TFLOPs: 7.30 | 7: iteration 53760/ 173500 | consumed samples: 13762560 | consumed tokens: 28185722880 | elapsed time per iteration (s): 0.10 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 4.553014E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2659.476 | TFLOPs: 9.89 | 7: iteration 53770/ 173500 | consumed samples: 13765120 | consumed tokens: 28190965760 | elapsed time per iteration (s): 0.08 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 4.551119E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.734 | TFLOPs: 11.99 | 7: iteration 53780/ 173500 | consumed samples: 13767680 | consumed tokens: 28196208640 | elapsed time per iteration (s): 0.09 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 4.562444E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.198 | TFLOPs: 11.15 | 7: iteration 53790/ 173500 | consumed samples: 13770240 | consumed tokens: 28201451520 | elapsed time per iteration (s): 0.08 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 4.553587E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.180 | TFLOPs: 11.59 | 7: iteration 53800/ 173500 | consumed samples: 13772800 | consumed tokens: 28206694400 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.622E-04 | global batch size: 256 | lm loss: 4.566146E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.046 | TFLOPs: 11.62 | 7: iteration 53810/ 173500 | consumed samples: 13775360 | consumed tokens: 28211937280 | elapsed time per iteration (s): 0.08 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 4.545921E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.764 | TFLOPs: 11.87 | 7: iteration 53820/ 173500 | consumed samples: 13777920 | consumed tokens: 28217180160 | elapsed time per iteration (s): 0.08 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 4.561533E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.964 | TFLOPs: 11.64 | 7: iteration 53830/ 173500 | consumed samples: 13780480 | consumed tokens: 28222423040 | elapsed time per iteration (s): 0.08 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 4.539551E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.320 | TFLOPs: 11.86 | 7: iteration 53840/ 173500 | consumed samples: 13783040 | consumed tokens: 28227665920 | elapsed time per iteration (s): 0.09 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 4.563866E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.200 | TFLOPs: 11.11 | 7: iteration 53850/ 173500 | consumed samples: 13785600 | consumed tokens: 28232908800 | elapsed time per iteration (s): 0.08 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 4.552475E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.226 | TFLOPs: 11.84 | 7: iteration 53860/ 
173500 | consumed samples: 13788160 | consumed tokens: 28238151680 | elapsed time per iteration (s): 0.08 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 4.562580E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.612 | TFLOPs: 11.84 |
7: iteration 53870/ 173500 | consumed samples: 13790720 | consumed tokens: 28243394560 | elapsed time per iteration (s): 0.08 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 4.556933E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.358 | TFLOPs: 11.25 |
7: iteration 53880/ 173500 | consumed samples: 13793280 | consumed tokens: 28248637440 | elapsed time per iteration (s): 0.08 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 4.558035E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.657 | TFLOPs: 11.36 |
7: iteration 53890/ 173500 | consumed samples: 13795840 | consumed tokens: 28253880320 | elapsed time per iteration (s): 0.08 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 4.549176E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.404 | TFLOPs: 11.35 |
7: iteration 53900/ 173500 | consumed samples: 13798400 | consumed tokens: 28259123200 | elapsed time per iteration (s): 0.08 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 4.558415E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.054 | TFLOPs: 11.65 |
7: iteration 53910/ 173500 | consumed samples: 13800960 | consumed tokens: 28264366080 | elapsed time per iteration (s): 0.08 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 4.566594E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.890 | TFLOPs: 11.37 |
7: iteration 53920/ 173500 | consumed samples: 13803520 | consumed tokens: 28269608960 | elapsed time per iteration (s): 0.08 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 4.563610E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.288 | TFLOPs: 11.92 |
7: iteration 53930/ 173500 | consumed samples: 13806080 | consumed tokens: 28274851840 | elapsed time per iteration (s): 0.08 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 4.541072E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.400 | TFLOPs: 11.88 |
7: iteration 53940/ 173500 | consumed samples: 13808640 | consumed tokens: 28280094720 | elapsed time per iteration (s): 0.08 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 4.562628E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.150 | TFLOPs: 11.63 |
7: iteration 53950/ 173500 | consumed samples: 13811200 | consumed tokens: 28285337600 | elapsed time per iteration (s): 0.08 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 4.550269E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.998 | TFLOPs: 11.64 |
7: iteration 53960/ 173500 | consumed samples: 13813760 | consumed tokens: 28290580480 | elapsed time per iteration (s): 0.08 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 4.555087E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.041 | TFLOPs: 11.58 |
7: iteration 53970/ 173500 | consumed samples: 13816320 | consumed tokens: 28295823360 | elapsed time per iteration (s): 0.08 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 4.544663E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.429 | TFLOPs: 11.93 |
7: iteration 53980/ 173500 | consumed samples: 13818880 | consumed tokens: 28301066240 | elapsed time per iteration (s): 0.08 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 4.555981E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.718 | TFLOPs: 11.84 |
7: iteration 53990/ 173500 | consumed samples: 13821440 | consumed tokens: 28306309120 | elapsed time per iteration (s): 0.08 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 4.551111E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.613 | TFLOPs: 11.88 |
0: [2023-03-17 01:35:18,058] [INFO] [logging.py:68:log_dist] [Rank 0] step=54000, skipped=0, lr=[0.00016191666237869197, 0.00016191666237869197, 0.00016191666237869197], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 54000/ 173500 | consumed samples: 13824000 | consumed tokens: 28311552000 | elapsed time per iteration (s): 0.08 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 4.569869E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.586 | TFLOPs: 11.57 |
0: steps: 54000 loss: 4.5438 iter time (s): 0.087 samples/sec: 2947.270
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 54000 | lm loss value: 4.427131E+00 | lm loss PPL: 8.369098E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 54000 to checkpoints_14m91b100m
0: [2023-03-17 01:35:18,116] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step54000 is begin to save!
0: [2023-03-17 01:35:18,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:35:18,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:35:18,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/layer_03-model_00-model_states.pt...
0: [2023-03-17 01:35:18,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/layer_03-model_00-model_states.pt.
0: [2023-03-17 01:35:18,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/layer_04-model_00-model_states.pt...
0: [2023-03-17 01:35:18,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/layer_04-model_00-model_states.pt.
0: [2023-03-17 01:35:18,153] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/layer_05-model_00-model_states.pt...
0: [2023-03-17 01:35:18,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/layer_05-model_00-model_states.pt.
0: [2023-03-17 01:35:18,156] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/layer_06-model_00-model_states.pt...
0: [2023-03-17 01:35:18,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/layer_06-model_00-model_states.pt.
0: [2023-03-17 01:35:18,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/layer_08-model_00-model_states.pt...
0: [2023-03-17 01:35:18,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:35:18,160] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step54000/mp_rank_00_model_states.pt 0: [2023-03-17 01:35:18,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:35:18,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:35:18,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:35:18,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:35:18,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:35:18,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:35:18,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 01:35:18,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:35:18,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 01:35:18,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 
5: [2023-03-17 01:35:18,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:35:18,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:35:18,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:35:18,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:35:18,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:35:18,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2023-03-17 01:35:18,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:35:18,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:35:18,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 6: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:35:18,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:35:18,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:35:18,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:35:18,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:35:18,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:35:18,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:35:18,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:35:18,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2023-03-17 01:35:18,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:35:18,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:35:18,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:35:18,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 0: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:35:18,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:35:18,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:35:18,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:35:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2023-03-17 01:35:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:35:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:35:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:35:18,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:35:18,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:35:18,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:35:18,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:35:18,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:35:18,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:35:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 
1: [2023-03-17 01:35:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:35:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:35:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:35:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2023-03-17 01:35:18,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:35:18,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 
2: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 2: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 01:35:18,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 01:35:18,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2023-03-17 01:35:18,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2023-03-17 01:35:18,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:35:18,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step54000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:35:18,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 
0: successfully saved checkpoint at iteration 54000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.63 7: iteration 54010/ 173500 | consumed samples: 13826560 | consumed tokens: 28316794880 | elapsed time per iteration (s): 0.09 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 4.547240E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2736.849 | TFLOPs: 10.18 | 7: iteration 54020/ 173500 | consumed samples: 13829120 | consumed tokens: 28322037760 | elapsed time per iteration (s): 0.08 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 4.548525E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.103 | TFLOPs: 11.64 | 7: iteration 54030/ 173500 | consumed samples: 13831680 | consumed tokens: 28327280640 | elapsed time per iteration (s): 0.08 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 4.558107E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.405 | TFLOPs: 11.63 | 7: iteration 54040/ 173500 | consumed samples: 13834240 | consumed tokens: 28332523520 | elapsed time per iteration (s): 0.08 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 4.554213E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.191 | TFLOPs: 11.83 | 7: iteration 54050/ 173500 | consumed samples: 13836800 | consumed tokens: 28337766400 | elapsed time per iteration (s): 0.08 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 4.553537E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.169 | TFLOPs: 11.60 | 7: iteration 54060/ 173500 | consumed samples: 13839360 | consumed tokens: 28343009280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.618E-04 | global batch size: 256 | lm loss: 4.562645E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.284 | TFLOPs: 11.84 | 7: iteration 54070/ 173500 | consumed samples: 13841920 | consumed tokens: 28348252160 | elapsed time per iteration (s): 0.08 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 4.548454E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.121 | TFLOPs: 11.92 | 7: iteration 54080/ 173500 | consumed samples: 13844480 | consumed tokens: 28353495040 | elapsed time per iteration (s): 0.08 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 4.545174E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.367 | TFLOPs: 11.69 | 7: iteration 54090/ 173500 | consumed samples: 13847040 | consumed tokens: 28358737920 | elapsed time per iteration (s): 0.08 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 4.552818E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.473 | TFLOPs: 12.02 | 7: iteration 54100/ 173500 | consumed samples: 13849600 | consumed tokens: 28363980800 | elapsed time per iteration (s): 0.09 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 4.561070E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.906 | TFLOPs: 10.78 | 7: iteration 54110/ 173500 | consumed samples: 13852160 | consumed tokens: 28369223680 | elapsed time per iteration (s): 0.08 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 4.559139E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.092 | TFLOPs: 11.97 | 7: iteration 54120/ 
173500 | consumed samples: 13854720 | consumed tokens: 28374466560 | elapsed time per iteration (s): 0.08 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 4.548272E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.998 | TFLOPs: 11.66 | 7: iteration 54130/ 173500 | consumed samples: 13857280 | consumed tokens: 28379709440 | elapsed time per iteration (s): 0.08 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 4.560151E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.098 | TFLOPs: 11.93 | 7: iteration 54140/ 173500 | consumed samples: 13859840 | consumed tokens: 28384952320 | elapsed time per iteration (s): 0.08 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 4.557332E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.789 | TFLOPs: 11.31 | 7: iteration 54150/ 173500 | consumed samples: 13862400 | consumed tokens: 28390195200 | elapsed time per iteration (s): 0.08 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 4.550636E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.609 | TFLOPs: 11.95 | 7: iteration 54160/ 173500 | consumed samples: 13864960 | consumed tokens: 28395438080 | elapsed time per iteration (s): 0.08 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 4.551583E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.984 | TFLOPs: 11.83 | 7: iteration 54170/ 173500 | consumed samples: 13867520 | consumed tokens: 28400680960 | elapsed time per iteration (s): 0.09 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 4.560611E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2785.982 | TFLOPs: 10.36 | 7: iteration 54180/ 173500 | consumed samples: 13870080 | consumed tokens: 28405923840 | elapsed time per iteration (s): 0.08 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 4.566571E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.760 | TFLOPs: 11.94 | 7: iteration 54190/ 173500 | consumed samples: 13872640 | consumed tokens: 28411166720 | elapsed time per iteration (s): 0.12 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 4.565330E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2117.555 | TFLOPs: 7.88 | 7: iteration 54200/ 173500 | consumed samples: 13875200 | consumed tokens: 28416409600 | elapsed time per iteration (s): 0.13 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 4.555306E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.435 | TFLOPs: 7.21 | 7: iteration 54210/ 173500 | consumed samples: 13877760 | consumed tokens: 28421652480 | elapsed time per iteration (s): 0.09 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 4.554275E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2991.178 | TFLOPs: 11.13 | 7: iteration 54220/ 173500 | consumed samples: 13880320 | consumed tokens: 28426895360 | elapsed time per iteration (s): 0.08 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 4.567794E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.802 | TFLOPs: 11.59 | 7: iteration 54230/ 173500 | consumed samples: 13882880 | consumed tokens: 28432138240 | elapsed time per iteration (s): 0.08 | learning rate: 1.616E-04 
| global batch size: 256 | lm loss: 4.562195E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.269 | TFLOPs: 11.88 | 7: iteration 54240/ 173500 | consumed samples: 13885440 | consumed tokens: 28437381120 | elapsed time per iteration (s): 0.08 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 4.545187E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.685 | TFLOPs: 11.91 | 7: iteration 54250/ 173500 | consumed samples: 13888000 | consumed tokens: 28442624000 | elapsed time per iteration (s): 0.08 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 4.548395E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.922 | TFLOPs: 11.95 | 7: iteration 54260/ 173500 | consumed samples: 13890560 | consumed tokens: 28447866880 | elapsed time per iteration (s): 0.08 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 4.552005E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.972 | TFLOPs: 11.90 | 7: iteration 54270/ 173500 | consumed samples: 13893120 | consumed tokens: 28453109760 | elapsed time per iteration (s): 0.08 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 4.542120E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.987 | TFLOPs: 11.82 | 7: iteration 54280/ 173500 | consumed samples: 13895680 | consumed tokens: 28458352640 | elapsed time per iteration (s): 0.08 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 4.562497E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.125 | TFLOPs: 11.35 | 7: iteration 54290/ 173500 | consumed samples: 
13898240 | consumed tokens: 28463595520 | elapsed time per iteration (s): 0.10 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 4.555685E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.931 | TFLOPs: 10.01 | 7: iteration 54300/ 173500 | consumed samples: 13900800 | consumed tokens: 28468838400 | elapsed time per iteration (s): 0.09 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 4.543295E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.094 | TFLOPs: 11.08 | 7: iteration 54310/ 173500 | consumed samples: 13903360 | consumed tokens: 28474081280 | elapsed time per iteration (s): 0.08 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 4.548710E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.914 | TFLOPs: 11.92 | 7: iteration 54320/ 173500 | consumed samples: 13905920 | consumed tokens: 28479324160 | elapsed time per iteration (s): 0.08 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 4.540879E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.582 | TFLOPs: 11.88 | 7: iteration 54330/ 173500 | consumed samples: 13908480 | consumed tokens: 28484567040 | elapsed time per iteration (s): 0.08 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 4.562628E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.815 | TFLOPs: 11.62 | 7: iteration 54340/ 173500 | consumed samples: 13911040 | consumed tokens: 28489809920 | elapsed time per iteration (s): 0.08 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 4.553505E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3181.644 | TFLOPs: 11.83 | 7: iteration 54350/ 173500 | consumed samples: 13913600 | consumed tokens: 28495052800 | elapsed time per iteration (s): 0.08 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 4.565268E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.151 | TFLOPs: 11.92 | 7: iteration 54360/ 173500 | consumed samples: 13916160 | consumed tokens: 28500295680 | elapsed time per iteration (s): 0.08 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 4.532269E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.405 | TFLOPs: 11.91 | 7: iteration 54370/ 173500 | consumed samples: 13918720 | consumed tokens: 28505538560 | elapsed time per iteration (s): 0.08 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 4.551925E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.309 | TFLOPs: 11.91 | 7: iteration 54380/ 173500 | consumed samples: 13921280 | consumed tokens: 28510781440 | elapsed time per iteration (s): 0.09 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 4.566557E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.188 | TFLOPs: 11.06 | 7: iteration 54390/ 173500 | consumed samples: 13923840 | consumed tokens: 28516024320 | elapsed time per iteration (s): 0.08 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 4.551490E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.703 | TFLOPs: 11.91 | 7: iteration 54400/ 173500 | consumed samples: 13926400 | consumed tokens: 28521267200 | elapsed time per iteration (s): 0.08 | learning rate: 1.614E-04 | global batch size: 256 | 
lm loss: 4.574484E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.102 | TFLOPs: 11.56 | 7: iteration 54410/ 173500 | consumed samples: 13928960 | consumed tokens: 28526510080 | elapsed time per iteration (s): 0.08 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 4.556860E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.964 | TFLOPs: 11.85 | 7: iteration 54420/ 173500 | consumed samples: 13931520 | consumed tokens: 28531752960 | elapsed time per iteration (s): 0.08 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 4.555562E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.261 | TFLOPs: 11.77 | 7: iteration 54430/ 173500 | consumed samples: 13934080 | consumed tokens: 28536995840 | elapsed time per iteration (s): 0.08 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 4.555809E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.806 | TFLOPs: 11.75 | 7: iteration 54440/ 173500 | consumed samples: 13936640 | consumed tokens: 28542238720 | elapsed time per iteration (s): 0.08 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 4.566016E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.705 | TFLOPs: 11.83 | 7: iteration 54450/ 173500 | consumed samples: 13939200 | consumed tokens: 28547481600 | elapsed time per iteration (s): 0.08 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 4.554671E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.869 | TFLOPs: 11.85 | 7: iteration 54460/ 173500 | consumed samples: 13941760 | consumed 
tokens: 28552724480 | elapsed time per iteration (s): 0.08 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 4.543901E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.670 | TFLOPs: 11.85 | 7: iteration 54470/ 173500 | consumed samples: 13944320 | consumed tokens: 28557967360 | elapsed time per iteration (s): 0.08 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 4.546275E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.901 | TFLOPs: 11.94 | 7: iteration 54480/ 173500 | consumed samples: 13946880 | consumed tokens: 28563210240 | elapsed time per iteration (s): 0.08 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 4.558703E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.295 | TFLOPs: 11.70 | 7: iteration 54490/ 173500 | consumed samples: 13949440 | consumed tokens: 28568453120 | elapsed time per iteration (s): 0.08 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 4.558179E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.164 | TFLOPs: 11.71 | 7: iteration 54500/ 173500 | consumed samples: 13952000 | consumed tokens: 28573696000 | elapsed time per iteration (s): 0.08 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 4.562866E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.811 | TFLOPs: 11.97 | 7: iteration 54510/ 173500 | consumed samples: 13954560 | consumed tokens: 28578938880 | elapsed time per iteration (s): 0.08 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 4.558356E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3212.647 | TFLOPs: 11.95 | 7: iteration 54520/ 173500 | consumed samples: 13957120 | consumed tokens: 28584181760 | elapsed time per iteration (s): 0.08 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 4.559396E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.902 | TFLOPs: 11.88 | 7: iteration 54530/ 173500 | consumed samples: 13959680 | consumed tokens: 28589424640 | elapsed time per iteration (s): 0.08 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 4.557885E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.554 | TFLOPs: 11.89 | 7: iteration 54540/ 173500 | consumed samples: 13962240 | consumed tokens: 28594667520 | elapsed time per iteration (s): 0.08 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 4.549837E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.064 | TFLOPs: 11.92 | 7: iteration 54550/ 173500 | consumed samples: 13964800 | consumed tokens: 28599910400 | elapsed time per iteration (s): 0.08 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 4.548711E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.876 | TFLOPs: 11.89 | 7: iteration 54560/ 173500 | consumed samples: 13967360 | consumed tokens: 28605153280 | elapsed time per iteration (s): 0.08 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 4.553415E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.378 | TFLOPs: 11.60 | 7: iteration 54570/ 173500 | consumed samples: 13969920 | consumed tokens: 28610396160 | elapsed time per iteration (s): 0.08 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 4.555005E+00 | 
grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.660 | TFLOPs: 11.85 | 7: iteration 54580/ 173500 | consumed samples: 13972480 | consumed tokens: 28615639040 | elapsed time per iteration (s): 0.09 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 4.555231E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2937.122 | TFLOPs: 10.92 | 7: iteration 54590/ 173500 | consumed samples: 13975040 | consumed tokens: 28620881920 | elapsed time per iteration (s): 0.12 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 4.562607E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2168.134 | TFLOPs: 8.06 | 7: iteration 54600/ 173500 | consumed samples: 13977600 | consumed tokens: 28626124800 | elapsed time per iteration (s): 0.11 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 4.553902E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.643 | TFLOPs: 8.59 | 7: iteration 54610/ 173500 | consumed samples: 13980160 | consumed tokens: 28631367680 | elapsed time per iteration (s): 0.12 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 4.551771E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2176.850 | TFLOPs: 8.10 | 7: iteration 54620/ 173500 | consumed samples: 13982720 | consumed tokens: 28636610560 | elapsed time per iteration (s): 0.11 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 4.569134E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.943 | TFLOPs: 8.53 | 7: iteration 54630/ 173500 | consumed samples: 13985280 | consumed tokens: 28641853440 | elapsed time 
per iteration (s): 0.12 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 4.559481E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2140.587 | TFLOPs: 7.96 | 7: iteration 54640/ 173500 | consumed samples: 13987840 | consumed tokens: 28647096320 | elapsed time per iteration (s): 0.12 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 4.552000E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2126.203 | TFLOPs: 7.91 | 7: iteration 54650/ 173500 | consumed samples: 13990400 | consumed tokens: 28652339200 | elapsed time per iteration (s): 0.10 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 4.561111E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.107 | TFLOPs: 9.56 | 7: iteration 54660/ 173500 | consumed samples: 13992960 | consumed tokens: 28657582080 | elapsed time per iteration (s): 0.11 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 4.559141E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.561 | TFLOPs: 9.02 | 7: iteration 54670/ 173500 | consumed samples: 13995520 | consumed tokens: 28662824960 | elapsed time per iteration (s): 0.11 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 4.550863E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.266 | TFLOPs: 8.44 | 7: iteration 54680/ 173500 | consumed samples: 13998080 | consumed tokens: 28668067840 | elapsed time per iteration (s): 0.11 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 4.546606E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.513 | TFLOPs: 8.56 | 
7: iteration 54690/ 173500 | consumed samples: 14000640 | consumed tokens: 28673310720 | elapsed time per iteration (s): 0.09 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 4.553418E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2703.589 | TFLOPs: 10.06 | 7: iteration 54700/ 173500 | consumed samples: 14003200 | consumed tokens: 28678553600 | elapsed time per iteration (s): 0.08 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 4.550882E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3252.128 | TFLOPs: 12.10 | 7: iteration 54710/ 173500 | consumed samples: 14005760 | consumed tokens: 28683796480 | elapsed time per iteration (s): 0.08 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 4.553004E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.465 | TFLOPs: 12.08 | 7: iteration 54720/ 173500 | consumed samples: 14008320 | consumed tokens: 28689039360 | elapsed time per iteration (s): 0.08 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 4.549474E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.566 | TFLOPs: 11.60 | 7: iteration 54730/ 173500 | consumed samples: 14010880 | consumed tokens: 28694282240 | elapsed time per iteration (s): 0.08 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 4.559343E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.557 | TFLOPs: 11.74 | 7: iteration 54740/ 173500 | consumed samples: 14013440 | consumed tokens: 28699525120 | elapsed time per iteration (s): 0.08 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 4.558977E+00 | grad norm: 0.308 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.782 | TFLOPs: 11.70 | 7: iteration 54750/ 173500 | consumed samples: 14016000 | consumed tokens: 28704768000 | elapsed time per iteration (s): 0.08 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 4.562820E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.590 | TFLOPs: 12.01 | 7: iteration 54760/ 173500 | consumed samples: 14018560 | consumed tokens: 28710010880 | elapsed time per iteration (s): 0.08 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 4.562479E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.462 | TFLOPs: 11.74 | 7: iteration 54770/ 173500 | consumed samples: 14021120 | consumed tokens: 28715253760 | elapsed time per iteration (s): 0.08 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 4.552843E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.361 | TFLOPs: 11.86 | 7: iteration 54780/ 173500 | consumed samples: 14023680 | consumed tokens: 28720496640 | elapsed time per iteration (s): 0.08 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 4.559911E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.039 | TFLOPs: 12.03 | 7: iteration 54790/ 173500 | consumed samples: 14026240 | consumed tokens: 28725739520 | elapsed time per iteration (s): 0.08 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 4.547620E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.925 | TFLOPs: 11.68 | 7: iteration 54800/ 173500 | consumed samples: 14028800 | consumed tokens: 28730982400 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.608E-04 | global batch size: 256 | lm loss: 4.558323E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.866 | TFLOPs: 11.99 | 7: iteration 54810/ 173500 | consumed samples: 14031360 | consumed tokens: 28736225280 | elapsed time per iteration (s): 0.08 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 4.542245E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.091 | TFLOPs: 11.43 | 7: iteration 54820/ 173500 | consumed samples: 14033920 | consumed tokens: 28741468160 | elapsed time per iteration (s): 0.08 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 4.555994E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.304 | TFLOPs: 11.96 | 7: iteration 54830/ 173500 | consumed samples: 14036480 | consumed tokens: 28746711040 | elapsed time per iteration (s): 0.08 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 4.548229E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.408 | TFLOPs: 11.99 | 7: iteration 54840/ 173500 | consumed samples: 14039040 | consumed tokens: 28751953920 | elapsed time per iteration (s): 0.08 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 4.555582E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.717 | TFLOPs: 11.26 | 7: iteration 54850/ 173500 | consumed samples: 14041600 | consumed tokens: 28757196800 | elapsed time per iteration (s): 0.09 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 4.543519E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2804.147 | TFLOPs: 10.43 | 7: iteration 54860/ 
173500 | consumed samples: 14044160 | consumed tokens: 28762439680 | elapsed time per iteration (s): 0.08 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 4.551292E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.993 | TFLOPs: 11.66 | 7: iteration 54870/ 173500 | consumed samples: 14046720 | consumed tokens: 28767682560 | elapsed time per iteration (s): 0.08 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 4.550159E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.532 | TFLOPs: 12.01 | 7: iteration 54880/ 173500 | consumed samples: 14049280 | consumed tokens: 28772925440 | elapsed time per iteration (s): 0.08 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 4.556210E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.962 | TFLOPs: 12.05 | 7: iteration 54890/ 173500 | consumed samples: 14051840 | consumed tokens: 28778168320 | elapsed time per iteration (s): 0.08 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 4.547710E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.529 | TFLOPs: 11.89 | 7: iteration 54900/ 173500 | consumed samples: 14054400 | consumed tokens: 28783411200 | elapsed time per iteration (s): 0.08 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 4.554841E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.528 | TFLOPs: 11.48 | 7: iteration 54910/ 173500 | consumed samples: 14056960 | consumed tokens: 28788654080 | elapsed time per iteration (s): 0.08 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 4.548868E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3173.515 | TFLOPs: 11.80 | 7: iteration 54920/ 173500 | consumed samples: 14059520 | consumed tokens: 28793896960 | elapsed time per iteration (s): 0.08 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 4.543073E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.628 | TFLOPs: 12.00 | 7: iteration 54930/ 173500 | consumed samples: 14062080 | consumed tokens: 28799139840 | elapsed time per iteration (s): 0.08 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 4.555616E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.494 | TFLOPs: 12.00 | 7: iteration 54940/ 173500 | consumed samples: 14064640 | consumed tokens: 28804382720 | elapsed time per iteration (s): 0.08 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 4.543335E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.473 | TFLOPs: 11.99 | 7: iteration 54950/ 173500 | consumed samples: 14067200 | consumed tokens: 28809625600 | elapsed time per iteration (s): 0.08 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 4.542258E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.509 | TFLOPs: 11.96 | 7: iteration 54960/ 173500 | consumed samples: 14069760 | consumed tokens: 28814868480 | elapsed time per iteration (s): 0.08 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 4.553273E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.349 | TFLOPs: 11.99 | 7: iteration 54970/ 173500 | consumed samples: 14072320 | consumed tokens: 28820111360 | elapsed time per iteration (s): 0.08 | learning rate: 
1.606E-04 | global batch size: 256 | lm loss: 4.562825E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.252 | TFLOPs: 11.93 | 7: iteration 54980/ 173500 | consumed samples: 14074880 | consumed tokens: 28825354240 | elapsed time per iteration (s): 0.08 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 4.549896E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.650 | TFLOPs: 12.00 | 7: iteration 54990/ 173500 | consumed samples: 14077440 | consumed tokens: 28830597120 | elapsed time per iteration (s): 0.08 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 4.557449E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.864 | TFLOPs: 11.97 | 7: iteration 55000/ 173500 | consumed samples: 14080000 | consumed tokens: 28835840000 | elapsed time per iteration (s): 0.08 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 4.550669E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.881 | TFLOPs: 11.91 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 55000 | lm loss value: 4.415738E+00 | lm loss PPL: 8.274285E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 55000 to checkpoints_14m91b100m 0: [2023-03-17 01:36:43,700] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step55000 is begin to save! 0: [2023-03-17 01:36:43,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:36:43,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:36:43,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:36:43,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:36:43,733] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:36:43,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:36:43,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:36:43,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:36:43,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:36:43,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:36:43,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:36:43,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:36:43,743] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step55000/mp_rank_00_model_states.pt 0: [2023-03-17 01:36:43,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:36:43,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:36:43,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:36:43,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:36:43,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:36:43,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2023-03-17 01:36:43,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:36:43,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
3: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2023-03-17 01:36:43,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:36:43,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:36:43,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:36:43,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:36:43,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 3: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:36:43,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
3: [2023-03-17 01:36:43,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2023-03-17 01:36:43,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2023-03-17 01:36:43,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:36:43,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2023-03-17 01:36:43,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:36:43,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:36:43,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2023-03-17 01:36:43,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:36:43,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:36:43,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
5: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:36:43,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:36:43,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2023-03-17 01:36:43,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
6: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2023-03-17 01:36:43,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 1: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:36:43,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:36:43,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:36:43,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:36:43,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2023-03-17 01:36:43,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:36:43,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:36:43,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:36:43,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:36:43,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:36:43,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:36:43,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 01:36:43,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:36:43,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2023-03-17 01:36:43,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:36:43,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2023-03-17 01:36:43,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 01:36:43,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:36:43,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
0: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:36:43,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 0: [2023-03-17 01:36:43,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:36:43,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2023-03-17 01:36:43,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 1: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
7: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2023-03-17 01:36:43,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 01:36:43,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2023-03-17 01:36:43,776] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2023-03-17 01:36:43,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:36:43,777] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step55000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:36:43,777] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
0: successfully saved checkpoint at iteration 55000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 79.74
7: iteration 55010/ 173500 | consumed samples: 14082560 | consumed tokens: 28841082880 | elapsed time per iteration (s): 0.09 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 4.556274E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2756.224 | TFLOPs: 10.25 |
7: iteration 55020/ 173500 | consumed samples: 14085120 | consumed tokens: 28846325760 | elapsed time per iteration (s): 0.08 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 4.545926E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.683 | TFLOPs: 11.83 |
7: iteration 55030/ 173500 | consumed samples: 14087680 | consumed tokens: 28851568640 | elapsed time per iteration (s): 0.08 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 4.549469E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.587 | TFLOPs: 11.67 |
7: iteration 55040/ 173500 | consumed samples: 14090240 | consumed tokens: 28856811520 | elapsed time per iteration (s): 0.08 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 4.563116E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.179 | TFLOPs: 11.94 |
7: iteration 55050/ 173500 | consumed samples: 14092800 | consumed tokens: 28862054400 | elapsed time per iteration (s): 0.08 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 4.555961E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.193 | TFLOPs: 11.69 |
7: iteration 55060/ 173500 | consumed samples: 14095360 | consumed tokens: 28867297280 | elapsed time per iteration (s): 0.08 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 4.556985E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.891 | TFLOPs: 11.98 |
7: iteration 55070/ 173500 | consumed samples: 14097920 | consumed tokens: 28872540160 | elapsed time per iteration (s): 0.08 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 4.557640E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.555 | TFLOPs: 11.99 |
7: iteration 55080/ 173500 | consumed samples: 14100480 | consumed tokens: 28877783040 | elapsed time per iteration (s): 0.08 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 4.558398E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.186 | TFLOPs: 11.94 |
7: iteration 55090/ 173500 | consumed samples: 14103040 | consumed tokens: 28883025920 | elapsed time per iteration (s): 0.08 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 4.556894E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.628 | TFLOPs: 11.99 |
7: iteration 55100/ 173500 | consumed samples: 14105600 | consumed tokens: 28888268800 | elapsed time per iteration (s): 0.08 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 4.564001E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.410 | TFLOPs: 12.01 |
7: iteration 55110/ 173500 | consumed samples: 14108160 | consumed tokens: 28893511680 | elapsed time per iteration (s): 0.08 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 4.544660E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.718 | TFLOPs: 12.02 |
7: iteration 55120/ 173500 | consumed samples: 14110720 | consumed tokens: 28898754560 | elapsed time per iteration (s): 0.08 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 4.560250E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.871 | TFLOPs: 12.01 |
7: iteration 55130/ 173500 | consumed samples: 14113280 | consumed tokens: 28903997440 | elapsed time per iteration (s): 0.08 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 4.551875E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.046 | TFLOPs: 12.01 |
7: iteration 55140/ 173500 | consumed samples: 14115840 | consumed tokens: 28909240320 | elapsed time per iteration (s): 0.08 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 4.562157E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.913 | TFLOPs: 12.01 |
7: iteration 55150/ 173500 | consumed samples: 14118400 | consumed tokens: 28914483200 | elapsed time per iteration (s): 0.08 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 4.547332E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.564 | TFLOPs: 11.72 |
7: iteration 55160/ 173500 | consumed samples: 14120960 | consumed tokens: 28919726080 | elapsed time per iteration (s): 0.08 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 4.544848E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.687 | TFLOPs: 12.00 |
7: iteration 55170/ 173500 | consumed samples: 14123520 | consumed tokens: 28924968960 | elapsed time per iteration (s): 0.08 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 4.535048E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.471 | TFLOPs: 11.72 |
7: iteration 55180/ 173500 | consumed samples: 14126080 | consumed tokens: 28930211840 | elapsed time per iteration (s): 0.08 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 4.553490E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.476 | TFLOPs: 12.02 |
7: iteration 55190/ 173500 | consumed samples: 14128640 | consumed tokens: 28935454720 | elapsed time per iteration (s): 0.08 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 4.555978E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.880 | TFLOPs: 12.01 |
7: iteration 55200/ 173500 | consumed samples: 14131200 | consumed tokens: 28940697600 | elapsed time per iteration (s): 0.08 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 4.549199E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.191 | TFLOPs: 12.02 |
7: iteration 55210/ 173500 | consumed samples: 14133760 | consumed tokens: 28945940480 | elapsed time per iteration (s): 0.08 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 4.558612E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.415 | TFLOPs: 11.94 |
7: iteration 55220/ 173500 | consumed samples: 14136320 | consumed tokens: 28951183360 | elapsed time per iteration (s): 0.08 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 4.529873E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.076 | TFLOPs: 11.70 |
7: iteration 55230/ 173500 | consumed samples: 14138880 | consumed tokens: 28956426240 | elapsed time per iteration (s): 0.08 | learning rate:
1.602E-04 | global batch size: 256 | lm loss: 4.552060E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.093 | TFLOPs: 11.98 | 7: iteration 55240/ 173500 | consumed samples: 14141440 | consumed tokens: 28961669120 | elapsed time per iteration (s): 0.08 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 4.553426E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.707 | TFLOPs: 11.97 | 7: iteration 55250/ 173500 | consumed samples: 14144000 | consumed tokens: 28966912000 | elapsed time per iteration (s): 0.08 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 4.567292E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.794 | TFLOPs: 11.45 | 7: iteration 55260/ 173500 | consumed samples: 14146560 | consumed tokens: 28972154880 | elapsed time per iteration (s): 0.08 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 4.560114E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.685 | TFLOPs: 11.72 | 7: iteration 55270/ 173500 | consumed samples: 14149120 | consumed tokens: 28977397760 | elapsed time per iteration (s): 0.08 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 4.543668E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.276 | TFLOPs: 11.99 | 7: iteration 55280/ 173500 | consumed samples: 14151680 | consumed tokens: 28982640640 | elapsed time per iteration (s): 0.08 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 4.552455E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.352 | TFLOPs: 11.99 | 7: iteration 55290/ 173500 | 
consumed samples: 14154240 | consumed tokens: 28987883520 | elapsed time per iteration (s): 0.08 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 4.541959E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.778 | TFLOPs: 12.01 | 7: iteration 55300/ 173500 | consumed samples: 14156800 | consumed tokens: 28993126400 | elapsed time per iteration (s): 0.08 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 4.554475E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.576 | TFLOPs: 11.96 | 7: iteration 55310/ 173500 | consumed samples: 14159360 | consumed tokens: 28998369280 | elapsed time per iteration (s): 0.08 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 4.561109E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.362 | TFLOPs: 11.68 | 7: iteration 55320/ 173500 | consumed samples: 14161920 | consumed tokens: 29003612160 | elapsed time per iteration (s): 0.08 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 4.530989E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.605 | TFLOPs: 11.59 | 7: iteration 55330/ 173500 | consumed samples: 14164480 | consumed tokens: 29008855040 | elapsed time per iteration (s): 0.08 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 4.558300E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.744 | TFLOPs: 12.00 | 7: iteration 55340/ 173500 | consumed samples: 14167040 | consumed tokens: 29014097920 | elapsed time per iteration (s): 0.08 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 4.557570E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3150.685 | TFLOPs: 11.72 | 7: iteration 55350/ 173500 | consumed samples: 14169600 | consumed tokens: 29019340800 | elapsed time per iteration (s): 0.08 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 4.556050E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.677 | TFLOPs: 11.86 | 7: iteration 55360/ 173500 | consumed samples: 14172160 | consumed tokens: 29024583680 | elapsed time per iteration (s): 0.08 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 4.552871E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.422 | TFLOPs: 12.03 | 7: iteration 55370/ 173500 | consumed samples: 14174720 | consumed tokens: 29029826560 | elapsed time per iteration (s): 0.08 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 4.556556E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.781 | TFLOPs: 11.77 | 7: iteration 55380/ 173500 | consumed samples: 14177280 | consumed tokens: 29035069440 | elapsed time per iteration (s): 0.08 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 4.545368E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.497 | TFLOPs: 12.07 | 7: iteration 55390/ 173500 | consumed samples: 14179840 | consumed tokens: 29040312320 | elapsed time per iteration (s): 0.08 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 4.552252E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.999 | TFLOPs: 11.92 | 7: iteration 55400/ 173500 | consumed samples: 14182400 | consumed tokens: 29045555200 | elapsed time per iteration (s): 0.08 | learning rate: 1.600E-04 | global batch 
size: 256 | lm loss: 4.557250E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.673 | TFLOPs: 12.04 | 7: iteration 55410/ 173500 | consumed samples: 14184960 | consumed tokens: 29050798080 | elapsed time per iteration (s): 0.08 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 4.557421E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.084 | TFLOPs: 11.95 | 7: iteration 55420/ 173500 | consumed samples: 14187520 | consumed tokens: 29056040960 | elapsed time per iteration (s): 0.08 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 4.538190E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.751 | TFLOPs: 11.99 | 7: iteration 55430/ 173500 | consumed samples: 14190080 | consumed tokens: 29061283840 | elapsed time per iteration (s): 0.08 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 4.548712E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.479 | TFLOPs: 11.80 | 7: iteration 55440/ 173500 | consumed samples: 14192640 | consumed tokens: 29066526720 | elapsed time per iteration (s): 0.08 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 4.542594E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.018 | TFLOPs: 11.89 | 7: iteration 55450/ 173500 | consumed samples: 14195200 | consumed tokens: 29071769600 | elapsed time per iteration (s): 0.08 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 4.555008E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.042 | TFLOPs: 12.07 | 7: iteration 55460/ 173500 | consumed samples: 14197760 | 
consumed tokens: 29077012480 | elapsed time per iteration (s): 0.08 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 4.550681E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.677 | TFLOPs: 12.04 | 7: iteration 55470/ 173500 | consumed samples: 14200320 | consumed tokens: 29082255360 | elapsed time per iteration (s): 0.08 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 4.560014E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.495 | TFLOPs: 12.02 | 7: iteration 55480/ 173500 | consumed samples: 14202880 | consumed tokens: 29087498240 | elapsed time per iteration (s): 0.08 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 4.555125E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.231 | TFLOPs: 11.62 | 7: iteration 55490/ 173500 | consumed samples: 14205440 | consumed tokens: 29092741120 | elapsed time per iteration (s): 0.08 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 4.547583E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.950 | TFLOPs: 11.68 | 7: iteration 55500/ 173500 | consumed samples: 14208000 | consumed tokens: 29097984000 | elapsed time per iteration (s): 0.08 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 4.549064E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.715 | TFLOPs: 11.98 | 7: iteration 55510/ 173500 | consumed samples: 14210560 | consumed tokens: 29103226880 | elapsed time per iteration (s): 0.09 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 4.548901E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 2912.014 | TFLOPs: 10.83 | 7: iteration 55520/ 173500 | consumed samples: 14213120 | consumed tokens: 29108469760 | elapsed time per iteration (s): 0.08 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 4.554888E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.889 | TFLOPs: 12.00 | 7: iteration 55530/ 173500 | consumed samples: 14215680 | consumed tokens: 29113712640 | elapsed time per iteration (s): 0.08 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 4.545981E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.509 | TFLOPs: 11.95 | 7: iteration 55540/ 173500 | consumed samples: 14218240 | consumed tokens: 29118955520 | elapsed time per iteration (s): 0.08 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 4.541870E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.018 | TFLOPs: 12.01 | 7: iteration 55550/ 173500 | consumed samples: 14220800 | consumed tokens: 29124198400 | elapsed time per iteration (s): 0.08 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 4.554408E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.244 | TFLOPs: 11.69 | 7: iteration 55560/ 173500 | consumed samples: 14223360 | consumed tokens: 29129441280 | elapsed time per iteration (s): 0.09 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 4.556447E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2949.570 | TFLOPs: 10.97 | 7: iteration 55570/ 173500 | consumed samples: 14225920 | consumed tokens: 29134684160 | elapsed time per iteration (s): 0.08 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 
4.543067E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.555 | TFLOPs: 12.00 | 7: iteration 55580/ 173500 | consumed samples: 14228480 | consumed tokens: 29139927040 | elapsed time per iteration (s): 0.09 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 4.541743E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2903.960 | TFLOPs: 10.80 | 7: iteration 55590/ 173500 | consumed samples: 14231040 | consumed tokens: 29145169920 | elapsed time per iteration (s): 0.08 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 4.547157E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.775 | TFLOPs: 11.63 | 7: iteration 55600/ 173500 | consumed samples: 14233600 | consumed tokens: 29150412800 | elapsed time per iteration (s): 0.08 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 4.553480E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.626 | TFLOPs: 12.02 | 7: iteration 55610/ 173500 | consumed samples: 14236160 | consumed tokens: 29155655680 | elapsed time per iteration (s): 0.08 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 4.555213E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.583 | TFLOPs: 12.00 | 7: iteration 55620/ 173500 | consumed samples: 14238720 | consumed tokens: 29160898560 | elapsed time per iteration (s): 0.08 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 4.547480E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.690 | TFLOPs: 12.00 | 7: iteration 55630/ 173500 | consumed samples: 14241280 | consumed tokens: 
29166141440 | elapsed time per iteration (s): 0.08 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 4.552700E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.223 | TFLOPs: 11.53 | 7: iteration 55640/ 173500 | consumed samples: 14243840 | consumed tokens: 29171384320 | elapsed time per iteration (s): 0.08 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 4.555704E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.872 | TFLOPs: 12.00 | 7: iteration 55650/ 173500 | consumed samples: 14246400 | consumed tokens: 29176627200 | elapsed time per iteration (s): 0.08 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 4.555220E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.955 | TFLOPs: 11.98 | 7: iteration 55660/ 173500 | consumed samples: 14248960 | consumed tokens: 29181870080 | elapsed time per iteration (s): 0.08 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 4.556405E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.100 | TFLOPs: 12.00 | 7: iteration 55670/ 173500 | consumed samples: 14251520 | consumed tokens: 29187112960 | elapsed time per iteration (s): 0.08 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 4.562586E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.701 | TFLOPs: 12.04 | 7: iteration 55680/ 173500 | consumed samples: 14254080 | consumed tokens: 29192355840 | elapsed time per iteration (s): 0.08 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 4.550963E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3232.120 | TFLOPs: 12.02 | 7: iteration 55690/ 173500 | consumed samples: 14256640 | consumed tokens: 29197598720 | elapsed time per iteration (s): 0.08 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 4.559808E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.573 | TFLOPs: 12.02 | 7: iteration 55700/ 173500 | consumed samples: 14259200 | consumed tokens: 29202841600 | elapsed time per iteration (s): 0.08 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 4.548563E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.757 | TFLOPs: 12.04 | 7: iteration 55710/ 173500 | consumed samples: 14261760 | consumed tokens: 29208084480 | elapsed time per iteration (s): 0.08 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 4.543693E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.943 | TFLOPs: 12.05 | 7: iteration 55720/ 173500 | consumed samples: 14264320 | consumed tokens: 29213327360 | elapsed time per iteration (s): 0.08 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 4.543005E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.831 | TFLOPs: 11.73 | 7: iteration 55730/ 173500 | consumed samples: 14266880 | consumed tokens: 29218570240 | elapsed time per iteration (s): 0.08 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 4.555350E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.595 | TFLOPs: 12.07 | 7: iteration 55740/ 173500 | consumed samples: 14269440 | consumed tokens: 29223813120 | elapsed time per iteration (s): 0.08 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 4.550043E+00 | grad 
norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.986 | TFLOPs: 12.01 | 7: iteration 55750/ 173500 | consumed samples: 14272000 | consumed tokens: 29229056000 | elapsed time per iteration (s): 0.08 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 4.549309E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.178 | TFLOPs: 12.02 | 7: iteration 55760/ 173500 | consumed samples: 14274560 | consumed tokens: 29234298880 | elapsed time per iteration (s): 0.08 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 4.551842E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.184 | TFLOPs: 12.02 | 7: iteration 55770/ 173500 | consumed samples: 14277120 | consumed tokens: 29239541760 | elapsed time per iteration (s): 0.08 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 4.555374E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.104 | TFLOPs: 11.80 | 7: iteration 55780/ 173500 | consumed samples: 14279680 | consumed tokens: 29244784640 | elapsed time per iteration (s): 0.08 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 4.561713E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.036 | TFLOPs: 12.03 | 7: iteration 55790/ 173500 | consumed samples: 14282240 | consumed tokens: 29250027520 | elapsed time per iteration (s): 0.08 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 4.554532E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.684 | TFLOPs: 12.05 | 7: iteration 55800/ 173500 | consumed samples: 14284800 | consumed tokens: 29255270400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 4.552715E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.894 | TFLOPs: 12.01 | 7: iteration 55810/ 173500 | consumed samples: 14287360 | consumed tokens: 29260513280 | elapsed time per iteration (s): 0.08 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 4.555544E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.879 | TFLOPs: 12.05 | 7: iteration 55820/ 173500 | consumed samples: 14289920 | consumed tokens: 29265756160 | elapsed time per iteration (s): 0.08 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 4.551889E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.186 | TFLOPs: 12.06 | 7: iteration 55830/ 173500 | consumed samples: 14292480 | consumed tokens: 29270999040 | elapsed time per iteration (s): 0.08 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 4.551493E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.636 | TFLOPs: 12.04 | 7: iteration 55840/ 173500 | consumed samples: 14295040 | consumed tokens: 29276241920 | elapsed time per iteration (s): 0.08 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 4.553872E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.705 | TFLOPs: 12.01 | 7: iteration 55850/ 173500 | consumed samples: 14297600 | consumed tokens: 29281484800 | elapsed time per iteration (s): 0.08 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 4.557596E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.453 | TFLOPs: 
12.03 | 7: iteration 55860/ 173500 | consumed samples: 14300160 | consumed tokens: 29286727680 | elapsed time per iteration (s): 0.08 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 4.547125E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.032 | TFLOPs: 12.06 | 7: iteration 55870/ 173500 | consumed samples: 14302720 | consumed tokens: 29291970560 | elapsed time per iteration (s): 0.08 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 4.557804E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.383 | TFLOPs: 12.05 | 7: iteration 55880/ 173500 | consumed samples: 14305280 | consumed tokens: 29297213440 | elapsed time per iteration (s): 0.08 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 4.553501E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.338 | TFLOPs: 12.08 | 7: iteration 55890/ 173500 | consumed samples: 14307840 | consumed tokens: 29302456320 | elapsed time per iteration (s): 0.08 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 4.552836E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.168 | TFLOPs: 12.05 | 7: iteration 55900/ 173500 | consumed samples: 14310400 | consumed tokens: 29307699200 | elapsed time per iteration (s): 0.08 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 4.551523E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.321 | TFLOPs: 12.01 | 7: iteration 55910/ 173500 | consumed samples: 14312960 | consumed tokens: 29312942080 | elapsed time per iteration (s): 0.09 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 4.536811E+00 | grad norm: 0.280 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2964.551 | TFLOPs: 11.03 | 7: iteration 55920/ 173500 | consumed samples: 14315520 | consumed tokens: 29318184960 | elapsed time per iteration (s): 0.10 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 4.547181E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2503.457 | TFLOPs: 9.31 | 7: iteration 55930/ 173500 | consumed samples: 14318080 | consumed tokens: 29323427840 | elapsed time per iteration (s): 0.09 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 4.566344E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2894.468 | TFLOPs: 10.77 | 7: iteration 55940/ 173500 | consumed samples: 14320640 | consumed tokens: 29328670720 | elapsed time per iteration (s): 0.08 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 4.552954E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.869 | TFLOPs: 11.73 | 7: iteration 55950/ 173500 | consumed samples: 14323200 | consumed tokens: 29333913600 | elapsed time per iteration (s): 0.09 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 4.549566E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.017 | TFLOPs: 10.70 | 7: iteration 55960/ 173500 | consumed samples: 14325760 | consumed tokens: 29339156480 | elapsed time per iteration (s): 0.08 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 4.556936E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.605 | TFLOPs: 12.01 | 7: iteration 55970/ 173500 | consumed samples: 14328320 | consumed tokens: 29344399360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.592E-04 | global batch size: 256 | lm loss: 4.555250E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.029 | TFLOPs: 11.49 | 7: iteration 55980/ 173500 | consumed samples: 14330880 | consumed tokens: 29349642240 | elapsed time per iteration (s): 0.08 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 4.549658E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.776 | TFLOPs: 11.71 | 7: iteration 55990/ 173500 | consumed samples: 14333440 | consumed tokens: 29354885120 | elapsed time per iteration (s): 0.08 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 4.564113E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.692 | TFLOPs: 11.22 | 0: [2023-03-17 01:38:04,732] [INFO] [logging.py:68:log_dist] [Rank 0] step=56000, skipped=0, lr=[0.0001591933009380588, 0.0001591933009380588, 0.0001591933009380588], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 56000/ 173500 | consumed samples: 14336000 | consumed tokens: 29360128000 | elapsed time per iteration (s): 0.11 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 4.552249E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2245.907 | TFLOPs: 8.35 | 0: steps: 56000 loss: 4.5642 iter time (s): 0.083 samples/sec: 3098.431 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 56000 | lm loss value: 4.443368E+00 | lm loss PPL: 8.506098E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 56000 to checkpoints_14m91b100m 0: [2023-03-17 01:38:04,815] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step56000 is begin to save! 0: [2023-03-17 01:38:04,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:38:04,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:38:04,844] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:38:04,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:38:04,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:38:04,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:38:04,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:38:04,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:38:04,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:38:04,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:38:04,856] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 01:38:04,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:38:04,858] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step56000/mp_rank_00_model_states.pt 0: [2023-03-17 01:38:04,858] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:38:04,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:38:04,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:38:04,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:38:04,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:38:04,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:38:04,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 6: [2023-03-17 01:38:04,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2023-03-17 01:38:04,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:38:04,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:38:04,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:38:04,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
5: [2023-03-17 01:38:04,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2023-03-17 01:38:04,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:38:04,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2023-03-17 01:38:04,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:38:04,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:38:04,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2023-03-17 01:38:04,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:38:04,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:38:04,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2023-03-17 01:38:04,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:38:04,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:38:04,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:38:04,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2023-03-17 01:38:04,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:38:04,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:38:04,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:38:04,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2023-03-17 01:38:04,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 6: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2023-03-17 01:38:04,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:38:04,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:38:04,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2023-03-17 01:38:04,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:38:04,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:38:04,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:38:04,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2023-03-17 01:38:04,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:38:04,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2023-03-17 01:38:04,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:38:04,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:38:04,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:38:04,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:38:04,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:38:04,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:38:04,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:38:04,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:38:04,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 4: [2023-03-17 01:38:04,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:38:04,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:38:04,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2023-03-17 01:38:04,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:38:04,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:38:04,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
5: [2023-03-17 01:38:04,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:38:04,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 3: [2023-03-17 01:38:04,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:38:04,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
7: [2023-03-17 01:38:04,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:38:04,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:38:04,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:38:04,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 01:38:04,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 0: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2023-03-17 01:38:04,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
2: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 7: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
7: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2023-03-17 01:38:04,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step56000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2023-03-17 01:38:04,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
0: successfully saved checkpoint at iteration 56000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.08 7: iteration 56010/ 173500 | consumed samples: 14338560 | consumed tokens: 29365370880 | elapsed time per iteration (s): 0.10 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 4.563738E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2569.371 | TFLOPs: 9.56 | 7: iteration 56020/ 173500 | consumed samples: 14341120 | consumed tokens: 29370613760 | elapsed time per iteration (s): 0.11 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 4.538462E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2380.113 | TFLOPs: 8.85 | 7: iteration 56030/ 173500 | consumed samples: 14343680 | consumed tokens: 29375856640 | elapsed time per iteration (s): 0.11 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 4.543598E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.992 | TFLOPs: 8.95 | 7: iteration 56040/ 173500 | consumed samples: 14346240 | consumed tokens: 29381099520 | elapsed time per iteration (s): 0.09 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 4.535117E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.632 | TFLOPs: 11.14 | 7: iteration 56050/ 173500 | consumed samples: 14348800 | consumed tokens: 29386342400 | elapsed time per iteration (s): 0.08 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 4.537609E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.635 | TFLOPs: 11.99 | 7: iteration 56060/ 173500 | consumed samples: 14351360 | consumed tokens: 29391585280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.591E-04 | global batch size: 256 | lm loss: 4.531643E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.281 | TFLOPs: 11.97 | 7: iteration 56070/ 173500 | consumed samples: 14353920 | consumed tokens: 29396828160 | elapsed time per iteration (s): 0.08 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 4.559172E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.237 | TFLOPs: 11.99 | 7: iteration 56080/ 173500 | consumed samples: 14356480 | consumed tokens: 29402071040 | elapsed time per iteration (s): 0.08 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 4.545832E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.026 | TFLOPs: 11.97 | 7: iteration 56090/ 173500 | consumed samples: 14359040 | consumed tokens: 29407313920 | elapsed time per iteration (s): 0.08 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 4.562759E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.240 | TFLOPs: 12.00 | 7: iteration 56100/ 173500 | consumed samples: 14361600 | consumed tokens: 29412556800 | elapsed time per iteration (s): 0.08 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 4.554441E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.013 | TFLOPs: 11.97 | 7: iteration 56110/ 173500 | consumed samples: 14364160 | consumed tokens: 29417799680 | elapsed time per iteration (s): 0.08 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 4.558068E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.643 | TFLOPs: 11.95 | 7: iteration 56120/ 
173500 | consumed samples: 14366720 | consumed tokens: 29423042560 | elapsed time per iteration (s): 0.08 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 4.552475E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.569 | TFLOPs: 11.93 | 7: iteration 56130/ 173500 | consumed samples: 14369280 | consumed tokens: 29428285440 | elapsed time per iteration (s): 0.08 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 4.549657E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.735 | TFLOPs: 11.98 | 7: iteration 56140/ 173500 | consumed samples: 14371840 | consumed tokens: 29433528320 | elapsed time per iteration (s): 0.08 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 4.547758E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.456 | TFLOPs: 11.99 | 7: iteration 56150/ 173500 | consumed samples: 14374400 | consumed tokens: 29438771200 | elapsed time per iteration (s): 0.08 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 4.554733E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.567 | TFLOPs: 11.99 | 7: iteration 56160/ 173500 | consumed samples: 14376960 | consumed tokens: 29444014080 | elapsed time per iteration (s): 0.08 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 4.574118E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.563 | TFLOPs: 12.03 | 7: iteration 56170/ 173500 | consumed samples: 14379520 | consumed tokens: 29449256960 | elapsed time per iteration (s): 0.08 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 4.559461E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3239.863 | TFLOPs: 12.05 | 7: iteration 56180/ 173500 | consumed samples: 14382080 | consumed tokens: 29454499840 | elapsed time per iteration (s): 0.08 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 4.559026E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.548 | TFLOPs: 11.48 | 7: iteration 56190/ 173500 | consumed samples: 14384640 | consumed tokens: 29459742720 | elapsed time per iteration (s): 0.08 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 4.558842E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.132 | TFLOPs: 11.50 | 7: iteration 56200/ 173500 | consumed samples: 14387200 | consumed tokens: 29464985600 | elapsed time per iteration (s): 0.08 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 4.550368E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.122 | TFLOPs: 11.71 | 7: iteration 56210/ 173500 | consumed samples: 14389760 | consumed tokens: 29470228480 | elapsed time per iteration (s): 0.08 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 4.559287E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.568 | TFLOPs: 11.97 | 7: iteration 56220/ 173500 | consumed samples: 14392320 | consumed tokens: 29475471360 | elapsed time per iteration (s): 0.08 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 4.546000E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.863 | TFLOPs: 11.94 | 7: iteration 56230/ 173500 | consumed samples: 14394880 | consumed tokens: 29480714240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.589E-04 | global batch size: 256 | lm loss: 4.541236E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.645 | TFLOPs: 11.94 | 7: iteration 56240/ 173500 | consumed samples: 14397440 | consumed tokens: 29485957120 | elapsed time per iteration (s): 0.09 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 4.550777E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.014 | TFLOPs: 11.19 | 7: iteration 56250/ 173500 | consumed samples: 14400000 | consumed tokens: 29491200000 | elapsed time per iteration (s): 0.09 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 4.555298E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3009.021 | TFLOPs: 11.19 | 7: iteration 56260/ 173500 | consumed samples: 14402560 | consumed tokens: 29496442880 | elapsed time per iteration (s): 0.08 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 4.566280E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.732 | TFLOPs: 11.47 | 7: iteration 56270/ 173500 | consumed samples: 14405120 | consumed tokens: 29501685760 | elapsed time per iteration (s): 0.09 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 4.536178E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2874.594 | TFLOPs: 10.69 | 7: iteration 56280/ 173500 | consumed samples: 14407680 | consumed tokens: 29506928640 | elapsed time per iteration (s): 0.09 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 4.548731E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.250 | TFLOPs: 11.17 | 7: iteration 56290/ 173500 | 
consumed samples: 14410240 | consumed tokens: 29512171520 | elapsed time per iteration (s): 0.08 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 4.544180E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3020.416 | TFLOPs: 11.23 | 7: iteration 56300/ 173500 | consumed samples: 14412800 | consumed tokens: 29517414400 | elapsed time per iteration (s): 0.08 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 4.567481E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.858 | TFLOPs: 12.01 | 7: iteration 56310/ 173500 | consumed samples: 14415360 | consumed tokens: 29522657280 | elapsed time per iteration (s): 0.08 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 4.540752E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.333 | TFLOPs: 11.64 | 7: iteration 56320/ 173500 | consumed samples: 14417920 | consumed tokens: 29527900160 | elapsed time per iteration (s): 0.08 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 4.555682E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.748 | TFLOPs: 11.91 | 7: iteration 56330/ 173500 | consumed samples: 14420480 | consumed tokens: 29533143040 | elapsed time per iteration (s): 0.08 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 4.534153E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.104 | TFLOPs: 11.73 | 7: iteration 56340/ 173500 | consumed samples: 14423040 | consumed tokens: 29538385920 | elapsed time per iteration (s): 0.08 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 4.548823E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3145.515 | TFLOPs: 11.70 | 7: iteration 56350/ 173500 | consumed samples: 14425600 | consumed tokens: 29543628800 | elapsed time per iteration (s): 0.08 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 4.562310E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.957 | TFLOPs: 12.03 | 7: iteration 56360/ 173500 | consumed samples: 14428160 | consumed tokens: 29548871680 | elapsed time per iteration (s): 0.08 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 4.555394E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.780 | TFLOPs: 12.01 | 7: iteration 56370/ 173500 | consumed samples: 14430720 | consumed tokens: 29554114560 | elapsed time per iteration (s): 0.08 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 4.545279E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.999 | TFLOPs: 11.96 | 7: iteration 56380/ 173500 | consumed samples: 14433280 | consumed tokens: 29559357440 | elapsed time per iteration (s): 0.08 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 4.543914E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.784 | TFLOPs: 11.98 | 7: iteration 56390/ 173500 | consumed samples: 14435840 | consumed tokens: 29564600320 | elapsed time per iteration (s): 0.08 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 4.552145E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.309 | TFLOPs: 12.01 | 7: iteration 56400/ 173500 | consumed samples: 14438400 | consumed tokens: 29569843200 | elapsed time per iteration (s): 0.08 | learning rate: 1.586E-04 | global batch 
size: 256 | lm loss: 4.544794E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.011 | TFLOPs: 12.04 | 7: iteration 56410/ 173500 | consumed samples: 14440960 | consumed tokens: 29575086080 | elapsed time per iteration (s): 0.08 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 4.542734E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.434 | TFLOPs: 11.72 | 7: iteration 56420/ 173500 | consumed samples: 14443520 | consumed tokens: 29580328960 | elapsed time per iteration (s): 0.08 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 4.559635E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.438 | TFLOPs: 12.03 | 7: iteration 56430/ 173500 | consumed samples: 14446080 | consumed tokens: 29585571840 | elapsed time per iteration (s): 0.11 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 4.549662E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.364 | TFLOPs: 8.95 | 7: iteration 56440/ 173500 | consumed samples: 14448640 | consumed tokens: 29590814720 | elapsed time per iteration (s): 0.11 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 4.560379E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.655 | TFLOPs: 8.88 | 7: iteration 56450/ 173500 | consumed samples: 14451200 | consumed tokens: 29596057600 | elapsed time per iteration (s): 0.12 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 4.550649E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2222.278 | TFLOPs: 8.27 | 7: iteration 56460/ 173500 | consumed samples: 14453760 | 
consumed tokens: 29601300480 | elapsed time per iteration (s): 0.11 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 4.550370E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2335.880 | TFLOPs: 8.69 | 7: iteration 56470/ 173500 | consumed samples: 14456320 | consumed tokens: 29606543360 | elapsed time per iteration (s): 0.10 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 4.544738E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2473.612 | TFLOPs: 9.20 | 7: iteration 56480/ 173500 | consumed samples: 14458880 | consumed tokens: 29611786240 | elapsed time per iteration (s): 0.12 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 4.542760E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2098.729 | TFLOPs: 7.81 | 7: iteration 56490/ 173500 | consumed samples: 14461440 | consumed tokens: 29617029120 | elapsed time per iteration (s): 0.11 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 4.553299E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2324.459 | TFLOPs: 8.65 | 7: iteration 56500/ 173500 | consumed samples: 14464000 | consumed tokens: 29622272000 | elapsed time per iteration (s): 0.10 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 4.559630E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2456.073 | TFLOPs: 9.14 | 7: iteration 56510/ 173500 | consumed samples: 14466560 | consumed tokens: 29627514880 | elapsed time per iteration (s): 0.10 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 4.552085E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2478.525 | TFLOPs: 9.22 | 7: iteration 56520/ 173500 | consumed samples: 14469120 | consumed tokens: 29632757760 | elapsed time per iteration (s): 0.10 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 4.555991E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2488.681 | TFLOPs: 9.26 | 7: iteration 56530/ 173500 | consumed samples: 14471680 | consumed tokens: 29638000640 | elapsed time per iteration (s): 0.11 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 4.553055E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2371.926 | TFLOPs: 8.82 | 7: iteration 56540/ 173500 | consumed samples: 14474240 | consumed tokens: 29643243520 | elapsed time per iteration (s): 0.10 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 4.555516E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2517.689 | TFLOPs: 9.36 | 7: iteration 56550/ 173500 | consumed samples: 14476800 | consumed tokens: 29648486400 | elapsed time per iteration (s): 0.11 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 4.551865E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.143 | TFLOPs: 8.95 | 7: iteration 56560/ 173500 | consumed samples: 14479360 | consumed tokens: 29653729280 | elapsed time per iteration (s): 0.09 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 4.548267E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2798.043 | TFLOPs: 10.41 | 7: iteration 56570/ 173500 | consumed samples: 14481920 | consumed tokens: 29658972160 | elapsed time per iteration (s): 0.08 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 4.546181E+00 
| grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.186 | TFLOPs: 11.84 | 7: iteration 56580/ 173500 | consumed samples: 14484480 | consumed tokens: 29664215040 | elapsed time per iteration (s): 0.08 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 4.566353E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.309 | TFLOPs: 11.85 | 7: iteration 56590/ 173500 | consumed samples: 14487040 | consumed tokens: 29669457920 | elapsed time per iteration (s): 0.08 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 4.542484E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.770 | TFLOPs: 11.92 | 7: iteration 56600/ 173500 | consumed samples: 14489600 | consumed tokens: 29674700800 | elapsed time per iteration (s): 0.08 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 4.545883E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.466 | TFLOPs: 11.87 | 7: iteration 56610/ 173500 | consumed samples: 14492160 | consumed tokens: 29679943680 | elapsed time per iteration (s): 0.08 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 4.555457E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.742 | TFLOPs: 11.54 | 7: iteration 56620/ 173500 | consumed samples: 14494720 | consumed tokens: 29685186560 | elapsed time per iteration (s): 0.08 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 4.546959E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.947 | TFLOPs: 11.83 | 7: iteration 56630/ 173500 | consumed samples: 14497280 | consumed tokens: 29690429440 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 4.553497E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.191 | TFLOPs: 11.86 | 7: iteration 56640/ 173500 | consumed samples: 14499840 | consumed tokens: 29695672320 | elapsed time per iteration (s): 0.08 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 4.548489E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.833 | TFLOPs: 11.91 | 7: iteration 56650/ 173500 | consumed samples: 14502400 | consumed tokens: 29700915200 | elapsed time per iteration (s): 0.08 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 4.541770E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.166 | TFLOPs: 11.88 | 7: iteration 56660/ 173500 | consumed samples: 14504960 | consumed tokens: 29706158080 | elapsed time per iteration (s): 0.08 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 4.550646E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.239 | TFLOPs: 11.89 | 7: iteration 56670/ 173500 | consumed samples: 14507520 | consumed tokens: 29711400960 | elapsed time per iteration (s): 0.08 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 4.547615E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.644 | TFLOPs: 11.89 | 7: iteration 56680/ 173500 | consumed samples: 14510080 | consumed tokens: 29716643840 | elapsed time per iteration (s): 0.08 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 4.555636E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.285 
| TFLOPs: 11.85 | 7: iteration 56690/ 173500 | consumed samples: 14512640 | consumed tokens: 29721886720 | elapsed time per iteration (s): 0.08 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 4.549297E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.082 | TFLOPs: 11.85 | 7: iteration 56700/ 173500 | consumed samples: 14515200 | consumed tokens: 29727129600 | elapsed time per iteration (s): 0.08 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 4.561009E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.947 | TFLOPs: 11.94 | 7: iteration 56710/ 173500 | consumed samples: 14517760 | consumed tokens: 29732372480 | elapsed time per iteration (s): 0.08 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 4.550718E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.435 | TFLOPs: 11.98 | 7: iteration 56720/ 173500 | consumed samples: 14520320 | consumed tokens: 29737615360 | elapsed time per iteration (s): 0.08 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 4.554832E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.069 | TFLOPs: 11.96 | 7: iteration 56730/ 173500 | consumed samples: 14522880 | consumed tokens: 29742858240 | elapsed time per iteration (s): 0.08 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 4.555552E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.555 | TFLOPs: 12.01 | 7: iteration 56740/ 173500 | consumed samples: 14525440 | consumed tokens: 29748101120 | elapsed time per iteration (s): 0.09 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 4.552843E+00 | grad norm: 0.305 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2873.601 | TFLOPs: 10.69 | 7: iteration 56750/ 173500 | consumed samples: 14528000 | consumed tokens: 29753344000 | elapsed time per iteration (s): 0.09 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 4.553383E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2881.418 | TFLOPs: 10.72 | 7: iteration 56760/ 173500 | consumed samples: 14530560 | consumed tokens: 29758586880 | elapsed time per iteration (s): 0.09 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 4.537676E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2762.923 | TFLOPs: 10.28 | 7: iteration 56770/ 173500 | consumed samples: 14533120 | consumed tokens: 29763829760 | elapsed time per iteration (s): 0.09 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 4.546174E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2746.907 | TFLOPs: 10.22 | 7: iteration 56780/ 173500 | consumed samples: 14535680 | consumed tokens: 29769072640 | elapsed time per iteration (s): 0.09 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 4.548503E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2803.174 | TFLOPs: 10.43 | 7: iteration 56790/ 173500 | consumed samples: 14538240 | consumed tokens: 29774315520 | elapsed time per iteration (s): 0.09 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 4.548386E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2791.698 | TFLOPs: 10.38 | 7: iteration 56800/ 173500 | consumed samples: 14540800 | consumed tokens: 29779558400 | elapsed time per iteration (s): 
0.09 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 4.550383E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.362 | TFLOPs: 10.88 | 7: iteration 56810/ 173500 | consumed samples: 14543360 | consumed tokens: 29784801280 | elapsed time per iteration (s): 0.08 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 4.557859E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.229 | TFLOPs: 11.27 | 7: iteration 56820/ 173500 | consumed samples: 14545920 | consumed tokens: 29790044160 | elapsed time per iteration (s): 0.08 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 4.564820E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.001 | TFLOPs: 11.90 | 7: iteration 56830/ 173500 | consumed samples: 14548480 | consumed tokens: 29795287040 | elapsed time per iteration (s): 0.08 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 4.559677E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.116 | TFLOPs: 11.91 | 7: iteration 56840/ 173500 | consumed samples: 14551040 | consumed tokens: 29800529920 | elapsed time per iteration (s): 0.08 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 4.560888E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.372 | TFLOPs: 11.64 | 7: iteration 56850/ 173500 | consumed samples: 14553600 | consumed tokens: 29805772800 | elapsed time per iteration (s): 0.08 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 4.548300E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.815 | TFLOPs: 11.66 | 7: iteration 
56860/ 173500 | consumed samples: 14556160 | consumed tokens: 29811015680 | elapsed time per iteration (s): 0.08 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 4.548359E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.402 | TFLOPs: 11.90 | 7: iteration 56870/ 173500 | consumed samples: 14558720 | consumed tokens: 29816258560 | elapsed time per iteration (s): 0.08 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 4.548510E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.913 | TFLOPs: 11.61 | 7: iteration 56880/ 173500 | consumed samples: 14561280 | consumed tokens: 29821501440 | elapsed time per iteration (s): 0.08 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 4.552881E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.792 | TFLOPs: 11.38 | 7: iteration 56890/ 173500 | consumed samples: 14563840 | consumed tokens: 29826744320 | elapsed time per iteration (s): 0.08 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 4.541846E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3065.157 | TFLOPs: 11.40 | 7: iteration 56900/ 173500 | consumed samples: 14566400 | consumed tokens: 29831987200 | elapsed time per iteration (s): 0.08 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 4.555838E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.325 | TFLOPs: 11.92 | 7: iteration 56910/ 173500 | consumed samples: 14568960 | consumed tokens: 29837230080 | elapsed time per iteration (s): 0.08 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 4.554851E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3208.596 | TFLOPs: 11.93 | 7: iteration 56920/ 173500 | consumed samples: 14571520 | consumed tokens: 29842472960 | elapsed time per iteration (s): 0.08 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 4.553624E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.879 | TFLOPs: 11.91 | 7: iteration 56930/ 173500 | consumed samples: 14574080 | consumed tokens: 29847715840 | elapsed time per iteration (s): 0.08 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 4.551304E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.971 | TFLOPs: 11.95 | 7: iteration 56940/ 173500 | consumed samples: 14576640 | consumed tokens: 29852958720 | elapsed time per iteration (s): 0.08 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 4.564742E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.470 | TFLOPs: 11.94 | 7: iteration 56950/ 173500 | consumed samples: 14579200 | consumed tokens: 29858201600 | elapsed time per iteration (s): 0.08 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 4.554712E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.714 | TFLOPs: 11.92 | 7: iteration 56960/ 173500 | consumed samples: 14581760 | consumed tokens: 29863444480 | elapsed time per iteration (s): 0.08 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 4.551169E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.727 | TFLOPs: 11.88 | 7: iteration 56970/ 173500 | consumed samples: 14584320 | consumed tokens: 29868687360 | elapsed time per iteration (s): 0.08 | learning rate: 
1.578E-04 | global batch size: 256 | lm loss: 4.553327E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.193 | TFLOPs: 11.91 | 7: iteration 56980/ 173500 | consumed samples: 14586880 | consumed tokens: 29873930240 | elapsed time per iteration (s): 0.08 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 4.549026E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.802 | TFLOPs: 11.94 | 7: iteration 56990/ 173500 | consumed samples: 14589440 | consumed tokens: 29879173120 | elapsed time per iteration (s): 0.08 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 4.551669E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.181 | TFLOPs: 11.92 | 7: iteration 57000/ 173500 | consumed samples: 14592000 | consumed tokens: 29884416000 | elapsed time per iteration (s): 0.08 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 4.540888E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.987 | TFLOPs: 11.95 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 57000 | lm loss value: 4.460407E+00 | lm loss PPL: 8.652274E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 57000 to checkpoints_14m91b100m 0: [2023-03-17 01:39:30,549] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step57000 is begin to save! 0: [2023-03-17 01:39:30,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:39:30,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:39:30,577] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:39:30,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:39:30,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:39:30,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:39:30,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:39:30,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:39:30,589] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:39:30,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:39:30,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:39:30,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:39:30,593] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step57000/mp_rank_00_model_states.pt 0: [2023-03-17 01:39:30,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:39:30,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:39:30,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:39:30,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:39:30,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:39:30,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2023-03-17 01:39:30,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:39:30,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:39:30,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:39:30,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:39:30,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:39:30,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2023-03-17 01:39:30,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:39:30,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2023-03-17 01:39:30,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:39:30,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:39:30,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:39:30,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:39:30,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2023-03-17 01:39:30,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2023-03-17 01:39:30,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:39:30,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 01:39:30,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2023-03-17 01:39:30,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 3: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:39:30,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:39:30,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 6: [2023-03-17 01:39:30,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:39:30,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2023-03-17 01:39:30,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:39:30,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:39:30,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2023-03-17 01:39:30,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:39:30,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2023-03-17 01:39:30,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:39:30,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:39:30,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 01:39:30,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:39:30,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:39:30,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:39:30,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2023-03-17 01:39:30,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:39:30,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:39:30,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:39:30,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2023-03-17 01:39:30,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 01:39:30,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:39:30,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:39:30,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
3: [2023-03-17 01:39:30,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:39:30,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:39:30,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2023-03-17 01:39:30,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:39:30,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:39:30,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:39:30,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:39:30,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
4: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:39:30,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 6: [2023-03-17 01:39:30,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:39:30,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:39:30,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
3: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:39:30,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:39:30,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:39:30,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:39:30,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:39:30,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 01:39:30,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 0: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:39:30,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:39:30,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
7: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
4: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2023-03-17 01:39:30,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2023-03-17 01:39:30,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:39:30,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:39:30,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2023-03-17 01:39:30,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:39:30,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step57000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt
6: [2023-03-17 01:39:30,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now!
0: successfully saved checkpoint at iteration 57000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 82.27
7: iteration 57010/ 173500 | consumed samples: 14594560 | consumed tokens: 29889658880 | elapsed time per iteration (s): 0.09 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 4.552070E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.977 | TFLOPs: 10.41 |
7: iteration 57020/ 173500 | consumed samples: 14597120 | consumed tokens: 29894901760 | elapsed time per iteration (s): 0.08 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 4.553065E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.373 | TFLOPs: 11.89 |
7: iteration 57030/ 173500 | consumed samples: 14599680 | consumed tokens: 29900144640 | elapsed time per iteration (s): 0.08 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 4.563219E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.460 | TFLOPs: 11.94 |
7: iteration 57040/ 173500 | consumed samples: 14602240 | consumed tokens: 29905387520 | elapsed time per iteration (s): 0.08 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 4.548708E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.144 | TFLOPs: 11.89 |
7: iteration 57050/ 173500 | consumed samples: 14604800 | consumed tokens: 29910630400 | elapsed time per iteration (s): 0.08 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 4.548866E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.869 | TFLOPs: 11.84 |
7: iteration 57060/ 173500 | consumed samples: 14607360 | consumed tokens: 29915873280 | elapsed time per iteration (s): 0.08 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 4.548558E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.187 | TFLOPs: 11.71 |
7: iteration 57070/ 173500 | consumed samples: 14609920 | consumed tokens: 29921116160 | elapsed time per iteration (s): 0.08 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 4.548820E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.108 | TFLOPs: 11.95 |
7: iteration 57080/ 173500 | consumed samples: 14612480 | consumed tokens: 29926359040 | elapsed time per iteration (s): 0.08 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 4.547920E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.638 | TFLOPs: 11.93 |
7: iteration 57090/ 173500 | consumed samples: 14615040 | consumed tokens: 29931601920 | elapsed time per iteration (s): 0.08 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 4.557293E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.116 | TFLOPs: 11.92 |
7: iteration 57100/ 173500 | consumed samples: 14617600 | consumed tokens: 29936844800 | elapsed time per iteration (s): 0.08 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 4.545698E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.144 | TFLOPs: 11.90 |
7: iteration 57110/ 173500 | consumed samples: 14620160 | consumed tokens: 29942087680 | elapsed time per iteration (s): 0.08 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 4.541098E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.831 | TFLOPs: 11.89 |
7: iteration 57120/ 173500 | consumed samples: 14622720 | consumed tokens: 29947330560 | elapsed time per iteration (s): 0.08 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 4.525285E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.740 | TFLOPs: 11.95 |
7: iteration 57130/ 173500 | consumed samples: 14625280 | consumed tokens: 29952573440 | elapsed time per iteration (s): 0.08 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 4.561009E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.530 | TFLOPs: 11.90 |
7: iteration 57140/ 173500 | consumed samples: 14627840 | consumed tokens: 29957816320 | elapsed time per iteration (s): 0.08 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 4.544913E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.611 | TFLOPs: 11.93 |
7: iteration 57150/ 173500 | consumed samples: 14630400 | consumed tokens: 29963059200 | elapsed time per iteration (s): 0.08 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 4.566860E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.358 | TFLOPs: 11.94 |
7: iteration 57160/ 173500 | consumed samples: 14632960 | consumed tokens: 29968302080 | elapsed time per iteration (s): 0.08 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 4.559151E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.375 | TFLOPs: 11.93 |
7: iteration 57170/ 173500 | consumed samples: 14635520 | consumed tokens: 29973544960 | elapsed time per iteration (s): 0.08 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 4.552726E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.722 | TFLOPs: 11.85 |
7: iteration 57180/ 173500 | consumed samples: 14638080 | consumed tokens: 29978787840 | elapsed time per iteration (s): 0.08 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 4.547464E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.373 | TFLOPs: 11.95 |
7: iteration 57190/ 173500 | consumed samples: 14640640 | consumed tokens: 29984030720 | elapsed time per iteration (s): 0.08 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 4.538876E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.100 | TFLOPs: 11.87 |
7: iteration 57200/ 173500 | consumed samples: 14643200 | consumed tokens: 29989273600 | elapsed time per iteration (s): 0.08 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 4.550656E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.017 | TFLOPs: 11.98 |
7: iteration 57210/ 173500 | consumed samples: 14645760 | consumed tokens: 29994516480 | elapsed time per iteration (s): 0.08 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 4.557086E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.578 | TFLOPs: 11.93 |
7: iteration 57220/ 173500 | consumed samples: 14648320 | consumed tokens: 29999759360 | elapsed time per iteration (s): 0.08 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 4.550797E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.944 | TFLOPs: 11.97 |
7: iteration 57230/ 173500 | consumed samples: 14650880 | consumed tokens: 30005002240 | elapsed time per iteration (s): 0.08 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 4.551823E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.593 | TFLOPs: 11.99 |
7: iteration 57240/ 173500 | consumed samples: 14653440 | consumed tokens: 30010245120 | elapsed time per iteration (s): 0.08 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 4.549745E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.944 | TFLOPs: 12.01 |
7: iteration 57250/ 173500 | consumed samples: 14656000 | consumed tokens: 30015488000 | elapsed time per iteration (s): 0.08 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 4.552233E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.737 | TFLOPs: 12.02 |
7: iteration 57260/ 173500 | consumed samples: 14658560 | consumed tokens: 30020730880 | elapsed time per iteration (s): 0.10 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 4.548524E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2516.043 | TFLOPs: 9.36 |
7: iteration 57270/ 173500 | consumed samples: 14661120 | consumed tokens: 30025973760 | elapsed time per iteration (s): 0.11 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 4.540701E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2371.669 | TFLOPs: 8.82 |
7: iteration 57280/ 173500 | consumed samples: 14663680 | consumed tokens: 
30031216640 | elapsed time per iteration (s): 0.11 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 4.561407E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.809 | TFLOPs: 8.95 | 7: iteration 57290/ 173500 | consumed samples: 14666240 | consumed tokens: 30036459520 | elapsed time per iteration (s): 0.12 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 4.530481E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2207.042 | TFLOPs: 8.21 | 7: iteration 57300/ 173500 | consumed samples: 14668800 | consumed tokens: 30041702400 | elapsed time per iteration (s): 0.11 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 4.559929E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.238 | TFLOPs: 9.02 | 7: iteration 57310/ 173500 | consumed samples: 14671360 | consumed tokens: 30046945280 | elapsed time per iteration (s): 0.10 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 4.557994E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2667.003 | TFLOPs: 9.92 | 7: iteration 57320/ 173500 | consumed samples: 14673920 | consumed tokens: 30052188160 | elapsed time per iteration (s): 0.10 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 4.550494E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.133 | TFLOPs: 9.56 | 7: iteration 57330/ 173500 | consumed samples: 14676480 | consumed tokens: 30057431040 | elapsed time per iteration (s): 0.12 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 4.549361E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
2207.024 | TFLOPs: 8.21 | 7: iteration 57340/ 173500 | consumed samples: 14679040 | consumed tokens: 30062673920 | elapsed time per iteration (s): 0.13 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 4.554003E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1964.705 | TFLOPs: 7.31 | 7: iteration 57350/ 173500 | consumed samples: 14681600 | consumed tokens: 30067916800 | elapsed time per iteration (s): 0.12 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 4.554402E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2081.042 | TFLOPs: 7.74 | 7: iteration 57360/ 173500 | consumed samples: 14684160 | consumed tokens: 30073159680 | elapsed time per iteration (s): 0.14 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 4.549147E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1868.185 | TFLOPs: 6.95 | 7: iteration 57370/ 173500 | consumed samples: 14686720 | consumed tokens: 30078402560 | elapsed time per iteration (s): 0.13 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 4.548047E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1966.624 | TFLOPs: 7.31 | 7: iteration 57380/ 173500 | consumed samples: 14689280 | consumed tokens: 30083645440 | elapsed time per iteration (s): 0.13 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 4.554912E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2015.380 | TFLOPs: 7.50 | 7: iteration 57390/ 173500 | consumed samples: 14691840 | consumed tokens: 30088888320 | elapsed time per iteration (s): 0.08 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 4.543343E+00 | grad norm: 0.374 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.325 | TFLOPs: 11.90 | 7: iteration 57400/ 173500 | consumed samples: 14694400 | consumed tokens: 30094131200 | elapsed time per iteration (s): 0.08 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 4.560624E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.707 | TFLOPs: 11.89 | 7: iteration 57410/ 173500 | consumed samples: 14696960 | consumed tokens: 30099374080 | elapsed time per iteration (s): 0.08 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 4.555628E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.009 | TFLOPs: 11.88 | 7: iteration 57420/ 173500 | consumed samples: 14699520 | consumed tokens: 30104616960 | elapsed time per iteration (s): 0.08 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 4.562426E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.941 | TFLOPs: 11.89 | 7: iteration 57430/ 173500 | consumed samples: 14702080 | consumed tokens: 30109859840 | elapsed time per iteration (s): 0.08 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 4.538713E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.095 | TFLOPs: 11.95 | 7: iteration 57440/ 173500 | consumed samples: 14704640 | consumed tokens: 30115102720 | elapsed time per iteration (s): 0.08 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 4.550233E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.378 | TFLOPs: 11.89 | 7: iteration 57450/ 173500 | consumed samples: 14707200 | consumed tokens: 30120345600 | elapsed time per iteration 
(s): 0.08 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 4.536954E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.353 | TFLOPs: 11.97 | 7: iteration 57460/ 173500 | consumed samples: 14709760 | consumed tokens: 30125588480 | elapsed time per iteration (s): 0.08 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 4.553799E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.996 | TFLOPs: 11.99 | 7: iteration 57470/ 173500 | consumed samples: 14712320 | consumed tokens: 30130831360 | elapsed time per iteration (s): 0.08 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 4.552669E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.176 | TFLOPs: 11.97 | 7: iteration 57480/ 173500 | consumed samples: 14714880 | consumed tokens: 30136074240 | elapsed time per iteration (s): 0.08 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 4.559005E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.626 | TFLOPs: 11.88 | 7: iteration 57490/ 173500 | consumed samples: 14717440 | consumed tokens: 30141317120 | elapsed time per iteration (s): 0.08 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 4.549394E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.340 | TFLOPs: 12.01 | 7: iteration 57500/ 173500 | consumed samples: 14720000 | consumed tokens: 30146560000 | elapsed time per iteration (s): 0.08 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 4.548591E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.395 | TFLOPs: 12.02 | 7: 
iteration 57510/ 173500 | consumed samples: 14722560 | consumed tokens: 30151802880 | elapsed time per iteration (s): 0.08 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 4.541700E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.854 | TFLOPs: 11.84 | 7: iteration 57520/ 173500 | consumed samples: 14725120 | consumed tokens: 30157045760 | elapsed time per iteration (s): 0.08 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 4.539523E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.317 | TFLOPs: 12.02 | 7: iteration 57530/ 173500 | consumed samples: 14727680 | consumed tokens: 30162288640 | elapsed time per iteration (s): 0.08 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 4.554842E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.498 | TFLOPs: 12.02 | 7: iteration 57540/ 173500 | consumed samples: 14730240 | consumed tokens: 30167531520 | elapsed time per iteration (s): 0.08 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 4.550468E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.902 | TFLOPs: 11.85 | 7: iteration 57550/ 173500 | consumed samples: 14732800 | consumed tokens: 30172774400 | elapsed time per iteration (s): 0.08 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 4.539407E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.878 | TFLOPs: 11.96 | 7: iteration 57560/ 173500 | consumed samples: 14735360 | consumed tokens: 30178017280 | elapsed time per iteration (s): 0.08 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 4.550875E+00 | grad norm: 0.272 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.604 | TFLOPs: 12.00 | 7: iteration 57570/ 173500 | consumed samples: 14737920 | consumed tokens: 30183260160 | elapsed time per iteration (s): 0.08 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 4.545119E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.435 | TFLOPs: 11.89 | 7: iteration 57580/ 173500 | consumed samples: 14740480 | consumed tokens: 30188503040 | elapsed time per iteration (s): 0.08 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 4.547123E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.399 | TFLOPs: 12.01 | 7: iteration 57590/ 173500 | consumed samples: 14743040 | consumed tokens: 30193745920 | elapsed time per iteration (s): 0.08 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 4.542177E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.952 | TFLOPs: 12.00 | 7: iteration 57600/ 173500 | consumed samples: 14745600 | consumed tokens: 30198988800 | elapsed time per iteration (s): 0.08 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 4.547063E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.659 | TFLOPs: 11.96 | 7: iteration 57610/ 173500 | consumed samples: 14748160 | consumed tokens: 30204231680 | elapsed time per iteration (s): 0.08 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 4.550704E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.125 | TFLOPs: 11.96 | 7: iteration 57620/ 173500 | consumed samples: 14750720 | consumed tokens: 30209474560 | elapsed time per iteration (s): 0.08 | learning 
rate: 1.569E-04 | global batch size: 256 | lm loss: 4.538757E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.279 | TFLOPs: 11.95 | 7: iteration 57630/ 173500 | consumed samples: 14753280 | consumed tokens: 30214717440 | elapsed time per iteration (s): 0.08 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 4.535292E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.639 | TFLOPs: 11.65 | 7: iteration 57640/ 173500 | consumed samples: 14755840 | consumed tokens: 30219960320 | elapsed time per iteration (s): 0.08 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 4.551628E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.531 | TFLOPs: 11.93 | 7: iteration 57650/ 173500 | consumed samples: 14758400 | consumed tokens: 30225203200 | elapsed time per iteration (s): 0.08 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 4.552675E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.552 | TFLOPs: 11.90 | 7: iteration 57660/ 173500 | consumed samples: 14760960 | consumed tokens: 30230446080 | elapsed time per iteration (s): 0.08 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 4.543054E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.144 | TFLOPs: 11.90 | 7: iteration 57670/ 173500 | consumed samples: 14763520 | consumed tokens: 30235688960 | elapsed time per iteration (s): 0.08 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 4.546745E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.235 | TFLOPs: 11.83 | 7: iteration 57680/ 173500 | 
consumed samples: 14766080 | consumed tokens: 30240931840 | elapsed time per iteration (s): 0.08 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 4.547039E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.606 | TFLOPs: 11.94 | 7: iteration 57690/ 173500 | consumed samples: 14768640 | consumed tokens: 30246174720 | elapsed time per iteration (s): 0.08 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 4.549314E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.729 | TFLOPs: 11.94 | 7: iteration 57700/ 173500 | consumed samples: 14771200 | consumed tokens: 30251417600 | elapsed time per iteration (s): 0.08 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 4.549374E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.403 | TFLOPs: 11.95 | 7: iteration 57710/ 173500 | consumed samples: 14773760 | consumed tokens: 30256660480 | elapsed time per iteration (s): 0.08 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 4.548563E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.244 | TFLOPs: 11.55 | 7: iteration 57720/ 173500 | consumed samples: 14776320 | consumed tokens: 30261903360 | elapsed time per iteration (s): 0.08 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 4.544211E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.202 | TFLOPs: 11.93 | 7: iteration 57730/ 173500 | consumed samples: 14778880 | consumed tokens: 30267146240 | elapsed time per iteration (s): 0.08 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 4.560402E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3208.838 | TFLOPs: 11.94 | 7: iteration 57740/ 173500 | consumed samples: 14781440 | consumed tokens: 30272389120 | elapsed time per iteration (s): 0.08 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 4.540205E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.652 | TFLOPs: 11.94 | 7: iteration 57750/ 173500 | consumed samples: 14784000 | consumed tokens: 30277632000 | elapsed time per iteration (s): 0.08 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 4.545334E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.393 | TFLOPs: 11.95 | 7: iteration 57760/ 173500 | consumed samples: 14786560 | consumed tokens: 30282874880 | elapsed time per iteration (s): 0.08 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 4.538763E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.832 | TFLOPs: 11.97 | 7: iteration 57770/ 173500 | consumed samples: 14789120 | consumed tokens: 30288117760 | elapsed time per iteration (s): 0.08 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 4.545327E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.661 | TFLOPs: 11.96 | 7: iteration 57780/ 173500 | consumed samples: 14791680 | consumed tokens: 30293360640 | elapsed time per iteration (s): 0.08 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 4.538086E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.783 | TFLOPs: 11.89 | 7: iteration 57790/ 173500 | consumed samples: 14794240 | consumed tokens: 30298603520 | elapsed time per iteration (s): 0.08 | learning rate: 1.567E-04 | global batch 
size: 256 | lm loss: 4.539428E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.031 | TFLOPs: 11.98 | 7: iteration 57800/ 173500 | consumed samples: 14796800 | consumed tokens: 30303846400 | elapsed time per iteration (s): 0.08 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 4.546677E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.468 | TFLOPs: 11.94 | 7: iteration 57810/ 173500 | consumed samples: 14799360 | consumed tokens: 30309089280 | elapsed time per iteration (s): 0.08 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 4.556461E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.138 | TFLOPs: 11.95 | 7: iteration 57820/ 173500 | consumed samples: 14801920 | consumed tokens: 30314332160 | elapsed time per iteration (s): 0.08 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 4.537635E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.009 | TFLOPs: 11.94 | 7: iteration 57830/ 173500 | consumed samples: 14804480 | consumed tokens: 30319575040 | elapsed time per iteration (s): 0.08 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 4.548170E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.396 | TFLOPs: 11.93 | 7: iteration 57840/ 173500 | consumed samples: 14807040 | consumed tokens: 30324817920 | elapsed time per iteration (s): 0.08 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 4.563138E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.647 | TFLOPs: 11.91 | 7: iteration 57850/ 173500 | consumed samples: 14809600 | 
consumed tokens: 30330060800 | elapsed time per iteration (s): 0.08 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 4.562790E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.626 | TFLOPs: 11.79 | 7: iteration 57860/ 173500 | consumed samples: 14812160 | consumed tokens: 30335303680 | elapsed time per iteration (s): 0.08 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 4.556709E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.008 | TFLOPs: 11.82 | 7: iteration 57870/ 173500 | consumed samples: 14814720 | consumed tokens: 30340546560 | elapsed time per iteration (s): 0.08 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 4.562241E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.875 | TFLOPs: 11.89 | 7: iteration 57880/ 173500 | consumed samples: 14817280 | consumed tokens: 30345789440 | elapsed time per iteration (s): 0.09 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 4.553077E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2839.349 | TFLOPs: 10.56 | 7: iteration 57890/ 173500 | consumed samples: 14819840 | consumed tokens: 30351032320 | elapsed time per iteration (s): 0.08 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 4.540608E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.110 | TFLOPs: 11.94 | 7: iteration 57900/ 173500 | consumed samples: 14822400 | consumed tokens: 30356275200 | elapsed time per iteration (s): 0.08 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 4.568741E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3208.832 | TFLOPs: 11.94 | 7: iteration 57910/ 173500 | consumed samples: 14824960 | consumed tokens: 30361518080 | elapsed time per iteration (s): 0.08 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 4.542303E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.652 | TFLOPs: 11.94 | 7: iteration 57920/ 173500 | consumed samples: 14827520 | consumed tokens: 30366760960 | elapsed time per iteration (s): 0.08 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 4.557807E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.608 | TFLOPs: 11.93 | 7: iteration 57930/ 173500 | consumed samples: 14830080 | consumed tokens: 30372003840 | elapsed time per iteration (s): 0.08 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 4.562526E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.541 | TFLOPs: 11.92 | 7: iteration 57940/ 173500 | consumed samples: 14832640 | consumed tokens: 30377246720 | elapsed time per iteration (s): 0.08 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 4.564476E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.132 | TFLOPs: 11.96 | 7: iteration 57950/ 173500 | consumed samples: 14835200 | consumed tokens: 30382489600 | elapsed time per iteration (s): 0.08 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 4.536213E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.074 | TFLOPs: 11.94 | 7: iteration 57960/ 173500 | consumed samples: 14837760 | consumed tokens: 30387732480 | elapsed time per iteration (s): 0.08 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 
4.565337E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.918 | TFLOPs: 11.94 | 7: iteration 57970/ 173500 | consumed samples: 14840320 | consumed tokens: 30392975360 | elapsed time per iteration (s): 0.08 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 4.553028E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.139 | TFLOPs: 11.92 | 7: iteration 57980/ 173500 | consumed samples: 14842880 | consumed tokens: 30398218240 | elapsed time per iteration (s): 0.08 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 4.561447E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.894 | TFLOPs: 11.94 | 7: iteration 57990/ 173500 | consumed samples: 14845440 | consumed tokens: 30403461120 | elapsed time per iteration (s): 0.08 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 4.547313E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.586 | TFLOPs: 11.90 | 0: [2023-03-17 01:40:55,196] [INFO] [logging.py:68:log_dist] [Rank 0] step=58000, skipped=0, lr=[0.00015640412143068475, 0.00015640412143068475, 0.00015640412143068475], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 58000/ 173500 | consumed samples: 14848000 | consumed tokens: 30408704000 | elapsed time per iteration (s): 0.08 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 4.550365E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.733 | TFLOPs: 11.92 | 0: steps: 58000 loss: 4.5791 iter time (s): 0.084 samples/sec: 3030.025 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 58000 | lm 
loss value: 4.391181E+00 | lm loss PPL: 8.073567E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 58000 to checkpoints_14m91b100m 0: [2023-03-17 01:40:55,254] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step58000 is begin to save! 0: [2023-03-17 01:40:55,258] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:40:55,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:40:55,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:40:55,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:40:55,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:40:55,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:40:55,291] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:40:55,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:40:55,294] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:40:55,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 01:40:55,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:40:55,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:40:55,298] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step58000/mp_rank_00_model_states.pt 0: [2023-03-17 01:40:55,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:40:55,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:40:55,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:40:55,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:40:55,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2023-03-17 01:40:55,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:40:55,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2023-03-17 01:40:55,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2023-03-17 01:40:55,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:40:55,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:40:55,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 0: [2023-03-17 01:40:55,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 01:40:55,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2023-03-17 01:40:55,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2023-03-17 01:40:55,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2023-03-17 01:40:55,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:40:55,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2023-03-17 01:40:55,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2023-03-17 01:40:55,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:40:55,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:40:55,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
0: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:40:55,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:40:55,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:40:55,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:40:55,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2023-03-17 01:40:55,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
0: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:40:55,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:40:55,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 01:40:55,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2023-03-17 01:40:55,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2023-03-17 01:40:55,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2023-03-17 01:40:55,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
3: [2023-03-17 01:40:55,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:40:55,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2023-03-17 01:40:55,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:40:55,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2023-03-17 01:40:55,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:40:55,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:40:55,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
4: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:40:55,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2023-03-17 01:40:55,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:40:55,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:40:55,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2023-03-17 01:40:55,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2023-03-17 01:40:55,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2023-03-17 01:40:55,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:40:55,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:40:55,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2023-03-17 01:40:55,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:40:55,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:40:55,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2023-03-17 01:40:55,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:40:55,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
3: [2023-03-17 01:40:55,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:40:55,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:40:55,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:40:55,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2023-03-17 01:40:55,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:40:55,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:40:55,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:40:55,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 4: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:40:55,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2023-03-17 01:40:55,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:40:55,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:40:55,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 6: [2023-03-17 01:40:55,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2023-03-17 01:40:55,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 01:40:55,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2023-03-17 01:40:55,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
2: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:40:55,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2023-03-17 01:40:55,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:40:55,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step58000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:40:55,333] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
0: successfully saved checkpoint at iteration 58000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.73 7: iteration 58010/ 173500 | consumed samples: 14850560 | consumed tokens: 30413946880 | elapsed time per iteration (s): 0.09 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 4.567881E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.793 | TFLOPs: 10.21 | 7: iteration 58020/ 173500 | consumed samples: 14853120 | consumed tokens: 30419189760 | elapsed time per iteration (s): 0.08 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 4.557632E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.796 | TFLOPs: 11.92 | 7: iteration 58030/ 173500 | consumed samples: 14855680 | consumed tokens: 30424432640 | elapsed time per iteration (s): 0.08 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 4.549500E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.994 | TFLOPs: 11.90 | 7: iteration 58040/ 173500 | consumed samples: 14858240 | consumed tokens: 30429675520 | elapsed time per iteration (s): 0.08 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 4.550238E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.849 | TFLOPs: 11.85 | 7: iteration 58050/ 173500 | consumed samples: 14860800 | consumed tokens: 30434918400 | elapsed time per iteration (s): 0.08 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 4.553273E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.750 | TFLOPs: 11.90 | 7: iteration 58060/ 173500 | consumed samples: 14863360 | consumed tokens: 30440161280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.563E-04 | global batch size: 256 | lm loss: 4.546457E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.352 | TFLOPs: 11.95 | 7: iteration 58070/ 173500 | consumed samples: 14865920 | consumed tokens: 30445404160 | elapsed time per iteration (s): 0.08 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 4.547674E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.969 | TFLOPs: 11.90 | 7: iteration 58080/ 173500 | consumed samples: 14868480 | consumed tokens: 30450647040 | elapsed time per iteration (s): 0.08 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 4.558614E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.779 | TFLOPs: 11.92 | 7: iteration 58090/ 173500 | consumed samples: 14871040 | consumed tokens: 30455889920 | elapsed time per iteration (s): 0.08 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 4.546105E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.835 | TFLOPs: 11.89 | 7: iteration 58100/ 173500 | consumed samples: 14873600 | consumed tokens: 30461132800 | elapsed time per iteration (s): 0.08 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 4.556812E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.591 | TFLOPs: 11.82 | 7: iteration 58110/ 173500 | consumed samples: 14876160 | consumed tokens: 30466375680 | elapsed time per iteration (s): 0.08 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 4.548656E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.304 | TFLOPs: 11.89 | 7: iteration 58120/ 
173500 | consumed samples: 14878720 | consumed tokens: 30471618560 | elapsed time per iteration (s): 0.08 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 4.549074E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.943 | TFLOPs: 11.89 | 7: iteration 58130/ 173500 | consumed samples: 14881280 | consumed tokens: 30476861440 | elapsed time per iteration (s): 0.08 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 4.551329E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.455 | TFLOPs: 11.96 | 7: iteration 58140/ 173500 | consumed samples: 14883840 | consumed tokens: 30482104320 | elapsed time per iteration (s): 0.08 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 4.551046E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.158 | TFLOPs: 11.92 | 7: iteration 58150/ 173500 | consumed samples: 14886400 | consumed tokens: 30487347200 | elapsed time per iteration (s): 0.08 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 4.555378E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.950 | TFLOPs: 11.92 | 7: iteration 58160/ 173500 | consumed samples: 14888960 | consumed tokens: 30492590080 | elapsed time per iteration (s): 0.08 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 4.539972E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.205 | TFLOPs: 11.89 | 7: iteration 58170/ 173500 | consumed samples: 14891520 | consumed tokens: 30497832960 | elapsed time per iteration (s): 0.08 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 4.554117E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3135.290 | TFLOPs: 11.66 | 7: iteration 58180/ 173500 | consumed samples: 14894080 | consumed tokens: 30503075840 | elapsed time per iteration (s): 0.08 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 4.524957E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.209 | TFLOPs: 11.95 | 7: iteration 58190/ 173500 | consumed samples: 14896640 | consumed tokens: 30508318720 | elapsed time per iteration (s): 0.08 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 4.566247E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.208 | TFLOPs: 11.93 | 7: iteration 58200/ 173500 | consumed samples: 14899200 | consumed tokens: 30513561600 | elapsed time per iteration (s): 0.08 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 4.551295E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.842 | TFLOPs: 11.94 | 7: iteration 58210/ 173500 | consumed samples: 14901760 | consumed tokens: 30518804480 | elapsed time per iteration (s): 0.08 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 4.541367E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.097 | TFLOPs: 11.82 | 7: iteration 58220/ 173500 | consumed samples: 14904320 | consumed tokens: 30524047360 | elapsed time per iteration (s): 0.08 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 4.561434E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.111 | TFLOPs: 11.93 | 7: iteration 58230/ 173500 | consumed samples: 14906880 | consumed tokens: 30529290240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.561E-04 | global batch size: 256 | lm loss: 4.554821E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.343 | TFLOPs: 11.94 | 7: iteration 58240/ 173500 | consumed samples: 14909440 | consumed tokens: 30534533120 | elapsed time per iteration (s): 0.08 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 4.541805E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.133 | TFLOPs: 11.90 | 7: iteration 58250/ 173500 | consumed samples: 14912000 | consumed tokens: 30539776000 | elapsed time per iteration (s): 0.08 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 4.551368E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.450 | TFLOPs: 11.89 | 7: iteration 58260/ 173500 | consumed samples: 14914560 | consumed tokens: 30545018880 | elapsed time per iteration (s): 0.09 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 4.550105E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2994.034 | TFLOPs: 11.14 | 7: iteration 58270/ 173500 | consumed samples: 14917120 | consumed tokens: 30550261760 | elapsed time per iteration (s): 0.08 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 4.556152E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.119 | TFLOPs: 11.50 | 7: iteration 58280/ 173500 | consumed samples: 14919680 | consumed tokens: 30555504640 | elapsed time per iteration (s): 0.08 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 4.564795E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.802 | TFLOPs: 11.23 | 7: iteration 58290/ 173500 | 
consumed samples: 14922240 | consumed tokens: 30560747520 | elapsed time per iteration (s): 0.08 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 4.544593E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.897 | TFLOPs: 11.30 | 7: iteration 58300/ 173500 | consumed samples: 14924800 | consumed tokens: 30565990400 | elapsed time per iteration (s): 0.08 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 4.546582E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3067.097 | TFLOPs: 11.41 | 7: iteration 58310/ 173500 | consumed samples: 14927360 | consumed tokens: 30571233280 | elapsed time per iteration (s): 0.09 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 4.550500E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2926.733 | TFLOPs: 10.89 | 7: iteration 58320/ 173500 | consumed samples: 14929920 | consumed tokens: 30576476160 | elapsed time per iteration (s): 0.09 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 4.548461E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2997.685 | TFLOPs: 11.15 | 7: iteration 58330/ 173500 | consumed samples: 14932480 | consumed tokens: 30581719040 | elapsed time per iteration (s): 0.09 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 4.566524E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3003.396 | TFLOPs: 11.17 | 7: iteration 58340/ 173500 | consumed samples: 14935040 | consumed tokens: 30586961920 | elapsed time per iteration (s): 0.09 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 4.547609E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2811.316 | TFLOPs: 10.46 | 7: iteration 58350/ 173500 | consumed samples: 14937600 | consumed tokens: 30592204800 | elapsed time per iteration (s): 0.09 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 4.549958E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2868.975 | TFLOPs: 10.67 | 7: iteration 58360/ 173500 | consumed samples: 14940160 | consumed tokens: 30597447680 | elapsed time per iteration (s): 0.09 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 4.548252E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2811.955 | TFLOPs: 10.46 | 7: iteration 58370/ 173500 | consumed samples: 14942720 | consumed tokens: 30602690560 | elapsed time per iteration (s): 0.09 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 4.547723E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2926.700 | TFLOPs: 10.89 | 7: iteration 58380/ 173500 | consumed samples: 14945280 | consumed tokens: 30607933440 | elapsed time per iteration (s): 0.08 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 4.547336E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.349 | TFLOPs: 11.28 | 7: iteration 58390/ 173500 | consumed samples: 14947840 | consumed tokens: 30613176320 | elapsed time per iteration (s): 0.09 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 4.543874E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3006.600 | TFLOPs: 11.18 | 7: iteration 58400/ 173500 | consumed samples: 14950400 | consumed tokens: 30618419200 | elapsed time per iteration (s): 0.08 | learning rate: 1.558E-04 | global batch 
size: 256 | lm loss: 4.552974E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.125 | TFLOPs: 11.69 | 7: iteration 58410/ 173500 | consumed samples: 14952960 | consumed tokens: 30623662080 | elapsed time per iteration (s): 0.08 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 4.547993E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.386 | TFLOPs: 11.85 | 7: iteration 58420/ 173500 | consumed samples: 14955520 | consumed tokens: 30628904960 | elapsed time per iteration (s): 0.08 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 4.552737E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.687 | TFLOPs: 11.91 | 7: iteration 58430/ 173500 | consumed samples: 14958080 | consumed tokens: 30634147840 | elapsed time per iteration (s): 0.08 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 4.552685E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.109 | TFLOPs: 11.92 | 7: iteration 58440/ 173500 | consumed samples: 14960640 | consumed tokens: 30639390720 | elapsed time per iteration (s): 0.08 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 4.541872E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.611 | TFLOPs: 11.89 | 7: iteration 58450/ 173500 | consumed samples: 14963200 | consumed tokens: 30644633600 | elapsed time per iteration (s): 0.08 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 4.546188E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.937 | TFLOPs: 11.91 | 7: iteration 58460/ 173500 | consumed samples: 14965760 | 
consumed tokens: 30649876480 | elapsed time per iteration (s): 0.08 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 4.552606E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.854 | TFLOPs: 11.88 | 7: iteration 58470/ 173500 | consumed samples: 14968320 | consumed tokens: 30655119360 | elapsed time per iteration (s): 0.08 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 4.533564E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.037 | TFLOPs: 11.90 | 7: iteration 58480/ 173500 | consumed samples: 14970880 | consumed tokens: 30660362240 | elapsed time per iteration (s): 0.08 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 4.553278E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.947 | TFLOPs: 11.62 | 7: iteration 58490/ 173500 | consumed samples: 14973440 | consumed tokens: 30665605120 | elapsed time per iteration (s): 0.08 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 4.558949E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.252 | TFLOPs: 11.85 | 7: iteration 58500/ 173500 | consumed samples: 14976000 | consumed tokens: 30670848000 | elapsed time per iteration (s): 0.08 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 4.542767E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.903 | TFLOPs: 11.65 | 7: iteration 58510/ 173500 | consumed samples: 14978560 | consumed tokens: 30676090880 | elapsed time per iteration (s): 0.08 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 4.570846E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3203.157 | TFLOPs: 11.91 | 7: iteration 58520/ 173500 | consumed samples: 14981120 | consumed tokens: 30681333760 | elapsed time per iteration (s): 0.08 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 4.542071E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.766 | TFLOPs: 11.81 | 7: iteration 58530/ 173500 | consumed samples: 14983680 | consumed tokens: 30686576640 | elapsed time per iteration (s): 0.08 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 4.551943E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.912 | TFLOPs: 11.92 | 7: iteration 58540/ 173500 | consumed samples: 14986240 | consumed tokens: 30691819520 | elapsed time per iteration (s): 0.08 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 4.550312E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.839 | TFLOPs: 11.95 | 7: iteration 58550/ 173500 | consumed samples: 14988800 | consumed tokens: 30697062400 | elapsed time per iteration (s): 0.08 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 4.546757E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3027.384 | TFLOPs: 11.26 | 7: iteration 58560/ 173500 | consumed samples: 14991360 | consumed tokens: 30702305280 | elapsed time per iteration (s): 0.08 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 4.536861E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.812 | TFLOPs: 11.58 | 7: iteration 58570/ 173500 | consumed samples: 14993920 | consumed tokens: 30707548160 | elapsed time per iteration (s): 0.08 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 
4.541850E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.801 | TFLOPs: 11.92 | 7: iteration 58580/ 173500 | consumed samples: 14996480 | consumed tokens: 30712791040 | elapsed time per iteration (s): 0.08 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 4.548965E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.151 | TFLOPs: 11.91 | 7: iteration 58590/ 173500 | consumed samples: 14999040 | consumed tokens: 30718033920 | elapsed time per iteration (s): 0.08 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 4.552468E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.096 | TFLOPs: 11.83 | 7: iteration 58600/ 173500 | consumed samples: 15001600 | consumed tokens: 30723276800 | elapsed time per iteration (s): 0.08 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 4.536068E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.847 | TFLOPs: 11.83 | 7: iteration 58610/ 173500 | consumed samples: 15004160 | consumed tokens: 30728519680 | elapsed time per iteration (s): 0.08 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 4.538702E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.907 | TFLOPs: 11.93 | 7: iteration 58620/ 173500 | consumed samples: 15006720 | consumed tokens: 30733762560 | elapsed time per iteration (s): 0.08 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 4.551596E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.083 | TFLOPs: 11.92 | 7: iteration 58630/ 173500 | consumed samples: 15009280 | consumed tokens: 
30739005440 | elapsed time per iteration (s): 0.08 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 4.536434E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.765 | TFLOPs: 11.95 | 7: iteration 58640/ 173500 | consumed samples: 15011840 | consumed tokens: 30744248320 | elapsed time per iteration (s): 0.08 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 4.540857E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.723 | TFLOPs: 11.94 | 7: iteration 58650/ 173500 | consumed samples: 15014400 | consumed tokens: 30749491200 | elapsed time per iteration (s): 0.08 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 4.556549E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.270 | TFLOPs: 11.93 | 7: iteration 58660/ 173500 | consumed samples: 15016960 | consumed tokens: 30754734080 | elapsed time per iteration (s): 0.08 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 4.553791E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.820 | TFLOPs: 11.86 | 7: iteration 58670/ 173500 | consumed samples: 15019520 | consumed tokens: 30759976960 | elapsed time per iteration (s): 0.08 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 4.552209E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.053 | TFLOPs: 11.90 | 7: iteration 58680/ 173500 | consumed samples: 15022080 | consumed tokens: 30765219840 | elapsed time per iteration (s): 0.08 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 4.543276E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3185.298 | TFLOPs: 11.85 | 7: iteration 58690/ 173500 | consumed samples: 15024640 | consumed tokens: 30770462720 | elapsed time per iteration (s): 0.08 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 4.541158E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.608 | TFLOPs: 11.80 | 7: iteration 58700/ 173500 | consumed samples: 15027200 | consumed tokens: 30775705600 | elapsed time per iteration (s): 0.08 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 4.545545E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.865 | TFLOPs: 11.85 | 7: iteration 58710/ 173500 | consumed samples: 15029760 | consumed tokens: 30780948480 | elapsed time per iteration (s): 0.08 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 4.544274E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.460 | TFLOPs: 11.87 | 7: iteration 58720/ 173500 | consumed samples: 15032320 | consumed tokens: 30786191360 | elapsed time per iteration (s): 0.08 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 4.546754E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.950 | TFLOPs: 11.85 | 7: iteration 58730/ 173500 | consumed samples: 15034880 | consumed tokens: 30791434240 | elapsed time per iteration (s): 0.08 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 4.562409E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.159 | TFLOPs: 11.56 | 7: iteration 58740/ 173500 | consumed samples: 15037440 | consumed tokens: 30796677120 | elapsed time per iteration (s): 0.08 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 4.556963E+00 | grad 
norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.304 | TFLOPs: 11.87 | 7: iteration 58750/ 173500 | consumed samples: 15040000 | consumed tokens: 30801920000 | elapsed time per iteration (s): 0.08 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 4.549216E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.987 | TFLOPs: 11.90 | 7: iteration 58760/ 173500 | consumed samples: 15042560 | consumed tokens: 30807162880 | elapsed time per iteration (s): 0.08 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 4.552085E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.029 | TFLOPs: 11.90 | 7: iteration 58770/ 173500 | consumed samples: 15045120 | consumed tokens: 30812405760 | elapsed time per iteration (s): 0.08 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 4.544399E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.733 | TFLOPs: 11.88 | 7: iteration 58780/ 173500 | consumed samples: 15047680 | consumed tokens: 30817648640 | elapsed time per iteration (s): 0.08 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 4.537695E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.284 | TFLOPs: 11.89 | 7: iteration 58790/ 173500 | consumed samples: 15050240 | consumed tokens: 30822891520 | elapsed time per iteration (s): 0.08 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 4.551491E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.448 | TFLOPs: 11.85 | 7: iteration 58800/ 173500 | consumed samples: 15052800 | consumed tokens: 30828134400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 4.540530E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.979 | TFLOPs: 11.88 | 7: iteration 58810/ 173500 | consumed samples: 15055360 | consumed tokens: 30833377280 | elapsed time per iteration (s): 0.08 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 4.549871E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.565 | TFLOPs: 11.92 | 7: iteration 58820/ 173500 | consumed samples: 15057920 | consumed tokens: 30838620160 | elapsed time per iteration (s): 0.08 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 4.562765E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.427 | TFLOPs: 11.90 | 7: iteration 58830/ 173500 | consumed samples: 15060480 | consumed tokens: 30843863040 | elapsed time per iteration (s): 0.08 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 4.552530E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.576 | TFLOPs: 11.86 | 7: iteration 58840/ 173500 | consumed samples: 15063040 | consumed tokens: 30849105920 | elapsed time per iteration (s): 0.08 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 4.554576E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.516 | TFLOPs: 11.92 | 7: iteration 58850/ 173500 | consumed samples: 15065600 | consumed tokens: 30854348800 | elapsed time per iteration (s): 0.08 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 4.559343E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.508 | TFLOPs: 
11.94 | 7: iteration 58860/ 173500 | consumed samples: 15068160 | consumed tokens: 30859591680 | elapsed time per iteration (s): 0.08 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 4.546236E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.122 | TFLOPs: 11.93 | 7: iteration 58870/ 173500 | consumed samples: 15070720 | consumed tokens: 30864834560 | elapsed time per iteration (s): 0.08 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 4.538096E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.625 | TFLOPs: 11.81 | 7: iteration 58880/ 173500 | consumed samples: 15073280 | consumed tokens: 30870077440 | elapsed time per iteration (s): 0.08 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 4.555402E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.788 | TFLOPs: 11.86 | 7: iteration 58890/ 173500 | consumed samples: 15075840 | consumed tokens: 30875320320 | elapsed time per iteration (s): 0.08 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 4.549290E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.597 | TFLOPs: 11.86 | 7: iteration 58900/ 173500 | consumed samples: 15078400 | consumed tokens: 30880563200 | elapsed time per iteration (s): 0.09 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 4.559847E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.625 | TFLOPs: 11.15 | 7: iteration 58910/ 173500 | consumed samples: 15080960 | consumed tokens: 30885806080 | elapsed time per iteration (s): 0.09 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 4.555075E+00 | grad norm: 0.312 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2905.511 | TFLOPs: 10.81 | 7: iteration 58920/ 173500 | consumed samples: 15083520 | consumed tokens: 30891048960 | elapsed time per iteration (s): 0.09 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 4.554108E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2816.992 | TFLOPs: 10.48 | 7: iteration 58930/ 173500 | consumed samples: 15086080 | consumed tokens: 30896291840 | elapsed time per iteration (s): 0.09 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 4.545897E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.578 | TFLOPs: 10.70 | 7: iteration 58940/ 173500 | consumed samples: 15088640 | consumed tokens: 30901534720 | elapsed time per iteration (s): 0.09 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 4.551925E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2884.087 | TFLOPs: 10.73 | 7: iteration 58950/ 173500 | consumed samples: 15091200 | consumed tokens: 30906777600 | elapsed time per iteration (s): 0.09 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 4.553460E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2705.899 | TFLOPs: 10.06 | 7: iteration 58960/ 173500 | consumed samples: 15093760 | consumed tokens: 30912020480 | elapsed time per iteration (s): 0.09 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 4.548170E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.567 | TFLOPs: 11.14 | 7: iteration 58970/ 173500 | consumed samples: 15096320 | consumed tokens: 30917263360 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.550E-04 | global batch size: 256 | lm loss: 4.543410E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2989.368 | TFLOPs: 11.12 | 7: iteration 58980/ 173500 | consumed samples: 15098880 | consumed tokens: 30922506240 | elapsed time per iteration (s): 0.09 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 4.553260E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2854.722 | TFLOPs: 10.62 | 7: iteration 58990/ 173500 | consumed samples: 15101440 | consumed tokens: 30927749120 | elapsed time per iteration (s): 0.08 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 4.543752E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.255 | TFLOPs: 11.91 | 7: iteration 59000/ 173500 | consumed samples: 15104000 | consumed tokens: 30932992000 | elapsed time per iteration (s): 0.08 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 4.535708E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.280 | TFLOPs: 11.49 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 59000 | lm loss value: 4.414203E+00 | lm loss PPL: 8.261594E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 59000 to checkpoints_14m91b100m 0: [2023-03-17 01:42:17,248] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step59000 is begin to save! 0: [2023-03-17 01:42:17,250] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:42:17,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:42:17,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:42:17,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:42:17,280] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:42:17,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:42:17,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:42:17,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:42:17,286] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:42:17,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:42:17,289] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:42:17,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:42:17,290] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step59000/mp_rank_00_model_states.pt 0: [2023-03-17 01:42:17,290] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:42:17,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:42:17,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:42:17,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:42:17,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:42:17,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2023-03-17 01:42:17,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:42:17,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2023-03-17 01:42:17,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:42:17,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2023-03-17 01:42:17,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:42:17,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2023-03-17 01:42:17,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:42:17,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2023-03-17 01:42:17,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:42:17,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:42:17,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:42:17,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2023-03-17 01:42:17,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:42:17,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:42:17,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:42:17,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:42:17,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:42:17,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:42:17,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:42:17,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:42:17,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:42:17,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:42:17,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:42:17,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:42:17,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2023-03-17 01:42:17,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 01:42:17,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:42:17,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 0: [2023-03-17 01:42:17,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:42:17,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:42:17,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:42:17,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:42:17,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:42:17,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:42:17,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:42:17,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:42:17,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2023-03-17 01:42:17,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:42:17,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:42:17,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:42:17,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2023-03-17 01:42:17,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:42:17,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:42:17,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:42:17,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 
2: [2023-03-17 01:42:17,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:42:17,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:42:17,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:42:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2023-03-17 01:42:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2023-03-17 01:42:17,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:42:17,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step59000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:42:17,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 
0: successfully saved checkpoint at iteration 59000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.75 7: iteration 59010/ 173500 | consumed samples: 15106560 | consumed tokens: 30938234880 | elapsed time per iteration (s): 0.09 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 4.559081E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2781.695 | TFLOPs: 10.35 | 7: iteration 59020/ 173500 | consumed samples: 15109120 | consumed tokens: 30943477760 | elapsed time per iteration (s): 0.08 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 4.542504E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.113 | TFLOPs: 11.89 | 7: iteration 59030/ 173500 | consumed samples: 15111680 | consumed tokens: 30948720640 | elapsed time per iteration (s): 0.08 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 4.538200E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.775 | TFLOPs: 11.85 | 7: iteration 59040/ 173500 | consumed samples: 15114240 | consumed tokens: 30953963520 | elapsed time per iteration (s): 0.08 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 4.534716E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.121 | TFLOPs: 11.88 | 7: iteration 59050/ 173500 | consumed samples: 15116800 | consumed tokens: 30959206400 | elapsed time per iteration (s): 0.08 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 4.542268E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.329 | TFLOPs: 11.86 | 7: iteration 59060/ 173500 | consumed samples: 15119360 | consumed tokens: 30964449280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.549E-04 | global batch size: 256 | lm loss: 4.546006E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.859 | TFLOPs: 11.34 | 7: iteration 59070/ 173500 | consumed samples: 15121920 | consumed tokens: 30969692160 | elapsed time per iteration (s): 0.09 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 4.536877E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.644 | TFLOPs: 10.64 | 7: iteration 59080/ 173500 | consumed samples: 15124480 | consumed tokens: 30974935040 | elapsed time per iteration (s): 0.09 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 4.536221E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2921.896 | TFLOPs: 10.87 | 7: iteration 59090/ 173500 | consumed samples: 15127040 | consumed tokens: 30980177920 | elapsed time per iteration (s): 0.09 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 4.551960E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.526 | TFLOPs: 11.11 | 7: iteration 59100/ 173500 | consumed samples: 15129600 | consumed tokens: 30985420800 | elapsed time per iteration (s): 0.09 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 4.546740E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2841.896 | TFLOPs: 10.57 | 7: iteration 59110/ 173500 | consumed samples: 15132160 | consumed tokens: 30990663680 | elapsed time per iteration (s): 0.09 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 4.550705E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.217 | TFLOPs: 10.37 | 7: iteration 59120/ 
173500 | consumed samples: 15134720 | consumed tokens: 30995906560 | elapsed time per iteration (s): 0.09 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 4.546899E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2914.373 | TFLOPs: 10.84 | 7: iteration 59130/ 173500 | consumed samples: 15137280 | consumed tokens: 31001149440 | elapsed time per iteration (s): 0.08 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 4.551543E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.307 | TFLOPs: 11.60 | 7: iteration 59140/ 173500 | consumed samples: 15139840 | consumed tokens: 31006392320 | elapsed time per iteration (s): 0.08 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 4.536526E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.795 | TFLOPs: 11.86 | 7: iteration 59150/ 173500 | consumed samples: 15142400 | consumed tokens: 31011635200 | elapsed time per iteration (s): 0.08 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 4.543721E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.027 | TFLOPs: 11.88 | 7: iteration 59160/ 173500 | consumed samples: 15144960 | consumed tokens: 31016878080 | elapsed time per iteration (s): 0.08 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 4.530819E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.221 | TFLOPs: 11.82 | 7: iteration 59170/ 173500 | consumed samples: 15147520 | consumed tokens: 31022120960 | elapsed time per iteration (s): 0.09 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 4.575393E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2919.874 | TFLOPs: 10.86 | 7: iteration 59180/ 173500 | consumed samples: 15150080 | consumed tokens: 31027363840 | elapsed time per iteration (s): 0.09 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 4.551838E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.345 | TFLOPs: 11.06 | 7: iteration 59190/ 173500 | consumed samples: 15152640 | consumed tokens: 31032606720 | elapsed time per iteration (s): 0.08 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 4.545275E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.169 | TFLOPs: 11.86 | 7: iteration 59200/ 173500 | consumed samples: 15155200 | consumed tokens: 31037849600 | elapsed time per iteration (s): 0.08 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 4.539895E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.280 | TFLOPs: 11.87 | 7: iteration 59210/ 173500 | consumed samples: 15157760 | consumed tokens: 31043092480 | elapsed time per iteration (s): 0.08 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 4.555851E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.339 | TFLOPs: 11.85 | 7: iteration 59220/ 173500 | consumed samples: 15160320 | consumed tokens: 31048335360 | elapsed time per iteration (s): 0.08 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 4.548678E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.368 | TFLOPs: 11.87 | 7: iteration 59230/ 173500 | consumed samples: 15162880 | consumed tokens: 31053578240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.547E-04 | global batch size: 256 | lm loss: 4.563554E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.656 | TFLOPs: 11.90 | 7: iteration 59240/ 173500 | consumed samples: 15165440 | consumed tokens: 31058821120 | elapsed time per iteration (s): 0.08 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 4.558298E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.070 | TFLOPs: 11.88 | 7: iteration 59250/ 173500 | consumed samples: 15168000 | consumed tokens: 31064064000 | elapsed time per iteration (s): 0.08 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 4.544128E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.881 | TFLOPs: 11.91 | 7: iteration 59260/ 173500 | consumed samples: 15170560 | consumed tokens: 31069306880 | elapsed time per iteration (s): 0.08 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 4.545581E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.247 | TFLOPs: 11.86 | 7: iteration 59270/ 173500 | consumed samples: 15173120 | consumed tokens: 31074549760 | elapsed time per iteration (s): 0.08 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 4.562988E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.657 | TFLOPs: 11.93 | 7: iteration 59280/ 173500 | consumed samples: 15175680 | consumed tokens: 31079792640 | elapsed time per iteration (s): 0.08 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 4.532713E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.640 | TFLOPs: 11.93 | 7: iteration 59290/ 173500 | 
consumed samples: 15178240 | consumed tokens: 31085035520 | elapsed time per iteration (s): 0.08 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 4.542249E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.303 | TFLOPs: 11.95 | 7: iteration 59300/ 173500 | consumed samples: 15180800 | consumed tokens: 31090278400 | elapsed time per iteration (s): 0.08 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 4.552818E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.274 | TFLOPs: 11.64 | 7: iteration 59310/ 173500 | consumed samples: 15183360 | consumed tokens: 31095521280 | elapsed time per iteration (s): 0.08 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.552719E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.982 | TFLOPs: 11.91 | 7: iteration 59320/ 173500 | consumed samples: 15185920 | consumed tokens: 31100764160 | elapsed time per iteration (s): 0.08 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.543243E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.271 | TFLOPs: 11.88 | 7: iteration 59330/ 173500 | consumed samples: 15188480 | consumed tokens: 31106007040 | elapsed time per iteration (s): 0.08 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.556347E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.606 | TFLOPs: 11.75 | 7: iteration 59340/ 173500 | consumed samples: 15191040 | consumed tokens: 31111249920 | elapsed time per iteration (s): 0.08 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.551410E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3181.582 | TFLOPs: 11.83 | 7: iteration 59350/ 173500 | consumed samples: 15193600 | consumed tokens: 31116492800 | elapsed time per iteration (s): 0.08 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.544964E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.894 | TFLOPs: 11.89 | 7: iteration 59360/ 173500 | consumed samples: 15196160 | consumed tokens: 31121735680 | elapsed time per iteration (s): 0.08 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.542797E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.179 | TFLOPs: 11.85 | 7: iteration 59370/ 173500 | consumed samples: 15198720 | consumed tokens: 31126978560 | elapsed time per iteration (s): 0.08 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.545802E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.766 | TFLOPs: 11.77 | 7: iteration 59380/ 173500 | consumed samples: 15201280 | consumed tokens: 31132221440 | elapsed time per iteration (s): 0.08 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 4.552245E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.515 | TFLOPs: 11.90 | 7: iteration 59390/ 173500 | consumed samples: 15203840 | consumed tokens: 31137464320 | elapsed time per iteration (s): 0.08 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 4.554050E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.628 | TFLOPs: 11.89 | 7: iteration 59400/ 173500 | consumed samples: 15206400 | consumed tokens: 31142707200 | elapsed time per iteration (s): 0.08 | learning rate: 1.544E-04 | global batch 
size: 256 | lm loss: 4.551009E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.710 | TFLOPs: 11.87 | 7: iteration 59410/ 173500 | consumed samples: 15208960 | consumed tokens: 31147950080 | elapsed time per iteration (s): 0.08 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 4.538446E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.894 | TFLOPs: 11.83 | 7: iteration 59420/ 173500 | consumed samples: 15211520 | consumed tokens: 31153192960 | elapsed time per iteration (s): 0.08 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 4.549366E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.811 | TFLOPs: 11.86 | 7: iteration 59430/ 173500 | consumed samples: 15214080 | consumed tokens: 31158435840 | elapsed time per iteration (s): 0.08 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 4.555894E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.387 | TFLOPs: 11.87 | 7: iteration 59440/ 173500 | consumed samples: 15216640 | consumed tokens: 31163678720 | elapsed time per iteration (s): 0.08 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 4.547461E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.721 | TFLOPs: 11.58 | 7: iteration 59450/ 173500 | consumed samples: 15219200 | consumed tokens: 31168921600 | elapsed time per iteration (s): 0.08 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 4.555359E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.763 | TFLOPs: 11.85 | 7: iteration 59460/ 173500 | consumed samples: 15221760 | 
consumed tokens: 31174164480 | elapsed time per iteration (s): 0.08 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 4.551548E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.112 | TFLOPs: 11.73 | 7: iteration 59470/ 173500 | consumed samples: 15224320 | consumed tokens: 31179407360 | elapsed time per iteration (s): 0.09 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 4.536760E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2892.145 | TFLOPs: 10.76 | 7: iteration 59480/ 173500 | consumed samples: 15226880 | consumed tokens: 31184650240 | elapsed time per iteration (s): 0.10 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 4.536096E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2483.736 | TFLOPs: 9.24 | 7: iteration 59490/ 173500 | consumed samples: 15229440 | consumed tokens: 31189893120 | elapsed time per iteration (s): 0.08 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 4.542820E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.127 | TFLOPs: 11.35 | 7: iteration 59500/ 173500 | consumed samples: 15232000 | consumed tokens: 31195136000 | elapsed time per iteration (s): 0.09 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 4.541618E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.478 | TFLOPs: 11.11 | 7: iteration 59510/ 173500 | consumed samples: 15234560 | consumed tokens: 31200378880 | elapsed time per iteration (s): 0.08 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 4.544771E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3037.563 | TFLOPs: 11.30 | 7: iteration 59520/ 173500 | consumed samples: 15237120 | consumed tokens: 31205621760 | elapsed time per iteration (s): 0.08 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 4.543806E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.893 | TFLOPs: 11.88 | 7: iteration 59530/ 173500 | consumed samples: 15239680 | consumed tokens: 31210864640 | elapsed time per iteration (s): 0.08 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 4.555063E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.232 | TFLOPs: 11.87 | 7: iteration 59540/ 173500 | consumed samples: 15242240 | consumed tokens: 31216107520 | elapsed time per iteration (s): 0.08 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 4.545689E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.177 | TFLOPs: 11.78 | 7: iteration 59550/ 173500 | consumed samples: 15244800 | consumed tokens: 31221350400 | elapsed time per iteration (s): 0.08 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 4.541545E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.754 | TFLOPs: 11.87 | 7: iteration 59560/ 173500 | consumed samples: 15247360 | consumed tokens: 31226593280 | elapsed time per iteration (s): 0.08 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 4.551071E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.497 | TFLOPs: 11.87 | 7: iteration 59570/ 173500 | consumed samples: 15249920 | consumed tokens: 31231836160 | elapsed time per iteration (s): 0.08 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 
4.544939E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.427 | TFLOPs: 11.90 | 7: iteration 59580/ 173500 | consumed samples: 15252480 | consumed tokens: 31237079040 | elapsed time per iteration (s): 0.08 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 4.536269E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.335 | TFLOPs: 11.91 | 7: iteration 59590/ 173500 | consumed samples: 15255040 | consumed tokens: 31242321920 | elapsed time per iteration (s): 0.08 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 4.537501E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.289 | TFLOPs: 11.86 | 7: iteration 59600/ 173500 | consumed samples: 15257600 | consumed tokens: 31247564800 | elapsed time per iteration (s): 0.08 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 4.552686E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.353 | TFLOPs: 11.86 | 7: iteration 59610/ 173500 | consumed samples: 15260160 | consumed tokens: 31252807680 | elapsed time per iteration (s): 0.08 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 4.543414E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.007 | TFLOPs: 11.83 | 7: iteration 59620/ 173500 | consumed samples: 15262720 | consumed tokens: 31258050560 | elapsed time per iteration (s): 0.08 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 4.543930E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.014 | TFLOPs: 11.71 | 7: iteration 59630/ 173500 | consumed samples: 15265280 | consumed tokens: 
31263293440 | elapsed time per iteration (s): 0.08 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 4.543924E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.653 | TFLOPs: 11.80 | 7: iteration 59640/ 173500 | consumed samples: 15267840 | consumed tokens: 31268536320 | elapsed time per iteration (s): 0.08 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 4.557777E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.525 | TFLOPs: 11.85 | 7: iteration 59650/ 173500 | consumed samples: 15270400 | consumed tokens: 31273779200 | elapsed time per iteration (s): 0.08 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 4.555991E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.173 | TFLOPs: 11.82 | 7: iteration 59660/ 173500 | consumed samples: 15272960 | consumed tokens: 31279022080 | elapsed time per iteration (s): 0.08 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 4.538815E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.906 | TFLOPs: 11.77 | 7: iteration 59670/ 173500 | consumed samples: 15275520 | consumed tokens: 31284264960 | elapsed time per iteration (s): 0.08 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 4.542079E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.883 | TFLOPs: 11.90 | 7: iteration 59680/ 173500 | consumed samples: 15278080 | consumed tokens: 31289507840 | elapsed time per iteration (s): 0.08 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 4.548866E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3179.819 | TFLOPs: 11.83 | 7: iteration 59690/ 173500 | consumed samples: 15280640 | consumed tokens: 31294750720 | elapsed time per iteration (s): 0.08 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 4.546260E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.018 | TFLOPs: 11.79 | 7: iteration 59700/ 173500 | consumed samples: 15283200 | consumed tokens: 31299993600 | elapsed time per iteration (s): 0.08 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 4.539427E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.291 | TFLOPs: 11.81 | 7: iteration 59710/ 173500 | consumed samples: 15285760 | consumed tokens: 31305236480 | elapsed time per iteration (s): 0.08 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 4.548102E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.062 | TFLOPs: 11.86 | 7: iteration 59720/ 173500 | consumed samples: 15288320 | consumed tokens: 31310479360 | elapsed time per iteration (s): 0.08 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 4.558602E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.721 | TFLOPs: 11.87 | 7: iteration 59730/ 173500 | consumed samples: 15290880 | consumed tokens: 31315722240 | elapsed time per iteration (s): 0.08 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 4.547432E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.538 | TFLOPs: 11.87 | 7: iteration 59740/ 173500 | consumed samples: 15293440 | consumed tokens: 31320965120 | elapsed time per iteration (s): 0.08 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 4.541433E+00 | grad 
norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.781 | TFLOPs: 11.87 | 7: iteration 59750/ 173500 | consumed samples: 15296000 | consumed tokens: 31326208000 | elapsed time per iteration (s): 0.08 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 4.540191E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.476 | TFLOPs: 11.33 | 7: iteration 59760/ 173500 | consumed samples: 15298560 | consumed tokens: 31331450880 | elapsed time per iteration (s): 0.08 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 4.540470E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.964 | TFLOPs: 11.84 | 7: iteration 59770/ 173500 | consumed samples: 15301120 | consumed tokens: 31336693760 | elapsed time per iteration (s): 0.08 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 4.541116E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.635 | TFLOPs: 11.84 | 7: iteration 59780/ 173500 | consumed samples: 15303680 | consumed tokens: 31341936640 | elapsed time per iteration (s): 0.08 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 4.553721E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.970 | TFLOPs: 11.56 | 7: iteration 59790/ 173500 | consumed samples: 15306240 | consumed tokens: 31347179520 | elapsed time per iteration (s): 0.08 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 4.547149E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.030 | TFLOPs: 11.89 | 7: iteration 59800/ 173500 | consumed samples: 15308800 | consumed tokens: 31352422400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 4.547536E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.409 | TFLOPs: 11.89 | 7: iteration 59810/ 173500 | consumed samples: 15311360 | consumed tokens: 31357665280 | elapsed time per iteration (s): 0.08 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 4.547707E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.652 | TFLOPs: 11.87 | 7: iteration 59820/ 173500 | consumed samples: 15313920 | consumed tokens: 31362908160 | elapsed time per iteration (s): 0.08 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 4.534060E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.469 | TFLOPs: 11.37 | 7: iteration 59830/ 173500 | consumed samples: 15316480 | consumed tokens: 31368151040 | elapsed time per iteration (s): 0.08 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 4.539524E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.503 | TFLOPs: 11.89 | 7: iteration 59840/ 173500 | consumed samples: 15319040 | consumed tokens: 31373393920 | elapsed time per iteration (s): 0.08 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 4.548433E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.164 | TFLOPs: 11.84 | 7: iteration 59850/ 173500 | consumed samples: 15321600 | consumed tokens: 31378636800 | elapsed time per iteration (s): 0.08 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 4.548616E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.491 | TFLOPs: 
11.89 | 7: iteration 59860/ 173500 | consumed samples: 15324160 | consumed tokens: 31383879680 | elapsed time per iteration (s): 0.09 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 4.546239E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2921.377 | TFLOPs: 10.87 | 7: iteration 59870/ 173500 | consumed samples: 15326720 | consumed tokens: 31389122560 | elapsed time per iteration (s): 0.08 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 4.540110E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.307 | TFLOPs: 11.51 | 7: iteration 59880/ 173500 | consumed samples: 15329280 | consumed tokens: 31394365440 | elapsed time per iteration (s): 0.08 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 4.537886E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.762 | TFLOPs: 11.92 | 7: iteration 59890/ 173500 | consumed samples: 15331840 | consumed tokens: 31399608320 | elapsed time per iteration (s): 0.08 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 4.548710E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.753 | TFLOPs: 11.91 | 7: iteration 59900/ 173500 | consumed samples: 15334400 | consumed tokens: 31404851200 | elapsed time per iteration (s): 0.08 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 4.551667E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.333 | TFLOPs: 11.94 | 7: iteration 59910/ 173500 | consumed samples: 15336960 | consumed tokens: 31410094080 | elapsed time per iteration (s): 0.08 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 4.547246E+00 | grad norm: 0.351 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.949 | TFLOPs: 11.92 | 7: iteration 59920/ 173500 | consumed samples: 15339520 | consumed tokens: 31415336960 | elapsed time per iteration (s): 0.08 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 4.543447E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.877 | TFLOPs: 11.91 | 7: iteration 59930/ 173500 | consumed samples: 15342080 | consumed tokens: 31420579840 | elapsed time per iteration (s): 0.08 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 4.555096E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.714 | TFLOPs: 11.96 | 7: iteration 59940/ 173500 | consumed samples: 15344640 | consumed tokens: 31425822720 | elapsed time per iteration (s): 0.08 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 4.556441E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.299 | TFLOPs: 11.96 | 7: iteration 59950/ 173500 | consumed samples: 15347200 | consumed tokens: 31431065600 | elapsed time per iteration (s): 0.08 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 4.549206E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.741 | TFLOPs: 11.94 | 7: iteration 59960/ 173500 | consumed samples: 15349760 | consumed tokens: 31436308480 | elapsed time per iteration (s): 0.08 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 4.547227E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.288 | TFLOPs: 11.78 | 7: iteration 59970/ 173500 | consumed samples: 15352320 | consumed tokens: 31441551360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.536E-04 | global batch size: 256 | lm loss: 4.534484E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.939 | TFLOPs: 11.88 |
7: iteration 59980/ 173500 | consumed samples: 15354880 | consumed tokens: 31446794240 | elapsed time per iteration (s): 0.08 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 4.559246E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.374 | TFLOPs: 11.61 |
7: iteration 59990/ 173500 | consumed samples: 15357440 | consumed tokens: 31452037120 | elapsed time per iteration (s): 0.08 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 4.555111E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.945 | TFLOPs: 11.59 |
0: [2023-03-17 01:43:38,997] [INFO] [logging.py:68:log_dist] [Rank 0] step=60000, skipped=0, lr=[0.00015355285563304073, 0.00015355285563304073, 0.00015355285563304073], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 60000/ 173500 | consumed samples: 15360000 | consumed tokens: 31457280000 | elapsed time per iteration (s): 0.08 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 4.542207E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.772 | TFLOPs: 11.78 |
0: steps: 60000 loss: 4.5849 iter time (s): 0.081 samples/sec: 3164.325
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 60000 | lm loss value: 4.421258E+00 | lm loss PPL: 8.320092E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 60000 to checkpoints_14m91b100m
0: [2023-03-17 01:43:39,055] [INFO] [logging.py:68:log_dist] [Rank 0]
[Torch] Checkpoint global_step60000 is begin to save! 0: [2023-03-17 01:43:39,058] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:43:39,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:43:39,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:43:39,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:43:39,088] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:43:39,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:43:39,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:43:39,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:43:39,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:43:39,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:43:39,097] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 01:43:39,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:43:39,098] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step60000/mp_rank_00_model_states.pt 0: [2023-03-17 01:43:39,098] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:43:39,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:43:39,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:43:39,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:43:39,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:43:39,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:43:39,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:43:39,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:43:39,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:43:39,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:43:39,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:43:39,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
5: [2023-03-17 01:43:39,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 7: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:43:39,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:43:39,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:43:39,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:43:39,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:43:39,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
2: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:43:39,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
3: [2023-03-17 01:43:39,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:43:39,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:43:39,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:43:39,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:43:39,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
2: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 01:43:39,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:43:39,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:43:39,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
5: [2023-03-17 01:43:39,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:43:39,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:43:39,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:43:39,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 01:43:39,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:43:39,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:43:39,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:43:39,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:43:39,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:43:39,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
4: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:43:39,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 01:43:39,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:43:39,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:43:39,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:43:39,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:43:39,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 01:43:39,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:43:39,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
3: [2023-03-17 01:43:39,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:43:39,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:43:39,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:43:39,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:43:39,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 01:43:39,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:43:39,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 01:43:39,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 01:43:39,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 7: [2023-03-17 01:43:39,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:43:39,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2023-03-17 01:43:39,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2023-03-17 01:43:39,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:43:39,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2023-03-17 01:43:39,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: successfully saved checkpoint at iteration 60000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.55 7: iteration 60010/ 173500 | consumed samples: 15362560 | consumed tokens: 31462522880 | elapsed time per iteration (s): 0.09 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 4.550858E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2752.026 | TFLOPs: 10.24 | 7: iteration 60020/ 173500 | consumed samples: 15365120 | consumed tokens: 31467765760 | elapsed time per iteration (s): 0.08 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 4.551983E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.244 | TFLOPs: 11.89 | 7: iteration 60030/ 173500 | consumed samples: 15367680 | consumed tokens: 31473008640 | elapsed time per iteration (s): 0.08 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 4.542085E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.864 | TFLOPs: 11.91 | 7: iteration 60040/ 173500 | consumed samples: 15370240 | consumed tokens: 31478251520 | elapsed time per iteration (s): 0.08 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 4.556280E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.605 | TFLOPs: 11.92 | 7: iteration 60050/ 173500 | consumed samples: 15372800 | consumed tokens: 31483494400 | elapsed time per iteration (s): 0.08 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 4.540568E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.482 | TFLOPs: 11.62 | 7: iteration 60060/ 173500 | consumed samples: 15375360 | consumed tokens: 31488737280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.535E-04 | global batch size: 256 | lm loss: 4.550615E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.895 | TFLOPs: 11.79 | 7: iteration 60070/ 173500 | consumed samples: 15377920 | consumed tokens: 31493980160 | elapsed time per iteration (s): 0.08 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 4.552821E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.668 | TFLOPs: 11.89 | 7: iteration 60080/ 173500 | consumed samples: 15380480 | consumed tokens: 31499223040 | elapsed time per iteration (s): 0.08 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 4.554773E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.638 | TFLOPs: 11.91 | 7: iteration 60090/ 173500 | consumed samples: 15383040 | consumed tokens: 31504465920 | elapsed time per iteration (s): 0.08 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 4.550517E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.176 | TFLOPs: 11.66 | 7: iteration 60100/ 173500 | consumed samples: 15385600 | consumed tokens: 31509708800 | elapsed time per iteration (s): 0.08 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 4.543210E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.659 | TFLOPs: 11.78 | 7: iteration 60110/ 173500 | consumed samples: 15388160 | consumed tokens: 31514951680 | elapsed time per iteration (s): 0.09 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 4.549282E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2966.473 | TFLOPs: 11.03 | 7: iteration 60120/ 
173500 | consumed samples: 15390720 | consumed tokens: 31520194560 | elapsed time per iteration (s): 0.08 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 4.560369E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.108 | TFLOPs: 11.90 | 7: iteration 60130/ 173500 | consumed samples: 15393280 | consumed tokens: 31525437440 | elapsed time per iteration (s): 0.08 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 4.542679E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.103 | TFLOPs: 11.28 | 7: iteration 60140/ 173500 | consumed samples: 15395840 | consumed tokens: 31530680320 | elapsed time per iteration (s): 0.08 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 4.540097E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.512 | TFLOPs: 11.88 | 7: iteration 60150/ 173500 | consumed samples: 15398400 | consumed tokens: 31535923200 | elapsed time per iteration (s): 0.08 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 4.550365E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.682 | TFLOPs: 11.78 | 7: iteration 60160/ 173500 | consumed samples: 15400960 | consumed tokens: 31541166080 | elapsed time per iteration (s): 0.08 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 4.552465E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.100 | TFLOPs: 11.82 | 7: iteration 60170/ 173500 | consumed samples: 15403520 | consumed tokens: 31546408960 | elapsed time per iteration (s): 0.08 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 4.546914E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3183.714 | TFLOPs: 11.84 | 7: iteration 60180/ 173500 | consumed samples: 15406080 | consumed tokens: 31551651840 | elapsed time per iteration (s): 0.08 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 4.549902E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.879 | TFLOPs: 11.80 | 7: iteration 60190/ 173500 | consumed samples: 15408640 | consumed tokens: 31556894720 | elapsed time per iteration (s): 0.08 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 4.551005E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.441 | TFLOPs: 11.81 | 7: iteration 60200/ 173500 | consumed samples: 15411200 | consumed tokens: 31562137600 | elapsed time per iteration (s): 0.08 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 4.555141E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.536 | TFLOPs: 11.61 | 7: iteration 60210/ 173500 | consumed samples: 15413760 | consumed tokens: 31567380480 | elapsed time per iteration (s): 0.08 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 4.537974E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.729 | TFLOPs: 11.93 | 7: iteration 60220/ 173500 | consumed samples: 15416320 | consumed tokens: 31572623360 | elapsed time per iteration (s): 0.08 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 4.541045E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.631 | TFLOPs: 11.92 | 7: iteration 60230/ 173500 | consumed samples: 15418880 | consumed tokens: 31577866240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.532E-04 | global batch size: 256 | lm loss: 4.546489E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.465 | TFLOPs: 11.90 | 7: iteration 60240/ 173500 | consumed samples: 15421440 | consumed tokens: 31583109120 | elapsed time per iteration (s): 0.08 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 4.557816E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.783 | TFLOPs: 11.27 | 7: iteration 60250/ 173500 | consumed samples: 15424000 | consumed tokens: 31588352000 | elapsed time per iteration (s): 0.08 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 4.545679E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.823 | TFLOPs: 11.91 | 7: iteration 60260/ 173500 | consumed samples: 15426560 | consumed tokens: 31593594880 | elapsed time per iteration (s): 0.08 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 4.548050E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.748 | TFLOPs: 11.92 | 7: iteration 60270/ 173500 | consumed samples: 15429120 | consumed tokens: 31598837760 | elapsed time per iteration (s): 0.08 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 4.538510E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.678 | TFLOPs: 11.88 | 7: iteration 60280/ 173500 | consumed samples: 15431680 | consumed tokens: 31604080640 | elapsed time per iteration (s): 0.08 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 4.551670E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.270 | TFLOPs: 11.86 | 7: iteration 60290/ 173500 | 
consumed samples: 15434240 | consumed tokens: 31609323520 | elapsed time per iteration (s): 0.08 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 4.542582E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.478 | TFLOPs: 11.90 | 7: iteration 60300/ 173500 | consumed samples: 15436800 | consumed tokens: 31614566400 | elapsed time per iteration (s): 0.08 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 4.547577E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.636 | TFLOPs: 11.93 | 7: iteration 60310/ 173500 | consumed samples: 15439360 | consumed tokens: 31619809280 | elapsed time per iteration (s): 0.08 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 4.555793E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.386 | TFLOPs: 11.87 | 7: iteration 60320/ 173500 | consumed samples: 15441920 | consumed tokens: 31625052160 | elapsed time per iteration (s): 0.08 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 4.552735E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.552 | TFLOPs: 11.90 | 7: iteration 60330/ 173500 | consumed samples: 15444480 | consumed tokens: 31630295040 | elapsed time per iteration (s): 0.08 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 4.544647E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.161 | TFLOPs: 11.89 | 7: iteration 60340/ 173500 | consumed samples: 15447040 | consumed tokens: 31635537920 | elapsed time per iteration (s): 0.08 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 4.548507E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3179.516 | TFLOPs: 11.83 | 7: iteration 60350/ 173500 | consumed samples: 15449600 | consumed tokens: 31640780800 | elapsed time per iteration (s): 0.08 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 4.551871E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.041 | TFLOPs: 11.92 | 7: iteration 60360/ 173500 | consumed samples: 15452160 | consumed tokens: 31646023680 | elapsed time per iteration (s): 0.08 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 4.559276E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.044 | TFLOPs: 11.88 | 7: iteration 60370/ 173500 | consumed samples: 15454720 | consumed tokens: 31651266560 | elapsed time per iteration (s): 0.08 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 4.557309E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.983 | TFLOPs: 11.95 | 7: iteration 60380/ 173500 | consumed samples: 15457280 | consumed tokens: 31656509440 | elapsed time per iteration (s): 0.08 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 4.559174E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.335 | TFLOPs: 11.92 | 7: iteration 60390/ 173500 | consumed samples: 15459840 | consumed tokens: 31661752320 | elapsed time per iteration (s): 0.08 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 4.550418E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.036 | TFLOPs: 11.91 | 7: iteration 60400/ 173500 | consumed samples: 15462400 | consumed tokens: 31666995200 | elapsed time per iteration (s): 0.08 | learning rate: 1.530E-04 | global batch 
size: 256 | lm loss: 4.543262E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.699 | TFLOPs: 11.80 | 7: iteration 60410/ 173500 | consumed samples: 15464960 | consumed tokens: 31672238080 | elapsed time per iteration (s): 0.08 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 4.545754E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.250 | TFLOPs: 11.94 | 7: iteration 60420/ 173500 | consumed samples: 15467520 | consumed tokens: 31677480960 | elapsed time per iteration (s): 0.08 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 4.549771E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.602 | TFLOPs: 11.99 | 7: iteration 60430/ 173500 | consumed samples: 15470080 | consumed tokens: 31682723840 | elapsed time per iteration (s): 0.10 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 4.558226E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2532.932 | TFLOPs: 9.42 | 7: iteration 60440/ 173500 | consumed samples: 15472640 | consumed tokens: 31687966720 | elapsed time per iteration (s): 0.08 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 4.536951E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.012 | TFLOPs: 11.94 | 7: iteration 60450/ 173500 | consumed samples: 15475200 | consumed tokens: 31693209600 | elapsed time per iteration (s): 0.08 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 4.557717E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.939 | TFLOPs: 11.91 | 7: iteration 60460/ 173500 | consumed samples: 15477760 | 
consumed tokens: 31698452480 | elapsed time per iteration (s): 0.08 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 4.553945E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.736 | TFLOPs: 11.92 | 7: iteration 60470/ 173500 | consumed samples: 15480320 | consumed tokens: 31703695360 | elapsed time per iteration (s): 0.08 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 4.547368E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.313 | TFLOPs: 11.89 | 7: iteration 60480/ 173500 | consumed samples: 15482880 | consumed tokens: 31708938240 | elapsed time per iteration (s): 0.09 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 4.544834E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2941.285 | TFLOPs: 10.94 | 7: iteration 60490/ 173500 | consumed samples: 15485440 | consumed tokens: 31714181120 | elapsed time per iteration (s): 0.08 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 4.543335E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.044 | TFLOPs: 11.89 | 7: iteration 60500/ 173500 | consumed samples: 15488000 | consumed tokens: 31719424000 | elapsed time per iteration (s): 0.08 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 4.551672E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.247 | TFLOPs: 11.88 | 7: iteration 60510/ 173500 | consumed samples: 15490560 | consumed tokens: 31724666880 | elapsed time per iteration (s): 0.08 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 4.551294E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3199.838 | TFLOPs: 11.90 | 7: iteration 60520/ 173500 | consumed samples: 15493120 | consumed tokens: 31729909760 | elapsed time per iteration (s): 0.08 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 4.534776E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.849 | TFLOPs: 11.94 | 7: iteration 60530/ 173500 | consumed samples: 15495680 | consumed tokens: 31735152640 | elapsed time per iteration (s): 0.08 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 4.546450E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.972 | TFLOPs: 11.92 | 7: iteration 60540/ 173500 | consumed samples: 15498240 | consumed tokens: 31740395520 | elapsed time per iteration (s): 0.08 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 4.545879E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.763 | TFLOPs: 11.88 | 7: iteration 60550/ 173500 | consumed samples: 15500800 | consumed tokens: 31745638400 | elapsed time per iteration (s): 0.08 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 4.561199E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.358 | TFLOPs: 11.93 | 7: iteration 60560/ 173500 | consumed samples: 15503360 | consumed tokens: 31750881280 | elapsed time per iteration (s): 0.08 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 4.543072E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.719 | TFLOPs: 11.92 | 7: iteration 60570/ 173500 | consumed samples: 15505920 | consumed tokens: 31756124160 | elapsed time per iteration (s): 0.08 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 
4.545249E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.622 | TFLOPs: 11.93 | 7: iteration 60580/ 173500 | consumed samples: 15508480 | consumed tokens: 31761367040 | elapsed time per iteration (s): 0.08 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 4.541248E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.202 | TFLOPs: 11.92 | 7: iteration 60590/ 173500 | consumed samples: 15511040 | consumed tokens: 31766609920 | elapsed time per iteration (s): 0.08 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 4.538005E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.847 | TFLOPs: 11.91 | 7: iteration 60600/ 173500 | consumed samples: 15513600 | consumed tokens: 31771852800 | elapsed time per iteration (s): 0.08 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 4.537581E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.347 | TFLOPs: 11.81 | 7: iteration 60610/ 173500 | consumed samples: 15516160 | consumed tokens: 31777095680 | elapsed time per iteration (s): 0.08 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 4.559476E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.874 | TFLOPs: 11.85 | 7: iteration 60620/ 173500 | consumed samples: 15518720 | consumed tokens: 31782338560 | elapsed time per iteration (s): 0.08 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 4.543796E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.700 | TFLOPs: 11.87 | 7: iteration 60630/ 173500 | consumed samples: 15521280 | consumed tokens: 
31787581440 | elapsed time per iteration (s): 0.08 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 4.551493E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.405 | TFLOPs: 11.82 | 7: iteration 60640/ 173500 | consumed samples: 15523840 | consumed tokens: 31792824320 | elapsed time per iteration (s): 0.08 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 4.534560E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.937 | TFLOPs: 11.84 | 7: iteration 60650/ 173500 | consumed samples: 15526400 | consumed tokens: 31798067200 | elapsed time per iteration (s): 0.08 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 4.552257E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.414 | TFLOPs: 11.84 | 7: iteration 60660/ 173500 | consumed samples: 15528960 | consumed tokens: 31803310080 | elapsed time per iteration (s): 0.08 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 4.556203E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.144 | TFLOPs: 11.87 | 7: iteration 60670/ 173500 | consumed samples: 15531520 | consumed tokens: 31808552960 | elapsed time per iteration (s): 0.09 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 4.544468E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2763.664 | TFLOPs: 10.28 | 7: iteration 60680/ 173500 | consumed samples: 15534080 | consumed tokens: 31813795840 | elapsed time per iteration (s): 0.11 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 4.553140E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2375.881 | TFLOPs: 8.84 | 7: iteration 60690/ 173500 | consumed samples: 15536640 | consumed tokens: 31819038720 | elapsed time per iteration (s): 0.08 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 4.540882E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.120 | TFLOPs: 11.84 | 7: iteration 60700/ 173500 | consumed samples: 15539200 | consumed tokens: 31824281600 | elapsed time per iteration (s): 0.08 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 4.549323E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.519 | TFLOPs: 11.90 | 7: iteration 60710/ 173500 | consumed samples: 15541760 | consumed tokens: 31829524480 | elapsed time per iteration (s): 0.08 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 4.550164E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.022 | TFLOPs: 11.86 | 7: iteration 60720/ 173500 | consumed samples: 15544320 | consumed tokens: 31834767360 | elapsed time per iteration (s): 0.08 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 4.546111E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.081 | TFLOPs: 11.90 | 7: iteration 60730/ 173500 | consumed samples: 15546880 | consumed tokens: 31840010240 | elapsed time per iteration (s): 0.08 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 4.543444E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.608 | TFLOPs: 11.79 | 7: iteration 60740/ 173500 | consumed samples: 15549440 | consumed tokens: 31845253120 | elapsed time per iteration (s): 0.08 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 4.546139E+00 | grad 
norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.372 | TFLOPs: 11.98 | 7: iteration 60750/ 173500 | consumed samples: 15552000 | consumed tokens: 31850496000 | elapsed time per iteration (s): 0.08 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 4.531576E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.073 | TFLOPs: 11.87 | 7: iteration 60760/ 173500 | consumed samples: 15554560 | consumed tokens: 31855738880 | elapsed time per iteration (s): 0.08 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 4.549286E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.523 | TFLOPs: 11.88 | 7: iteration 60770/ 173500 | consumed samples: 15557120 | consumed tokens: 31860981760 | elapsed time per iteration (s): 0.08 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 4.536900E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.920 | TFLOPs: 11.95 | 7: iteration 60780/ 173500 | consumed samples: 15559680 | consumed tokens: 31866224640 | elapsed time per iteration (s): 0.08 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 4.549597E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.203 | TFLOPs: 11.94 | 7: iteration 60790/ 173500 | consumed samples: 15562240 | consumed tokens: 31871467520 | elapsed time per iteration (s): 0.08 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 4.543403E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.065 | TFLOPs: 11.95 | 7: iteration 60800/ 173500 | consumed samples: 15564800 | consumed tokens: 31876710400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 4.557788E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.593 | TFLOPs: 11.82 | 7: iteration 60810/ 173500 | consumed samples: 15567360 | consumed tokens: 31881953280 | elapsed time per iteration (s): 0.08 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 4.553239E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.790 | TFLOPs: 11.96 | 7: iteration 60820/ 173500 | consumed samples: 15569920 | consumed tokens: 31887196160 | elapsed time per iteration (s): 0.08 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 4.545155E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.162 | TFLOPs: 11.88 | 7: iteration 60830/ 173500 | consumed samples: 15572480 | consumed tokens: 31892439040 | elapsed time per iteration (s): 0.08 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 4.552593E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.162 | TFLOPs: 11.93 | 7: iteration 60840/ 173500 | consumed samples: 15575040 | consumed tokens: 31897681920 | elapsed time per iteration (s): 0.08 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 4.550570E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.004 | TFLOPs: 11.95 | 7: iteration 60850/ 173500 | consumed samples: 15577600 | consumed tokens: 31902924800 | elapsed time per iteration (s): 0.08 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 4.553386E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.822 | TFLOPs: 
11.96 | 7: iteration 60860/ 173500 | consumed samples: 15580160 | consumed tokens: 31908167680 | elapsed time per iteration (s): 0.08 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 4.541623E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.425 | TFLOPs: 11.96 | 7: iteration 60870/ 173500 | consumed samples: 15582720 | consumed tokens: 31913410560 | elapsed time per iteration (s): 0.08 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 4.546648E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.679 | TFLOPs: 11.96 | 7: iteration 60880/ 173500 | consumed samples: 15585280 | consumed tokens: 31918653440 | elapsed time per iteration (s): 0.08 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 4.547996E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.652 | TFLOPs: 11.64 | 7: iteration 60890/ 173500 | consumed samples: 15587840 | consumed tokens: 31923896320 | elapsed time per iteration (s): 0.08 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 4.564088E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.230 | TFLOPs: 11.90 | 7: iteration 60900/ 173500 | consumed samples: 15590400 | consumed tokens: 31929139200 | elapsed time per iteration (s): 0.08 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 4.538112E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.465 | TFLOPs: 11.93 | 7: iteration 60910/ 173500 | consumed samples: 15592960 | consumed tokens: 31934382080 | elapsed time per iteration (s): 0.08 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 4.536022E+00 | grad norm: 0.330 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.846 | TFLOPs: 11.92 | 7: iteration 60920/ 173500 | consumed samples: 15595520 | consumed tokens: 31939624960 | elapsed time per iteration (s): 0.08 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 4.549062E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.907 | TFLOPs: 11.85 | 7: iteration 60930/ 173500 | consumed samples: 15598080 | consumed tokens: 31944867840 | elapsed time per iteration (s): 0.08 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 4.553540E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.108 | TFLOPs: 11.91 | 7: iteration 60940/ 173500 | consumed samples: 15600640 | consumed tokens: 31950110720 | elapsed time per iteration (s): 0.08 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 4.572652E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.003 | TFLOPs: 11.91 | 7: iteration 60950/ 173500 | consumed samples: 15603200 | consumed tokens: 31955353600 | elapsed time per iteration (s): 0.08 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 4.539619E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.998 | TFLOPs: 11.94 | 7: iteration 60960/ 173500 | consumed samples: 15605760 | consumed tokens: 31960596480 | elapsed time per iteration (s): 0.08 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 4.555524E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.360 | TFLOPs: 11.96 | 7: iteration 60970/ 173500 | consumed samples: 15608320 | consumed tokens: 31965839360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.521E-04 | global batch size: 256 | lm loss: 4.541846E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.760 | TFLOPs: 11.96 | 7: iteration 60980/ 173500 | consumed samples: 15610880 | consumed tokens: 31971082240 | elapsed time per iteration (s): 0.08 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 4.543579E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.258 | TFLOPs: 11.96 | 7: iteration 60990/ 173500 | consumed samples: 15613440 | consumed tokens: 31976325120 | elapsed time per iteration (s): 0.08 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 4.555651E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.445 | TFLOPs: 11.97 | 7: iteration 61000/ 173500 | consumed samples: 15616000 | consumed tokens: 31981568000 | elapsed time per iteration (s): 0.08 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 4.543299E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.133 | TFLOPs: 11.92 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 61000 | lm loss value: 4.419524E+00 | lm loss PPL: 8.305676E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 61000 to checkpoints_14m91b100m 0: [2023-03-17 01:45:00,116] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step61000 is begin to save! 0: [2023-03-17 01:45:00,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:45:00,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:45:00,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:45:00,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:45:00,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:45:00,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:45:00,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:45:00,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:45:00,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:45:00,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:45:00,158] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:45:00,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:45:00,159] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step61000/mp_rank_00_model_states.pt 0: [2023-03-17 01:45:00,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:45:00,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:45:00,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:45:00,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:45:00,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:45:00,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:45:00,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:45:00,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2023-03-17 01:45:00,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
0: [2023-03-17 01:45:00,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:45:00,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:45:00,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 01:45:00,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:45:00,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:45:00,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2023-03-17 01:45:00,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:45:00,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:45:00,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
4: [2023-03-17 01:45:00,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:45:00,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2023-03-17 01:45:00,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:45:00,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:45:00,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2023-03-17 01:45:00,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:45:00,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:45:00,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 01:45:00,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:45:00,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:45:00,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2023-03-17 01:45:00,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:45:00,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:45:00,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 01:45:00,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:45:00,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:45:00,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:45:00,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:45:00,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:45:00,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:45:00,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:45:00,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:45:00,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:45:00,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:45:00,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:45:00,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:45:00,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:45:00,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:45:00,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2023-03-17 01:45:00,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:45:00,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2023-03-17 01:45:00,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 01:45:00,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:45:00,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:45:00,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:45:00,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:45:00,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:45:00,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:45:00,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2023-03-17 01:45:00,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2023-03-17 01:45:00,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:45:00,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:45:00,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2023-03-17 01:45:00,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:45:00,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:45:00,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:45:00,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2023-03-17 01:45:00,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
4: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
6: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 01:45:00,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step61000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2023-03-17 01:45:00,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
0: successfully saved checkpoint at iteration 61000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 80.56
7: iteration 61010/ 173500 | consumed samples: 15618560 | consumed tokens: 31986810880 | elapsed time per iteration (s): 0.09 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 4.550996E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.926 | TFLOPs: 10.48 |
7: iteration 61020/ 173500 | consumed samples: 15621120 | consumed tokens: 31992053760 | elapsed time per iteration (s): 0.08 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 4.547683E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.363 | TFLOPs: 11.97 |
7: iteration 61030/ 173500 | consumed samples: 15623680 | consumed tokens: 31997296640 | elapsed time per iteration (s): 0.08 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 4.542727E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.382 | TFLOPs: 11.99 |
7: iteration 61040/ 173500 | consumed samples: 15626240 | consumed tokens: 32002539520 | elapsed time per iteration (s): 0.08 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 4.536069E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.490 | TFLOPs: 11.97 |
7: iteration 61050/ 173500 | consumed samples: 15628800 | consumed tokens: 32007782400 | elapsed time per iteration (s): 0.08 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 4.553822E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.732 | TFLOPs: 11.65 |
7: iteration 61060/ 173500 | consumed samples: 15631360 | consumed tokens: 32013025280 | elapsed time per iteration (s): 0.08 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 4.539222E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.037 | TFLOPs: 11.97 |
7: iteration 61070/ 173500 | consumed samples: 15633920 | consumed tokens: 32018268160 | elapsed time per iteration (s): 0.08 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 4.542757E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.953 | TFLOPs: 11.98 |
7: iteration 61080/ 173500 | consumed samples: 15636480 | consumed tokens: 32023511040 | elapsed time per iteration (s): 0.08 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 4.550920E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.506 | TFLOPs: 11.99 |
7: iteration 61090/ 173500 | consumed samples: 15639040 | consumed tokens: 32028753920 | elapsed time per iteration (s): 0.08 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 4.547814E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.877 | TFLOPs: 11.42 |
7: iteration 61100/ 173500 | consumed samples: 15641600 | consumed tokens: 32033996800 | elapsed time per iteration (s): 0.08 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 4.550203E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.632 | TFLOPs: 11.94 |
7: iteration 61110/ 173500 | consumed samples: 15644160 | consumed tokens: 32039239680 | elapsed time per iteration (s): 0.08 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 4.531855E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.814 | TFLOPs: 11.61 |
7: iteration 61120/ 173500 | consumed samples: 15646720 | consumed tokens: 32044482560 | elapsed time per iteration (s): 0.08 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 4.535835E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.462 | TFLOPs: 11.90 |
7: iteration 61130/ 173500 | consumed samples: 15649280 | consumed tokens: 32049725440 | elapsed time per iteration (s): 0.08 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 4.553275E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.889 | TFLOPs: 11.90 |
7: iteration 61140/ 173500 | consumed samples: 15651840 | consumed tokens: 32054968320 | elapsed time per iteration (s): 0.08 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 4.535947E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.069 | TFLOPs: 11.87 |
7: iteration 61150/ 173500 | consumed samples: 15654400 | consumed tokens: 32060211200 | elapsed time per iteration (s): 0.08 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 4.552287E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.760 | TFLOPs: 11.88 |
7: iteration 61160/ 173500 | consumed samples: 15656960 | consumed tokens: 32065454080 | elapsed time per iteration (s): 0.08 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 4.542554E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.722 | TFLOPs: 11.79 |
7: iteration 61170/ 173500 | consumed samples: 15659520 | consumed tokens: 32070696960 | elapsed time per iteration (s): 0.08 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 4.560167E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.596 | TFLOPs: 11.89 |
7: iteration 61180/ 173500 | consumed samples: 15662080 | consumed tokens: 32075939840 | elapsed time per iteration (s): 0.08 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 4.540796E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.816 | TFLOPs: 11.85 |
7: iteration 61190/ 173500 | consumed samples: 15664640 | consumed tokens: 32081182720 | elapsed time per iteration (s): 0.08 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 4.539384E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.377 | TFLOPs: 11.90 |
7: iteration 61200/ 173500 | consumed samples: 15667200 | consumed tokens: 32086425600 | elapsed time per iteration (s): 0.08 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 4.553765E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.121 | TFLOPs: 11.93 |
7: iteration 61210/ 173500 | consumed samples: 15669760 | consumed tokens: 32091668480 | elapsed time per iteration (s): 0.08 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 4.558253E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.681 | TFLOPs: 11.93 |
7: iteration 61220/ 173500 | consumed samples: 15672320 | consumed tokens: 32096911360 | elapsed time per iteration (s): 0.08 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 4.543040E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.528 | TFLOPs: 11.91 |
7: iteration 61230/ 173500 | consumed samples: 15674880 | consumed tokens: 32102154240 | elapsed time per iteration (s): 0.08 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 4.540547E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.180 | TFLOPs: 11.94 |
7: iteration 61240/ 173500 | consumed samples: 15677440 | consumed tokens: 32107397120 | elapsed time per iteration (s): 0.08 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 4.539932E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.877 | TFLOPs: 11.94 |
7: iteration 61250/ 173500 | consumed samples: 15680000 | consumed tokens: 32112640000 | elapsed time per iteration (s): 0.08 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 4.530399E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.394 | TFLOPs: 11.87 |
7: iteration 61260/ 173500 | consumed samples: 15682560 | consumed tokens: 32117882880 | elapsed time per iteration (s): 0.08 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 4.546988E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.393 | TFLOPs: 11.88 |
7: iteration 61270/ 173500 | consumed samples: 15685120 | consumed tokens: 32123125760 | elapsed time per iteration (s): 0.08 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 4.553754E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.040 | TFLOPs: 11.88 |
7: iteration 61280/ 173500 | consumed samples: 15687680 | consumed tokens: 32128368640 | elapsed time per iteration (s): 0.08 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 4.536986E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.143 | TFLOPs: 11.87 |
7: iteration 61290/ 173500 | consumed samples: 15690240 | consumed tokens: 32133611520 | elapsed time per iteration (s): 0.08 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 4.549685E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.585 | TFLOPs: 11.88 |
7: iteration 61300/ 173500 | consumed samples: 15692800 | consumed tokens: 32138854400 | elapsed time per iteration (s): 0.08 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 4.550010E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.882 | TFLOPs: 11.84 |
7: iteration 61310/ 173500 | consumed samples: 15695360 | consumed tokens: 32144097280 | elapsed time per iteration (s): 0.08 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 4.545915E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.019 | TFLOPs: 11.58 |
7: iteration 61320/ 173500 | consumed samples: 15697920 | consumed tokens: 32149340160 | elapsed time per iteration (s): 0.08 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 4.550031E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.672 | TFLOPs: 11.87 |
7: iteration 61330/ 173500 | consumed samples: 15700480 | consumed tokens: 32154583040 | elapsed time per iteration (s): 0.08 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 4.553332E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.131 | TFLOPs: 11.84 |
7: iteration 61340/ 173500 | consumed samples: 15703040 | consumed tokens: 32159825920 | elapsed time per iteration (s): 0.08 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 4.551939E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 |
number of nan iterations: 0 | samples per second: 3199.976 | TFLOPs: 11.90 | 7: iteration 61350/ 173500 | consumed samples: 15705600 | consumed tokens: 32165068800 | elapsed time per iteration (s): 0.08 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 4.541032E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.175 | TFLOPs: 11.87 | 7: iteration 61360/ 173500 | consumed samples: 15708160 | consumed tokens: 32170311680 | elapsed time per iteration (s): 0.08 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 4.539769E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.446 | TFLOPs: 11.67 | 7: iteration 61370/ 173500 | consumed samples: 15710720 | consumed tokens: 32175554560 | elapsed time per iteration (s): 0.08 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 4.544063E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.822 | TFLOPs: 11.81 | 7: iteration 61380/ 173500 | consumed samples: 15713280 | consumed tokens: 32180797440 | elapsed time per iteration (s): 0.09 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 4.561445E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2983.156 | TFLOPs: 11.10 | 7: iteration 61390/ 173500 | consumed samples: 15715840 | consumed tokens: 32186040320 | elapsed time per iteration (s): 0.08 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 4.540894E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.839 | TFLOPs: 11.44 | 7: iteration 61400/ 173500 | consumed samples: 15718400 | consumed tokens: 32191283200 | elapsed time per iteration (s): 0.08 | learning rate: 1.515E-04 | global batch 
size: 256 | lm loss: 4.534976E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.177 | TFLOPs: 11.86 | 7: iteration 61410/ 173500 | consumed samples: 15720960 | consumed tokens: 32196526080 | elapsed time per iteration (s): 0.08 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 4.544133E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.412 | TFLOPs: 11.84 | 7: iteration 61420/ 173500 | consumed samples: 15723520 | consumed tokens: 32201768960 | elapsed time per iteration (s): 0.08 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 4.540695E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.544 | TFLOPs: 11.86 | 7: iteration 61430/ 173500 | consumed samples: 15726080 | consumed tokens: 32207011840 | elapsed time per iteration (s): 0.08 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 4.547963E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.569 | TFLOPs: 11.86 | 7: iteration 61440/ 173500 | consumed samples: 15728640 | consumed tokens: 32212254720 | elapsed time per iteration (s): 0.08 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 4.545942E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.957 | TFLOPs: 11.83 | 7: iteration 61450/ 173500 | consumed samples: 15731200 | consumed tokens: 32217497600 | elapsed time per iteration (s): 0.08 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 4.539259E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.570 | TFLOPs: 11.87 | 7: iteration 61460/ 173500 | consumed samples: 15733760 | 
consumed tokens: 32222740480 | elapsed time per iteration (s): 0.08 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 4.543728E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.684 | TFLOPs: 11.86 | 7: iteration 61470/ 173500 | consumed samples: 15736320 | consumed tokens: 32227983360 | elapsed time per iteration (s): 0.08 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 4.543491E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.655 | TFLOPs: 11.60 | 7: iteration 61480/ 173500 | consumed samples: 15738880 | consumed tokens: 32233226240 | elapsed time per iteration (s): 0.08 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 4.532028E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.288 | TFLOPs: 11.87 | 7: iteration 61490/ 173500 | consumed samples: 15741440 | consumed tokens: 32238469120 | elapsed time per iteration (s): 0.08 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 4.545737E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.745 | TFLOPs: 11.59 | 7: iteration 61500/ 173500 | consumed samples: 15744000 | consumed tokens: 32243712000 | elapsed time per iteration (s): 0.08 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 4.556417E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.124 | TFLOPs: 11.85 | 7: iteration 61510/ 173500 | consumed samples: 15746560 | consumed tokens: 32248954880 | elapsed time per iteration (s): 0.08 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 4.544028E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3181.858 | TFLOPs: 11.84 | 7: iteration 61520/ 173500 | consumed samples: 15749120 | consumed tokens: 32254197760 | elapsed time per iteration (s): 0.08 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 4.548276E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.683 | TFLOPs: 11.77 | 7: iteration 61530/ 173500 | consumed samples: 15751680 | consumed tokens: 32259440640 | elapsed time per iteration (s): 0.08 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 4.536823E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.245 | TFLOPs: 11.84 | 7: iteration 61540/ 173500 | consumed samples: 15754240 | consumed tokens: 32264683520 | elapsed time per iteration (s): 0.08 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 4.535806E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.089 | TFLOPs: 11.85 | 7: iteration 61550/ 173500 | consumed samples: 15756800 | consumed tokens: 32269926400 | elapsed time per iteration (s): 0.08 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 4.552448E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.793 | TFLOPs: 11.81 | 7: iteration 61560/ 173500 | consumed samples: 15759360 | consumed tokens: 32275169280 | elapsed time per iteration (s): 0.08 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 4.550482E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.452 | TFLOPs: 11.82 | 7: iteration 61570/ 173500 | consumed samples: 15761920 | consumed tokens: 32280412160 | elapsed time per iteration (s): 0.08 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 
4.543957E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.728 | TFLOPs: 11.88 | 7: iteration 61580/ 173500 | consumed samples: 15764480 | consumed tokens: 32285655040 | elapsed time per iteration (s): 0.08 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 4.541143E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.572 | TFLOPs: 11.83 | 7: iteration 61590/ 173500 | consumed samples: 15767040 | consumed tokens: 32290897920 | elapsed time per iteration (s): 0.08 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 4.554729E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.254 | TFLOPs: 11.78 | 7: iteration 61600/ 173500 | consumed samples: 15769600 | consumed tokens: 32296140800 | elapsed time per iteration (s): 0.08 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 4.541861E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.842 | TFLOPs: 11.95 | 7: iteration 61610/ 173500 | consumed samples: 15772160 | consumed tokens: 32301383680 | elapsed time per iteration (s): 0.08 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 4.556517E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.287 | TFLOPs: 12.01 | 7: iteration 61620/ 173500 | consumed samples: 15774720 | consumed tokens: 32306626560 | elapsed time per iteration (s): 0.08 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 4.543932E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.348 | TFLOPs: 11.97 | 7: iteration 61630/ 173500 | consumed samples: 15777280 | consumed tokens: 
32311869440 | elapsed time per iteration (s): 0.08 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 4.540604E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.604 | TFLOPs: 12.04 | 7: iteration 61640/ 173500 | consumed samples: 15779840 | consumed tokens: 32317112320 | elapsed time per iteration (s): 0.08 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 4.557373E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.610 | TFLOPs: 11.74 | 7: iteration 61650/ 173500 | consumed samples: 15782400 | consumed tokens: 32322355200 | elapsed time per iteration (s): 0.08 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 4.544558E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.467 | TFLOPs: 12.01 | 7: iteration 61660/ 173500 | consumed samples: 15784960 | consumed tokens: 32327598080 | elapsed time per iteration (s): 0.08 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 4.534510E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.590 | TFLOPs: 11.82 | 7: iteration 61670/ 173500 | consumed samples: 15787520 | consumed tokens: 32332840960 | elapsed time per iteration (s): 0.08 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 4.538770E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.384 | TFLOPs: 11.60 | 7: iteration 61680/ 173500 | consumed samples: 15790080 | consumed tokens: 32338083840 | elapsed time per iteration (s): 0.08 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 4.540315E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3152.998 | TFLOPs: 11.73 | 7: iteration 61690/ 173500 | consumed samples: 15792640 | consumed tokens: 32343326720 | elapsed time per iteration (s): 0.08 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 4.536892E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.447 | TFLOPs: 11.91 | 7: iteration 61700/ 173500 | consumed samples: 15795200 | consumed tokens: 32348569600 | elapsed time per iteration (s): 0.08 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 4.541439E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.591 | TFLOPs: 11.87 | 7: iteration 61710/ 173500 | consumed samples: 15797760 | consumed tokens: 32353812480 | elapsed time per iteration (s): 0.08 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 4.542943E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.280 | TFLOPs: 11.87 | 7: iteration 61720/ 173500 | consumed samples: 15800320 | consumed tokens: 32359055360 | elapsed time per iteration (s): 0.08 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 4.544632E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.003 | TFLOPs: 11.90 | 7: iteration 61730/ 173500 | consumed samples: 15802880 | consumed tokens: 32364298240 | elapsed time per iteration (s): 0.08 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.551700E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.209 | TFLOPs: 11.87 | 7: iteration 61740/ 173500 | consumed samples: 15805440 | consumed tokens: 32369541120 | elapsed time per iteration (s): 0.08 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.538305E+00 | grad 
norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.821 | TFLOPs: 11.92 | 7: iteration 61750/ 173500 | consumed samples: 15808000 | consumed tokens: 32374784000 | elapsed time per iteration (s): 0.08 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.550907E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.828 | TFLOPs: 11.88 | 7: iteration 61760/ 173500 | consumed samples: 15810560 | consumed tokens: 32380026880 | elapsed time per iteration (s): 0.08 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.551497E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.666 | TFLOPs: 11.91 | 7: iteration 61770/ 173500 | consumed samples: 15813120 | consumed tokens: 32385269760 | elapsed time per iteration (s): 0.08 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.551173E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.188 | TFLOPs: 11.89 | 7: iteration 61780/ 173500 | consumed samples: 15815680 | consumed tokens: 32390512640 | elapsed time per iteration (s): 0.08 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.550549E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.039 | TFLOPs: 11.90 | 7: iteration 61790/ 173500 | consumed samples: 15818240 | consumed tokens: 32395755520 | elapsed time per iteration (s): 0.08 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.539881E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.258 | TFLOPs: 11.85 | 7: iteration 61800/ 173500 | consumed samples: 15820800 | consumed tokens: 32400998400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 4.551544E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.250 | TFLOPs: 11.90 | 7: iteration 61810/ 173500 | consumed samples: 15823360 | consumed tokens: 32406241280 | elapsed time per iteration (s): 0.10 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 4.553111E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2613.081 | TFLOPs: 9.72 | 7: iteration 61820/ 173500 | consumed samples: 15825920 | consumed tokens: 32411484160 | elapsed time per iteration (s): 0.08 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 4.531961E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.343 | TFLOPs: 11.47 | 7: iteration 61830/ 173500 | consumed samples: 15828480 | consumed tokens: 32416727040 | elapsed time per iteration (s): 0.08 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 4.547184E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.594 | TFLOPs: 11.58 | 7: iteration 61840/ 173500 | consumed samples: 15831040 | consumed tokens: 32421969920 | elapsed time per iteration (s): 0.08 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 4.550694E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.999 | TFLOPs: 11.85 | 7: iteration 61850/ 173500 | consumed samples: 15833600 | consumed tokens: 32427212800 | elapsed time per iteration (s): 0.08 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 4.542501E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.124 | TFLOPs: 
11.59 | 7: iteration 61860/ 173500 | consumed samples: 15836160 | consumed tokens: 32432455680 | elapsed time per iteration (s): 0.08 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 4.545165E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.552 | TFLOPs: 11.83 | 7: iteration 61870/ 173500 | consumed samples: 15838720 | consumed tokens: 32437698560 | elapsed time per iteration (s): 0.09 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 4.537211E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2960.817 | TFLOPs: 11.01 | 7: iteration 61880/ 173500 | consumed samples: 15841280 | consumed tokens: 32442941440 | elapsed time per iteration (s): 0.09 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 4.545361E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.618 | TFLOPs: 10.60 | 7: iteration 61890/ 173500 | consumed samples: 15843840 | consumed tokens: 32448184320 | elapsed time per iteration (s): 0.08 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 4.538355E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.445 | TFLOPs: 11.86 | 7: iteration 61900/ 173500 | consumed samples: 15846400 | consumed tokens: 32453427200 | elapsed time per iteration (s): 0.08 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 4.557750E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.448 | TFLOPs: 11.85 | 7: iteration 61910/ 173500 | consumed samples: 15848960 | consumed tokens: 32458670080 | elapsed time per iteration (s): 0.09 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 4.544527E+00 | grad norm: 0.322 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3003.806 | TFLOPs: 11.17 | 7: iteration 61920/ 173500 | consumed samples: 15851520 | consumed tokens: 32463912960 | elapsed time per iteration (s): 0.09 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 4.545420E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2718.463 | TFLOPs: 10.11 | 7: iteration 61930/ 173500 | consumed samples: 15854080 | consumed tokens: 32469155840 | elapsed time per iteration (s): 0.08 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 4.546253E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.366 | TFLOPs: 11.49 | 7: iteration 61940/ 173500 | consumed samples: 15856640 | consumed tokens: 32474398720 | elapsed time per iteration (s): 0.08 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 4.538136E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.329 | TFLOPs: 11.81 | 7: iteration 61950/ 173500 | consumed samples: 15859200 | consumed tokens: 32479641600 | elapsed time per iteration (s): 0.08 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 4.541966E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.245 | TFLOPs: 11.78 | 7: iteration 61960/ 173500 | consumed samples: 15861760 | consumed tokens: 32484884480 | elapsed time per iteration (s): 0.08 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 4.555510E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.391 | TFLOPs: 11.76 | 7: iteration 61970/ 173500 | consumed samples: 15864320 | consumed tokens: 32490127360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.507E-04 | global batch size: 256 | lm loss: 4.535417E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.263 | TFLOPs: 11.56 | 7: iteration 61980/ 173500 | consumed samples: 15866880 | consumed tokens: 32495370240 | elapsed time per iteration (s): 0.08 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 4.540144E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.376 | TFLOPs: 11.93 | 7: iteration 61990/ 173500 | consumed samples: 15869440 | consumed tokens: 32500613120 | elapsed time per iteration (s): 0.08 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 4.545705E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.429 | TFLOPs: 12.00 | 0: [2023-03-17 01:46:21,224] [INFO] [logging.py:68:log_dist] [Rank 0] step=62000, skipped=0, lr=[0.00015064331838981058, 0.00015064331838981058, 0.00015064331838981058], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 62000/ 173500 | consumed samples: 15872000 | consumed tokens: 32505856000 | elapsed time per iteration (s): 0.08 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 4.532975E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.836 | TFLOPs: 11.99 | 0: steps: 62000 loss: 4.5424 iter time (s): 0.080 samples/sec: 3189.080 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 62000 | lm loss value: 4.434561E+00 | lm loss PPL: 8.431508E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 62000 to checkpoints_14m91b100m 0: [2023-03-17 01:46:21,281] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step62000 is begin to save! 0: [2023-03-17 01:46:21,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:46:21,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:46:21,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:46:21,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:46:21,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:46:21,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:46:21,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:46:21,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:46:21,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:46:21,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:46:21,323] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 01:46:21,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:46:21,324] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step62000/mp_rank_00_model_states.pt 0: [2023-03-17 01:46:21,324] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:46:21,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:46:21,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:46:21,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,347] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:46:21,347] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2023-03-17 01:46:21,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:46:21,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:46:21,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2023-03-17 01:46:21,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:46:21,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:46:21,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:46:21,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2023-03-17 01:46:21,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 5: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2023-03-17 01:46:21,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:46:21,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:46:21,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2023-03-17 01:46:21,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:46:21,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2023-03-17 01:46:21,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:46:21,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2023-03-17 01:46:21,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:46:21,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:46:21,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2023-03-17 01:46:21,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:46:21,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
3: [2023-03-17 01:46:21,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:46:21,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:46:21,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:46:21,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:46:21,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
6: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:46:21,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2023-03-17 01:46:21,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 01:46:21,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2023-03-17 01:46:21,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:46:21,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
7: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:46:21,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:46:21,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 0: [2023-03-17 01:46:21,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:46:21,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
6: [2023-03-17 01:46:21,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:46:21,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2023-03-17 01:46:21,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
7: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:46:21,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2023-03-17 01:46:21,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:46:21,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:46:21,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:46:21,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
6: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:46:21,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:46:21,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 01:46:21,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2023-03-17 01:46:21,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2023-03-17 01:46:21,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
4: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:46:21,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:46:21,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2023-03-17 01:46:21,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:46:21,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
6: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:46:21,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:46:21,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 01:46:21,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:46:21,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:46:21,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
7: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
0: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:46:21,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
3: [2023-03-17 01:46:21,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:46:21,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 01:46:21,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2023-03-17 01:46:21,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
5: [2023-03-17 01:46:21,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 6: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2023-03-17 01:46:21,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step62000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:46:21,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
0: successfully saved checkpoint at iteration 62000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.15 7: iteration 62010/ 173500 | consumed samples: 15874560 | consumed tokens: 32511098880 | elapsed time per iteration (s): 0.09 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 4.546931E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2741.888 | TFLOPs: 10.20 | 7: iteration 62020/ 173500 | consumed samples: 15877120 | consumed tokens: 32516341760 | elapsed time per iteration (s): 0.08 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 4.555228E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.434 | TFLOPs: 11.86 | 7: iteration 62030/ 173500 | consumed samples: 15879680 | consumed tokens: 32521584640 | elapsed time per iteration (s): 0.08 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 4.534028E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.529 | TFLOPs: 11.90 | 7: iteration 62040/ 173500 | consumed samples: 15882240 | consumed tokens: 32526827520 | elapsed time per iteration (s): 0.08 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 4.544868E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.137 | TFLOPs: 11.92 | 7: iteration 62050/ 173500 | consumed samples: 15884800 | consumed tokens: 32532070400 | elapsed time per iteration (s): 0.08 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 4.539903E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.593 | TFLOPs: 11.86 | 7: iteration 62060/ 173500 | consumed samples: 15887360 | consumed tokens: 32537313280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.506E-04 | global batch size: 256 | lm loss: 4.544794E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.745 | TFLOPs: 11.60 | 7: iteration 62070/ 173500 | consumed samples: 15889920 | consumed tokens: 32542556160 | elapsed time per iteration (s): 0.08 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 4.547138E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.237 | TFLOPs: 11.43 | 7: iteration 62080/ 173500 | consumed samples: 15892480 | consumed tokens: 32547799040 | elapsed time per iteration (s): 0.08 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 4.543741E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.333 | TFLOPs: 11.34 | 7: iteration 62090/ 173500 | consumed samples: 15895040 | consumed tokens: 32553041920 | elapsed time per iteration (s): 0.08 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 4.534017E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.649 | TFLOPs: 11.88 | 7: iteration 62100/ 173500 | consumed samples: 15897600 | consumed tokens: 32558284800 | elapsed time per iteration (s): 0.08 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 4.538987E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.881 | TFLOPs: 11.92 | 7: iteration 62110/ 173500 | consumed samples: 15900160 | consumed tokens: 32563527680 | elapsed time per iteration (s): 0.08 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 4.539317E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.532 | TFLOPs: 11.92 | 7: iteration 62120/ 
173500 | consumed samples: 15902720 | consumed tokens: 32568770560 | elapsed time per iteration (s): 0.08 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 4.550991E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.387 | TFLOPs: 11.80 | 7: iteration 62130/ 173500 | consumed samples: 15905280 | consumed tokens: 32574013440 | elapsed time per iteration (s): 0.08 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 4.549965E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.094 | TFLOPs: 11.92 | 7: iteration 62140/ 173500 | consumed samples: 15907840 | consumed tokens: 32579256320 | elapsed time per iteration (s): 0.08 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 4.543063E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.028 | TFLOPs: 11.90 | 7: iteration 62150/ 173500 | consumed samples: 15910400 | consumed tokens: 32584499200 | elapsed time per iteration (s): 0.08 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 4.553321E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.767 | TFLOPs: 11.90 | 7: iteration 62160/ 173500 | consumed samples: 15912960 | consumed tokens: 32589742080 | elapsed time per iteration (s): 0.08 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 4.548967E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.982 | TFLOPs: 11.56 | 7: iteration 62170/ 173500 | consumed samples: 15915520 | consumed tokens: 32594984960 | elapsed time per iteration (s): 0.08 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 4.551807E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3218.501 | TFLOPs: 11.97 | 7: iteration 62180/ 173500 | consumed samples: 15918080 | consumed tokens: 32600227840 | elapsed time per iteration (s): 0.08 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 4.533566E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.844 | TFLOPs: 11.71 | 7: iteration 62190/ 173500 | consumed samples: 15920640 | consumed tokens: 32605470720 | elapsed time per iteration (s): 0.08 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 4.547449E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.118 | TFLOPs: 11.71 | 7: iteration 62200/ 173500 | consumed samples: 15923200 | consumed tokens: 32610713600 | elapsed time per iteration (s): 0.08 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 4.546223E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3080.390 | TFLOPs: 11.46 | 7: iteration 62210/ 173500 | consumed samples: 15925760 | consumed tokens: 32615956480 | elapsed time per iteration (s): 0.08 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 4.543851E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.574 | TFLOPs: 11.72 | 7: iteration 62220/ 173500 | consumed samples: 15928320 | consumed tokens: 32621199360 | elapsed time per iteration (s): 0.08 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 4.542208E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.229 | TFLOPs: 12.01 | 7: iteration 62230/ 173500 | consumed samples: 15930880 | consumed tokens: 32626442240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.503E-04 | global batch size: 256 | lm loss: 4.551383E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.404 | TFLOPs: 11.95 | 7: iteration 62240/ 173500 | consumed samples: 15933440 | consumed tokens: 32631685120 | elapsed time per iteration (s): 0.08 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 4.546221E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.563 | TFLOPs: 11.93 | 7: iteration 62250/ 173500 | consumed samples: 15936000 | consumed tokens: 32636928000 | elapsed time per iteration (s): 0.08 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 4.546876E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.932 | TFLOPs: 11.71 | 7: iteration 62260/ 173500 | consumed samples: 15938560 | consumed tokens: 32642170880 | elapsed time per iteration (s): 0.08 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 4.554584E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.683 | TFLOPs: 11.92 | 7: iteration 62270/ 173500 | consumed samples: 15941120 | consumed tokens: 32647413760 | elapsed time per iteration (s): 0.08 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 4.545200E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.513 | TFLOPs: 11.99 | 7: iteration 62280/ 173500 | consumed samples: 15943680 | consumed tokens: 32652656640 | elapsed time per iteration (s): 0.08 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 4.543197E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.698 | TFLOPs: 11.96 | 7: iteration 62290/ 173500 | 
consumed samples: 15946240 | consumed tokens: 32657899520 | elapsed time per iteration (s): 0.08 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 4.533109E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.233 | TFLOPs: 11.99 | 7: iteration 62300/ 173500 | consumed samples: 15948800 | consumed tokens: 32663142400 | elapsed time per iteration (s): 0.08 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 4.537845E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.413 | TFLOPs: 11.81 | 7: iteration 62310/ 173500 | consumed samples: 15951360 | consumed tokens: 32668385280 | elapsed time per iteration (s): 0.08 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 4.528228E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.679 | TFLOPs: 11.97 | 7: iteration 62320/ 173500 | consumed samples: 15953920 | consumed tokens: 32673628160 | elapsed time per iteration (s): 0.08 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 4.541323E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.462 | TFLOPs: 11.89 | 7: iteration 62330/ 173500 | consumed samples: 15956480 | consumed tokens: 32678871040 | elapsed time per iteration (s): 0.08 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 4.533758E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.886 | TFLOPs: 11.47 | 7: iteration 62340/ 173500 | consumed samples: 15959040 | consumed tokens: 32684113920 | elapsed time per iteration (s): 0.08 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 4.549836E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3214.428 | TFLOPs: 11.96 | 7: iteration 62350/ 173500 | consumed samples: 15961600 | consumed tokens: 32689356800 | elapsed time per iteration (s): 0.08 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 4.538464E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.915 | TFLOPs: 11.99 | 7: iteration 62360/ 173500 | consumed samples: 15964160 | consumed tokens: 32694599680 | elapsed time per iteration (s): 0.08 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 4.539392E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.648 | TFLOPs: 11.86 | 7: iteration 62370/ 173500 | consumed samples: 15966720 | consumed tokens: 32699842560 | elapsed time per iteration (s): 0.08 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 4.543802E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.822 | TFLOPs: 11.97 | 7: iteration 62380/ 173500 | consumed samples: 15969280 | consumed tokens: 32705085440 | elapsed time per iteration (s): 0.08 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 4.549915E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.912 | TFLOPs: 11.97 | 7: iteration 62390/ 173500 | consumed samples: 15971840 | consumed tokens: 32710328320 | elapsed time per iteration (s): 0.08 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 4.546123E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.140 | TFLOPs: 11.98 | 7: iteration 62400/ 173500 | consumed samples: 15974400 | consumed tokens: 32715571200 | elapsed time per iteration (s): 0.08 | learning rate: 1.501E-04 | global batch 
size: 256 | lm loss: 4.540076E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.304 | TFLOPs: 11.81 | 7: iteration 62410/ 173500 | consumed samples: 15976960 | consumed tokens: 32720814080 | elapsed time per iteration (s): 0.08 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 4.542823E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.102 | TFLOPs: 11.97 | 7: iteration 62420/ 173500 | consumed samples: 15979520 | consumed tokens: 32726056960 | elapsed time per iteration (s): 0.08 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 4.542686E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.398 | TFLOPs: 11.98 | 7: iteration 62430/ 173500 | consumed samples: 15982080 | consumed tokens: 32731299840 | elapsed time per iteration (s): 0.09 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 4.551191E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2869.625 | TFLOPs: 10.67 | 7: iteration 62440/ 173500 | consumed samples: 15984640 | consumed tokens: 32736542720 | elapsed time per iteration (s): 0.08 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 4.546798E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.362 | TFLOPs: 11.96 | 7: iteration 62450/ 173500 | consumed samples: 15987200 | consumed tokens: 32741785600 | elapsed time per iteration (s): 0.08 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 4.532277E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.405 | TFLOPs: 11.49 | 7: iteration 62460/ 173500 | consumed samples: 15989760 | 
consumed tokens: 32747028480 | elapsed time per iteration (s): 0.08 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 4.542636E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.630 | TFLOPs: 11.27 | 7: iteration 62470/ 173500 | consumed samples: 15992320 | consumed tokens: 32752271360 | elapsed time per iteration (s): 0.09 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 4.542144E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.532 | TFLOPs: 10.17 | 7: iteration 62480/ 173500 | consumed samples: 15994880 | consumed tokens: 32757514240 | elapsed time per iteration (s): 0.08 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 4.549305E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.917 | TFLOPs: 11.53 | 7: iteration 62490/ 173500 | consumed samples: 15997440 | consumed tokens: 32762757120 | elapsed time per iteration (s): 0.08 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 4.552026E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.138 | TFLOPs: 11.91 | 7: iteration 62500/ 173500 | consumed samples: 16000000 | consumed tokens: 32768000000 | elapsed time per iteration (s): 0.08 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 4.538835E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.081 | TFLOPs: 11.99 | 7: iteration 62510/ 173500 | consumed samples: 16002560 | consumed tokens: 32773242880 | elapsed time per iteration (s): 0.08 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 4.547593E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3149.628 | TFLOPs: 11.72 | 7: iteration 62520/ 173500 | consumed samples: 16005120 | consumed tokens: 32778485760 | elapsed time per iteration (s): 0.08 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 4.551419E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.914 | TFLOPs: 12.01 | 7: iteration 62530/ 173500 | consumed samples: 16007680 | consumed tokens: 32783728640 | elapsed time per iteration (s): 0.08 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 4.550441E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3080.386 | TFLOPs: 11.46 | 7: iteration 62540/ 173500 | consumed samples: 16010240 | consumed tokens: 32788971520 | elapsed time per iteration (s): 0.08 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 4.537765E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.818 | TFLOPs: 11.73 | 7: iteration 62550/ 173500 | consumed samples: 16012800 | consumed tokens: 32794214400 | elapsed time per iteration (s): 0.08 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 4.540433E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.885 | TFLOPs: 11.96 | 7: iteration 62560/ 173500 | consumed samples: 16015360 | consumed tokens: 32799457280 | elapsed time per iteration (s): 0.08 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 4.537215E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.177 | TFLOPs: 12.01 | 7: iteration 62570/ 173500 | consumed samples: 16017920 | consumed tokens: 32804700160 | elapsed time per iteration (s): 0.08 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 
4.541072E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.459 | TFLOPs: 12.02 | 7: iteration 62580/ 173500 | consumed samples: 16020480 | consumed tokens: 32809943040 | elapsed time per iteration (s): 0.08 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 4.540756E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.279 | TFLOPs: 12.02 | 7: iteration 62590/ 173500 | consumed samples: 16023040 | consumed tokens: 32815185920 | elapsed time per iteration (s): 0.08 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 4.551619E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.436 | TFLOPs: 12.02 | 7: iteration 62600/ 173500 | consumed samples: 16025600 | consumed tokens: 32820428800 | elapsed time per iteration (s): 0.08 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 4.549555E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.833 | TFLOPs: 11.71 | 7: iteration 62610/ 173500 | consumed samples: 16028160 | consumed tokens: 32825671680 | elapsed time per iteration (s): 0.08 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 4.540929E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.166 | TFLOPs: 11.91 | 7: iteration 62620/ 173500 | consumed samples: 16030720 | consumed tokens: 32830914560 | elapsed time per iteration (s): 0.08 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 4.532871E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.041 | TFLOPs: 12.01 | 7: iteration 62630/ 173500 | consumed samples: 16033280 | consumed tokens: 
32836157440 | elapsed time per iteration (s): 0.08 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 4.537040E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.474 | TFLOPs: 11.96 | 7: iteration 62640/ 173500 | consumed samples: 16035840 | consumed tokens: 32841400320 | elapsed time per iteration (s): 0.08 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 4.537490E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.534 | TFLOPs: 12.00 | 7: iteration 62650/ 173500 | consumed samples: 16038400 | consumed tokens: 32846643200 | elapsed time per iteration (s): 0.08 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 4.534839E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.872 | TFLOPs: 11.68 | 7: iteration 62660/ 173500 | consumed samples: 16040960 | consumed tokens: 32851886080 | elapsed time per iteration (s): 0.08 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 4.554233E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.792 | TFLOPs: 11.70 | 7: iteration 62670/ 173500 | consumed samples: 16043520 | consumed tokens: 32857128960 | elapsed time per iteration (s): 0.08 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 4.543807E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.810 | TFLOPs: 11.96 | 7: iteration 62680/ 173500 | consumed samples: 16046080 | consumed tokens: 32862371840 | elapsed time per iteration (s): 0.08 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 4.532026E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3142.375 | TFLOPs: 11.69 | 7: iteration 62690/ 173500 | consumed samples: 16048640 | consumed tokens: 32867614720 | elapsed time per iteration (s): 0.08 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 4.553230E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.330 | TFLOPs: 11.85 | 7: iteration 62700/ 173500 | consumed samples: 16051200 | consumed tokens: 32872857600 | elapsed time per iteration (s): 0.08 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 4.532275E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.422 | TFLOPs: 11.89 | 7: iteration 62710/ 173500 | consumed samples: 16053760 | consumed tokens: 32878100480 | elapsed time per iteration (s): 0.08 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 4.557176E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.294 | TFLOPs: 11.87 | 7: iteration 62720/ 173500 | consumed samples: 16056320 | consumed tokens: 32883343360 | elapsed time per iteration (s): 0.08 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 4.532160E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.530 | TFLOPs: 11.62 | 7: iteration 62730/ 173500 | consumed samples: 16058880 | consumed tokens: 32888586240 | elapsed time per iteration (s): 0.08 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 4.543376E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.520 | TFLOPs: 11.93 | 7: iteration 62740/ 173500 | consumed samples: 16061440 | consumed tokens: 32893829120 | elapsed time per iteration (s): 0.08 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 4.535455E+00 | grad 
norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.818 | TFLOPs: 11.94 | 7: iteration 62750/ 173500 | consumed samples: 16064000 | consumed tokens: 32899072000 | elapsed time per iteration (s): 0.08 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 4.556206E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.150 | TFLOPs: 11.88 | 7: iteration 62760/ 173500 | consumed samples: 16066560 | consumed tokens: 32904314880 | elapsed time per iteration (s): 0.08 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 4.542823E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.092 | TFLOPs: 11.90 | 7: iteration 62770/ 173500 | consumed samples: 16069120 | consumed tokens: 32909557760 | elapsed time per iteration (s): 0.08 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 4.539402E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.997 | TFLOPs: 11.91 | 7: iteration 62780/ 173500 | consumed samples: 16071680 | consumed tokens: 32914800640 | elapsed time per iteration (s): 0.08 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 4.555542E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.906 | TFLOPs: 11.60 | 7: iteration 62790/ 173500 | consumed samples: 16074240 | consumed tokens: 32920043520 | elapsed time per iteration (s): 0.08 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 4.538931E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.965 | TFLOPs: 11.88 | 7: iteration 62800/ 173500 | consumed samples: 16076800 | consumed tokens: 32925286400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 4.537012E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.581 | TFLOPs: 11.91 | 7: iteration 62810/ 173500 | consumed samples: 16079360 | consumed tokens: 32930529280 | elapsed time per iteration (s): 0.08 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 4.544463E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.961 | TFLOPs: 11.87 | 7: iteration 62820/ 173500 | consumed samples: 16081920 | consumed tokens: 32935772160 | elapsed time per iteration (s): 0.08 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 4.547469E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.227 | TFLOPs: 11.88 | 7: iteration 62830/ 173500 | consumed samples: 16084480 | consumed tokens: 32941015040 | elapsed time per iteration (s): 0.08 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 4.540210E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.311 | TFLOPs: 11.56 | 7: iteration 62840/ 173500 | consumed samples: 16087040 | consumed tokens: 32946257920 | elapsed time per iteration (s): 0.08 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 4.531515E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.860 | TFLOPs: 11.89 | 7: iteration 62850/ 173500 | consumed samples: 16089600 | consumed tokens: 32951500800 | elapsed time per iteration (s): 0.08 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 4.543272E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.084 | TFLOPs: 
11.74 | 7: iteration 62860/ 173500 | consumed samples: 16092160 | consumed tokens: 32956743680 | elapsed time per iteration (s): 0.08 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 4.536654E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.676 | TFLOPs: 11.85 | 7: iteration 62870/ 173500 | consumed samples: 16094720 | consumed tokens: 32961986560 | elapsed time per iteration (s): 0.08 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 4.547498E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.734 | TFLOPs: 11.28 | 7: iteration 62880/ 173500 | consumed samples: 16097280 | consumed tokens: 32967229440 | elapsed time per iteration (s): 0.08 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 4.542778E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.935 | TFLOPs: 11.81 | 7: iteration 62890/ 173500 | consumed samples: 16099840 | consumed tokens: 32972472320 | elapsed time per iteration (s): 0.08 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 4.547583E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.821 | TFLOPs: 11.85 | 7: iteration 62900/ 173500 | consumed samples: 16102400 | consumed tokens: 32977715200 | elapsed time per iteration (s): 0.09 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 4.543339E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2987.148 | TFLOPs: 11.11 | 7: iteration 62910/ 173500 | consumed samples: 16104960 | consumed tokens: 32982958080 | elapsed time per iteration (s): 0.08 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 4.538252E+00 | grad norm: 0.417 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.109 | TFLOPs: 11.59 | 7: iteration 62920/ 173500 | consumed samples: 16107520 | consumed tokens: 32988200960 | elapsed time per iteration (s): 0.08 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 4.555925E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.325 | TFLOPs: 11.83 | 7: iteration 62930/ 173500 | consumed samples: 16110080 | consumed tokens: 32993443840 | elapsed time per iteration (s): 0.08 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 4.555424E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.042 | TFLOPs: 11.79 | 7: iteration 62940/ 173500 | consumed samples: 16112640 | consumed tokens: 32998686720 | elapsed time per iteration (s): 0.08 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 4.538764E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.388 | TFLOPs: 11.57 | 7: iteration 62950/ 173500 | consumed samples: 16115200 | consumed tokens: 33003929600 | elapsed time per iteration (s): 0.08 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 4.544071E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.436 | TFLOPs: 11.84 | 7: iteration 62960/ 173500 | consumed samples: 16117760 | consumed tokens: 33009172480 | elapsed time per iteration (s): 0.08 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 4.539340E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.069 | TFLOPs: 11.81 | 7: iteration 62970/ 173500 | consumed samples: 16120320 | consumed tokens: 33014415360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.492E-04 | global batch size: 256 | lm loss: 4.541796E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.013 | TFLOPs: 11.81 | 7: iteration 62980/ 173500 | consumed samples: 16122880 | consumed tokens: 33019658240 | elapsed time per iteration (s): 0.08 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 4.546944E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.879 | TFLOPs: 11.81 | 7: iteration 62990/ 173500 | consumed samples: 16125440 | consumed tokens: 33024901120 | elapsed time per iteration (s): 0.08 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 4.547852E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.298 | TFLOPs: 11.83 | 7: iteration 63000/ 173500 | consumed samples: 16128000 | consumed tokens: 33030144000 | elapsed time per iteration (s): 0.08 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 4.538293E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.686 | TFLOPs: 11.78 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 63000 | lm loss value: 4.413164E+00 | lm loss PPL: 8.253015E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 63000 to checkpoints_14m91b100m 0: [2023-03-17 01:47:42,245] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step63000 is begin to save! 0: [2023-03-17 01:47:42,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 01:47:42,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:47:42,273] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:47:42,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:47:42,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:47:42,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:47:42,280] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:47:42,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:47:42,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:47:42,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:47:42,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:47:42,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:47:42,287] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step63000/mp_rank_00_model_states.pt 0: [2023-03-17 01:47:42,287] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:47:42,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:47:42,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:47:42,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:47:42,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:47:42,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:47:42,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:47:42,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2023-03-17 01:47:42,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:47:42,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:47:42,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:47:42,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
5: [2023-03-17 01:47:42,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:47:42,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2023-03-17 01:47:42,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:47:42,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2023-03-17 01:47:42,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
0: [2023-03-17 01:47:42,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:47:42,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2023-03-17 01:47:42,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:47:42,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:47:42,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:47:42,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:47:42,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:47:42,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:47:42,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:47:42,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
3: [2023-03-17 01:47:42,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:47:42,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:47:42,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2023-03-17 01:47:42,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 01:47:42,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:47:42,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:47:42,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:47:42,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:47:42,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:47:42,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:47:42,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:47:42,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:47:42,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:47:42,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2023-03-17 01:47:42,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:47:42,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2023-03-17 01:47:42,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:47:42,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:47:42,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 6: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
6: [2023-03-17 01:47:42,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:47:42,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2023-03-17 01:47:42,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
5: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:47:42,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:47:42,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:47:42,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:47:42,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
0: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:47:42,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 01:47:42,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
5: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:47:42,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
4: [2023-03-17 01:47:42,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2023-03-17 01:47:42,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:47:42,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step63000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:47:42,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: successfully saved checkpoint at iteration 63000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.10 7: iteration 63010/ 173500 | consumed samples: 16130560 | consumed tokens: 33035386880 | elapsed time per iteration (s): 0.09 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 4.546836E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2784.761 | TFLOPs: 10.36 | 7: iteration 63020/ 173500 | consumed samples: 16133120 | consumed tokens: 33040629760 | elapsed time per iteration (s): 0.08 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 4.545248E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.520 | TFLOPs: 11.85 | 7: iteration 63030/ 173500 | consumed samples: 16135680 | consumed tokens: 33045872640 | elapsed time per iteration (s): 0.08 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 4.542843E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.296 | TFLOPs: 11.81 | 7: iteration 63040/ 173500 | consumed samples: 16138240 | consumed tokens: 33051115520 | elapsed time per iteration (s): 0.08 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 4.536160E+00 | grad norm: 
0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.369 | TFLOPs: 11.78 | 7: iteration 63050/ 173500 | consumed samples: 16140800 | consumed tokens: 33056358400 | elapsed time per iteration (s): 0.08 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 4.536382E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.724 | TFLOPs: 11.80 | 7: iteration 63060/ 173500 | consumed samples: 16143360 | consumed tokens: 33061601280 | elapsed time per iteration (s): 0.08 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 4.552127E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.850 | TFLOPs: 11.59 | 7: iteration 63070/ 173500 | consumed samples: 16145920 | consumed tokens: 33066844160 | elapsed time per iteration (s): 0.08 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 4.540022E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.437 | TFLOPs: 11.77 | 7: iteration 63080/ 173500 | consumed samples: 16148480 | consumed tokens: 33072087040 | elapsed time per iteration (s): 0.08 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 4.555097E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.671 | TFLOPs: 11.80 | 7: iteration 63090/ 173500 | consumed samples: 16151040 | consumed tokens: 33077329920 | elapsed time per iteration (s): 0.08 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 4.537503E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.279 | TFLOPs: 11.83 | 7: iteration 63100/ 173500 | consumed samples: 16153600 | consumed tokens: 33082572800 | elapsed time per 
iteration (s): 0.08 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 4.533979E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.277 | TFLOPs: 11.91 | 7: iteration 63110/ 173500 | consumed samples: 16156160 | consumed tokens: 33087815680 | elapsed time per iteration (s): 0.08 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 4.526097E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.908 | TFLOPs: 11.91 | 7: iteration 63120/ 173500 | consumed samples: 16158720 | consumed tokens: 33093058560 | elapsed time per iteration (s): 0.08 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 4.542785E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.772 | TFLOPs: 11.90 | 7: iteration 63130/ 173500 | consumed samples: 16161280 | consumed tokens: 33098301440 | elapsed time per iteration (s): 0.08 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 4.544776E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.747 | TFLOPs: 11.89 | 7: iteration 63140/ 173500 | consumed samples: 16163840 | consumed tokens: 33103544320 | elapsed time per iteration (s): 0.09 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 4.552989E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2902.762 | TFLOPs: 10.80 | 7: iteration 63150/ 173500 | consumed samples: 16166400 | consumed tokens: 33108787200 | elapsed time per iteration (s): 0.08 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 4.543777E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.615 | TFLOPs: 11.87 | 
7: iteration 63160/ 173500 | consumed samples: 16168960 | consumed tokens: 33114030080 | elapsed time per iteration (s): 0.08 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 4.530494E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.434 | TFLOPs: 11.90 | 7: iteration 63170/ 173500 | consumed samples: 16171520 | consumed tokens: 33119272960 | elapsed time per iteration (s): 0.08 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 4.548708E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.470 | TFLOPs: 11.88 | 7: iteration 63180/ 173500 | consumed samples: 16174080 | consumed tokens: 33124515840 | elapsed time per iteration (s): 0.08 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 4.544558E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.028 | TFLOPs: 11.90 | 7: iteration 63190/ 173500 | consumed samples: 16176640 | consumed tokens: 33129758720 | elapsed time per iteration (s): 0.08 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 4.545441E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.539 | TFLOPs: 11.37 | 7: iteration 63200/ 173500 | consumed samples: 16179200 | consumed tokens: 33135001600 | elapsed time per iteration (s): 0.09 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 4.545901E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2753.236 | TFLOPs: 10.24 | 7: iteration 63210/ 173500 | consumed samples: 16181760 | consumed tokens: 33140244480 | elapsed time per iteration (s): 0.09 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 4.550337E+00 | grad norm: 0.357 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.786 | TFLOPs: 10.10 | 7: iteration 63220/ 173500 | consumed samples: 16184320 | consumed tokens: 33145487360 | elapsed time per iteration (s): 0.09 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 4.543980E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.601 | TFLOPs: 10.04 | 7: iteration 63230/ 173500 | consumed samples: 16186880 | consumed tokens: 33150730240 | elapsed time per iteration (s): 0.09 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 4.538122E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2805.427 | TFLOPs: 10.43 | 7: iteration 63240/ 173500 | consumed samples: 16189440 | consumed tokens: 33155973120 | elapsed time per iteration (s): 0.09 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 4.544475E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2712.707 | TFLOPs: 10.09 | 7: iteration 63250/ 173500 | consumed samples: 16192000 | consumed tokens: 33161216000 | elapsed time per iteration (s): 0.09 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 4.539484E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2710.684 | TFLOPs: 10.08 | 7: iteration 63260/ 173500 | consumed samples: 16194560 | consumed tokens: 33166458880 | elapsed time per iteration (s): 0.09 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 4.542292E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.386 | TFLOPs: 10.54 | 7: iteration 63270/ 173500 | consumed samples: 16197120 | consumed tokens: 33171701760 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.488E-04 | global batch size: 256 | lm loss: 4.544642E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.721 | TFLOPs: 10.09 | 7: iteration 63280/ 173500 | consumed samples: 16199680 | consumed tokens: 33176944640 | elapsed time per iteration (s): 0.09 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 4.541489E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.992 | TFLOPs: 10.21 | 7: iteration 63290/ 173500 | consumed samples: 16202240 | consumed tokens: 33182187520 | elapsed time per iteration (s): 0.09 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 4.535117E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2726.196 | TFLOPs: 10.14 | 7: iteration 63300/ 173500 | consumed samples: 16204800 | consumed tokens: 33187430400 | elapsed time per iteration (s): 0.09 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 4.548902E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2720.750 | TFLOPs: 10.12 | 7: iteration 63310/ 173500 | consumed samples: 16207360 | consumed tokens: 33192673280 | elapsed time per iteration (s): 0.09 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 4.537120E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2949.847 | TFLOPs: 10.97 | 7: iteration 63320/ 173500 | consumed samples: 16209920 | consumed tokens: 33197916160 | elapsed time per iteration (s): 0.09 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 4.546594E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2784.813 | TFLOPs: 10.36 | 7: iteration 63330/ 
173500 | consumed samples: 16212480 | consumed tokens: 33203159040 | elapsed time per iteration (s): 0.09 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 4.542912E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.745 | TFLOPs: 10.60 | 7: iteration 63340/ 173500 | consumed samples: 16215040 | consumed tokens: 33208401920 | elapsed time per iteration (s): 0.10 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 4.544613E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2638.755 | TFLOPs: 9.82 | 7: iteration 63350/ 173500 | consumed samples: 16217600 | consumed tokens: 33213644800 | elapsed time per iteration (s): 0.09 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 4.542963E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2710.629 | TFLOPs: 10.08 | 7: iteration 63360/ 173500 | consumed samples: 16220160 | consumed tokens: 33218887680 | elapsed time per iteration (s): 0.10 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 4.538876E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.276 | TFLOPs: 9.15 | 7: iteration 63370/ 173500 | consumed samples: 16222720 | consumed tokens: 33224130560 | elapsed time per iteration (s): 0.10 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 4.542128E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2659.905 | TFLOPs: 9.89 | 7: iteration 63380/ 173500 | consumed samples: 16225280 | consumed tokens: 33229373440 | elapsed time per iteration (s): 0.10 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 4.545654E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2536.604 | TFLOPs: 9.44 | 7: iteration 63390/ 173500 | consumed samples: 16227840 | consumed tokens: 33234616320 | elapsed time per iteration (s): 0.10 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 4.538581E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.037 | TFLOPs: 9.26 | 7: iteration 63400/ 173500 | consumed samples: 16230400 | consumed tokens: 33239859200 | elapsed time per iteration (s): 0.10 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 4.529462E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2504.053 | TFLOPs: 9.31 | 7: iteration 63410/ 173500 | consumed samples: 16232960 | consumed tokens: 33245102080 | elapsed time per iteration (s): 0.10 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 4.538431E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2533.664 | TFLOPs: 9.42 | 7: iteration 63420/ 173500 | consumed samples: 16235520 | consumed tokens: 33250344960 | elapsed time per iteration (s): 0.10 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 4.537360E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2483.251 | TFLOPs: 9.24 | 7: iteration 63430/ 173500 | consumed samples: 16238080 | consumed tokens: 33255587840 | elapsed time per iteration (s): 0.10 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 4.542694E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.059 | TFLOPs: 9.41 | 7: iteration 63440/ 173500 | consumed samples: 16240640 | consumed tokens: 33260830720 | elapsed time per iteration (s): 0.10 | learning rate: 1.485E-04 | 
global batch size: 256 | lm loss: 4.534217E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2540.707 | TFLOPs: 9.45 | 7: iteration 63450/ 173500 | consumed samples: 16243200 | consumed tokens: 33266073600 | elapsed time per iteration (s): 0.10 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 4.536682E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2548.685 | TFLOPs: 9.48 | 7: iteration 63460/ 173500 | consumed samples: 16245760 | consumed tokens: 33271316480 | elapsed time per iteration (s): 0.10 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 4.542726E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2528.024 | TFLOPs: 9.40 | 7: iteration 63470/ 173500 | consumed samples: 16248320 | consumed tokens: 33276559360 | elapsed time per iteration (s): 0.10 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 4.545267E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.888 | TFLOPs: 9.30 | 7: iteration 63480/ 173500 | consumed samples: 16250880 | consumed tokens: 33281802240 | elapsed time per iteration (s): 0.11 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 4.540622E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2399.159 | TFLOPs: 8.92 | 7: iteration 63490/ 173500 | consumed samples: 16253440 | consumed tokens: 33287045120 | elapsed time per iteration (s): 0.10 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 4.540488E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2543.944 | TFLOPs: 9.46 | 7: iteration 63500/ 173500 | consumed samples: 
16256000 | consumed tokens: 33292288000 | elapsed time per iteration (s): 0.10 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 4.527946E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2628.920 | TFLOPs: 9.78 | 7: iteration 63510/ 173500 | consumed samples: 16258560 | consumed tokens: 33297530880 | elapsed time per iteration (s): 0.09 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 4.540434E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.266 | TFLOPs: 10.82 | 7: iteration 63520/ 173500 | consumed samples: 16261120 | consumed tokens: 33302773760 | elapsed time per iteration (s): 0.08 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 4.541284E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.205 | TFLOPs: 11.69 | 7: iteration 63530/ 173500 | consumed samples: 16263680 | consumed tokens: 33308016640 | elapsed time per iteration (s): 0.09 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 4.540010E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.915 | TFLOPs: 10.78 | 7: iteration 63540/ 173500 | consumed samples: 16266240 | consumed tokens: 33313259520 | elapsed time per iteration (s): 0.10 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 4.543824E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.600 | TFLOPs: 9.92 | 7: iteration 63550/ 173500 | consumed samples: 16268800 | consumed tokens: 33318502400 | elapsed time per iteration (s): 0.10 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 4.546737E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2570.139 | TFLOPs: 9.56 | 7: iteration 63560/ 173500 | consumed samples: 16271360 | consumed tokens: 33323745280 | elapsed time per iteration (s): 0.10 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 4.563427E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.130 | TFLOPs: 9.64 | 7: iteration 63570/ 173500 | consumed samples: 16273920 | consumed tokens: 33328988160 | elapsed time per iteration (s): 0.10 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 4.543839E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2601.499 | TFLOPs: 9.68 | 7: iteration 63580/ 173500 | consumed samples: 16276480 | consumed tokens: 33334231040 | elapsed time per iteration (s): 0.10 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 4.545169E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2627.823 | TFLOPs: 9.77 | 7: iteration 63590/ 173500 | consumed samples: 16279040 | consumed tokens: 33339473920 | elapsed time per iteration (s): 0.10 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 4.547500E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2679.461 | TFLOPs: 9.97 | 7: iteration 63600/ 173500 | consumed samples: 16281600 | consumed tokens: 33344716800 | elapsed time per iteration (s): 0.10 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 4.529464E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2656.739 | TFLOPs: 9.88 | 7: iteration 63610/ 173500 | consumed samples: 16284160 | consumed tokens: 33349959680 | elapsed time per iteration (s): 0.10 | learning rate: 1.483E-04 | global batch size: 256 | lm 
loss: 4.541487E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2625.209 | TFLOPs: 9.76 | 7: iteration 63620/ 173500 | consumed samples: 16286720 | consumed tokens: 33355202560 | elapsed time per iteration (s): 0.10 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 4.546515E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2620.549 | TFLOPs: 9.75 | 7: iteration 63630/ 173500 | consumed samples: 16289280 | consumed tokens: 33360445440 | elapsed time per iteration (s): 0.09 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 4.532104E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2727.194 | TFLOPs: 10.14 | 7: iteration 63640/ 173500 | consumed samples: 16291840 | consumed tokens: 33365688320 | elapsed time per iteration (s): 0.09 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 4.536080E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.141 | TFLOPs: 10.17 | 7: iteration 63650/ 173500 | consumed samples: 16294400 | consumed tokens: 33370931200 | elapsed time per iteration (s): 0.10 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 4.533488E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2622.474 | TFLOPs: 9.75 | 7: iteration 63660/ 173500 | consumed samples: 16296960 | consumed tokens: 33376174080 | elapsed time per iteration (s): 0.09 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 4.547517E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.206 | TFLOPs: 10.17 | 7: iteration 63670/ 173500 | consumed samples: 16299520 | consumed tokens: 
33381416960 | elapsed time per iteration (s): 0.10 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 4.552462E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2665.042 | TFLOPs: 9.91 | 7: iteration 63680/ 173500 | consumed samples: 16302080 | consumed tokens: 33386659840 | elapsed time per iteration (s): 0.10 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 4.550915E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2671.050 | TFLOPs: 9.94 | 7: iteration 63690/ 173500 | consumed samples: 16304640 | consumed tokens: 33391902720 | elapsed time per iteration (s): 0.10 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 4.531905E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2512.816 | TFLOPs: 9.35 | 7: iteration 63700/ 173500 | consumed samples: 16307200 | consumed tokens: 33397145600 | elapsed time per iteration (s): 0.09 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 4.543223E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.973 | TFLOPs: 11.08 | 7: iteration 63710/ 173500 | consumed samples: 16309760 | consumed tokens: 33402388480 | elapsed time per iteration (s): 0.08 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 4.544345E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.669 | TFLOPs: 11.87 | 7: iteration 63720/ 173500 | consumed samples: 16312320 | consumed tokens: 33407631360 | elapsed time per iteration (s): 0.08 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 4.544637E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3180.135 | TFLOPs: 11.83 | 7: iteration 63730/ 173500 | consumed samples: 16314880 | consumed tokens: 33412874240 | elapsed time per iteration (s): 0.10 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 4.540032E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2547.974 | TFLOPs: 9.48 | 7: iteration 63740/ 173500 | consumed samples: 16317440 | consumed tokens: 33418117120 | elapsed time per iteration (s): 0.08 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 4.547419E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.278 | TFLOPs: 11.43 | 7: iteration 63750/ 173500 | consumed samples: 16320000 | consumed tokens: 33423360000 | elapsed time per iteration (s): 0.08 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 4.544472E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.361 | TFLOPs: 11.89 | 7: iteration 63760/ 173500 | consumed samples: 16322560 | consumed tokens: 33428602880 | elapsed time per iteration (s): 0.08 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 4.549050E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.923 | TFLOPs: 11.86 | 7: iteration 63770/ 173500 | consumed samples: 16325120 | consumed tokens: 33433845760 | elapsed time per iteration (s): 0.08 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 4.560284E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.321 | TFLOPs: 11.89 | 7: iteration 63780/ 173500 | consumed samples: 16327680 | consumed tokens: 33439088640 | elapsed time per iteration (s): 0.08 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 4.542812E+00 | grad 
norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.060 | TFLOPs: 11.83 | 7: iteration 63790/ 173500 | consumed samples: 16330240 | consumed tokens: 33444331520 | elapsed time per iteration (s): 0.08 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 4.537941E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.092 | TFLOPs: 11.88 | 7: iteration 63800/ 173500 | consumed samples: 16332800 | consumed tokens: 33449574400 | elapsed time per iteration (s): 0.08 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 4.535985E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.261 | TFLOPs: 11.87 | 7: iteration 63810/ 173500 | consumed samples: 16335360 | consumed tokens: 33454817280 | elapsed time per iteration (s): 0.08 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 4.546376E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.123 | TFLOPs: 11.88 | 7: iteration 63820/ 173500 | consumed samples: 16337920 | consumed tokens: 33460060160 | elapsed time per iteration (s): 0.08 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 4.535161E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.330 | TFLOPs: 11.84 | 7: iteration 63830/ 173500 | consumed samples: 16340480 | consumed tokens: 33465303040 | elapsed time per iteration (s): 0.08 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 4.553357E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.167 | TFLOPs: 11.85 | 7: iteration 63840/ 173500 | consumed samples: 16343040 | consumed tokens: 33470545920 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 4.545657E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.120 | TFLOPs: 11.86 | 7: iteration 63850/ 173500 | consumed samples: 16345600 | consumed tokens: 33475788800 | elapsed time per iteration (s): 0.08 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 4.556403E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.734 | TFLOPs: 11.82 | 7: iteration 63860/ 173500 | consumed samples: 16348160 | consumed tokens: 33481031680 | elapsed time per iteration (s): 0.08 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 4.536156E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.494 | TFLOPs: 11.79 | 7: iteration 63870/ 173500 | consumed samples: 16350720 | consumed tokens: 33486274560 | elapsed time per iteration (s): 0.08 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 4.547789E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.447 | TFLOPs: 11.84 | 7: iteration 63880/ 173500 | consumed samples: 16353280 | consumed tokens: 33491517440 | elapsed time per iteration (s): 0.08 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 4.539386E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.001 | TFLOPs: 11.49 | 7: iteration 63890/ 173500 | consumed samples: 16355840 | consumed tokens: 33496760320 | elapsed time per iteration (s): 0.08 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 4.529723E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.780 | TFLOPs: 
11.87 | 7: iteration 63900/ 173500 | consumed samples: 16358400 | consumed tokens: 33502003200 | elapsed time per iteration (s): 0.08 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 4.539781E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.075 | TFLOPs: 11.82 | 7: iteration 63910/ 173500 | consumed samples: 16360960 | consumed tokens: 33507246080 | elapsed time per iteration (s): 0.08 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 4.529341E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.840 | TFLOPs: 11.85 | 7: iteration 63920/ 173500 | consumed samples: 16363520 | consumed tokens: 33512488960 | elapsed time per iteration (s): 0.08 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 4.545104E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.257 | TFLOPs: 11.80 | 7: iteration 63930/ 173500 | consumed samples: 16366080 | consumed tokens: 33517731840 | elapsed time per iteration (s): 0.08 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 4.537033E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.815 | TFLOPs: 11.79 | 7: iteration 63940/ 173500 | consumed samples: 16368640 | consumed tokens: 33522974720 | elapsed time per iteration (s): 0.08 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 4.542085E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.730 | TFLOPs: 11.87 | 7: iteration 63950/ 173500 | consumed samples: 16371200 | consumed tokens: 33528217600 | elapsed time per iteration (s): 0.08 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 4.538047E+00 | grad norm: 0.366 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.071 | TFLOPs: 11.82 | 7: iteration 63960/ 173500 | consumed samples: 16373760 | consumed tokens: 33533460480 | elapsed time per iteration (s): 0.08 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 4.535966E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.586 | TFLOPs: 11.88 | 7: iteration 63970/ 173500 | consumed samples: 16376320 | consumed tokens: 33538703360 | elapsed time per iteration (s): 0.08 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 4.533836E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.083 | TFLOPs: 11.82 | 7: iteration 63980/ 173500 | consumed samples: 16378880 | consumed tokens: 33543946240 | elapsed time per iteration (s): 0.08 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 4.530906E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.379 | TFLOPs: 11.88 | 7: iteration 63990/ 173500 | consumed samples: 16381440 | consumed tokens: 33549189120 | elapsed time per iteration (s): 0.08 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 4.544516E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.030 | TFLOPs: 11.86 | 0: [2023-03-17 01:49:11,111] [INFO] [logging.py:68:log_dist] [Rank 0] step=64000, skipped=0, lr=[0.0001476794025098283, 0.0001476794025098283, 0.0001476794025098283], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 64000/ 173500 | consumed samples: 16384000 | consumed tokens: 33554432000 | elapsed time per iteration (s): 0.08 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 4.536380E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3194.311 | TFLOPs: 11.88 | 0: steps: 64000 loss: 4.5398 iter time (s): 0.084 samples/sec: 3039.100 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 64000 | lm loss value: 4.462161E+00 | lm loss PPL: 8.667462E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 64000 to checkpoints_14m91b100m 0: [2023-03-17 01:49:11,169] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step64000 is begin to save! 0: [2023-03-17 01:49:11,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:49:11,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:49:11,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:49:11,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:49:11,201] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:49:11,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:49:11,204] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:49:11,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 01:49:11,207] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:49:11,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:49:11,210] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:49:11,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:49:11,211] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step64000/mp_rank_00_model_states.pt 0: [2023-03-17 01:49:11,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:49:11,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:49:11,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:49:11,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:49:11,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2023-03-17 01:49:11,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:49:11,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:49:11,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2023-03-17 01:49:11,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:49:11,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2023-03-17 01:49:11,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:49:11,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:49:11,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:49:11,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:49:11,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 01:49:11,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:49:11,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 5: [2023-03-17 01:49:11,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2023-03-17 01:49:11,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:49:11,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 01:49:11,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:49:11,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2023-03-17 01:49:11,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:49:11,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:49:11,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:49:11,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:49:11,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:49:11,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2023-03-17 01:49:11,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:49:11,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:49:11,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2023-03-17 01:49:11,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:49:11,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:49:11,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:49:11,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2023-03-17 01:49:11,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:49:11,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2023-03-17 01:49:11,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:49:11,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:49:11,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:49:11,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:49:11,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:49:11,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:49:11,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:49:11,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:49:11,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
4: [2023-03-17 01:49:11,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:49:11,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2023-03-17 01:49:11,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:49:11,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:49:11,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 3: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
3: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2023-03-17 01:49:11,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:49:11,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:49:11,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
5: [2023-03-17 01:49:11,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:49:11,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2023-03-17 01:49:11,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:49:11,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:49:11,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2023-03-17 01:49:11,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:49:11,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step64000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:49:11,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
0: successfully saved checkpoint at iteration 64000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.86 7: iteration 64010/ 173500 | consumed samples: 16386560 | consumed tokens: 33559674880 | elapsed time per iteration (s): 0.09 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 4.537960E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2745.919 | TFLOPs: 10.21 | 7: iteration 64020/ 173500 | consumed samples: 16389120 | consumed tokens: 33564917760 | elapsed time per iteration (s): 0.08 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.540106E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.651 | TFLOPs: 11.88 | 7: iteration 64030/ 173500 | consumed samples: 16391680 | consumed tokens: 33570160640 | elapsed time per iteration (s): 0.08 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.556333E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.595 | TFLOPs: 11.74 | 7: iteration 64040/ 173500 | consumed samples: 16394240 | consumed tokens: 33575403520 | elapsed time per iteration (s): 0.09 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.546823E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.479 | TFLOPs: 11.19 | 7: iteration 64050/ 173500 | consumed samples: 16396800 | consumed tokens: 33580646400 | elapsed time per iteration (s): 0.08 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.546878E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.270 | TFLOPs: 11.91 | 7: iteration 64060/ 173500 | consumed samples: 16399360 | consumed tokens: 33585889280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.542999E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.633 | TFLOPs: 11.89 | 7: iteration 64070/ 173500 | consumed samples: 16401920 | consumed tokens: 33591132160 | elapsed time per iteration (s): 0.08 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.550271E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.747 | TFLOPs: 11.80 | 7: iteration 64080/ 173500 | consumed samples: 16404480 | consumed tokens: 33596375040 | elapsed time per iteration (s): 0.08 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.550239E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.429 | TFLOPs: 11.92 | 7: iteration 64090/ 173500 | consumed samples: 16407040 | consumed tokens: 33601617920 | elapsed time per iteration (s): 0.08 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 4.542929E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.949 | TFLOPs: 11.94 | 7: iteration 64100/ 173500 | consumed samples: 16409600 | consumed tokens: 33606860800 | elapsed time per iteration (s): 0.08 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 4.540757E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.631 | TFLOPs: 11.93 | 7: iteration 64110/ 173500 | consumed samples: 16412160 | consumed tokens: 33612103680 | elapsed time per iteration (s): 0.08 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 4.531865E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.133 | TFLOPs: 11.93 | 7: iteration 64120/ 
173500 | consumed samples: 16414720 | consumed tokens: 33617346560 | elapsed time per iteration (s): 0.08 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 4.540067E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.834 | TFLOPs: 11.93 | 7: iteration 64130/ 173500 | consumed samples: 16417280 | consumed tokens: 33622589440 | elapsed time per iteration (s): 0.08 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 4.537018E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.606 | TFLOPs: 11.89 | 7: iteration 64140/ 173500 | consumed samples: 16419840 | consumed tokens: 33627832320 | elapsed time per iteration (s): 0.08 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 4.551427E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.309 | TFLOPs: 11.93 | 7: iteration 64150/ 173500 | consumed samples: 16422400 | consumed tokens: 33633075200 | elapsed time per iteration (s): 0.08 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 4.542882E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.701 | TFLOPs: 11.89 | 7: iteration 64160/ 173500 | consumed samples: 16424960 | consumed tokens: 33638318080 | elapsed time per iteration (s): 0.08 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 4.537281E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.652 | TFLOPs: 11.89 | 7: iteration 64170/ 173500 | consumed samples: 16427520 | consumed tokens: 33643560960 | elapsed time per iteration (s): 0.08 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 4.544617E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3207.420 | TFLOPs: 11.93 | 7: iteration 64180/ 173500 | consumed samples: 16430080 | consumed tokens: 33648803840 | elapsed time per iteration (s): 0.08 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 4.546485E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.189 | TFLOPs: 11.90 | 7: iteration 64190/ 173500 | consumed samples: 16432640 | consumed tokens: 33654046720 | elapsed time per iteration (s): 0.08 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 4.538498E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.643 | TFLOPs: 11.92 | 7: iteration 64200/ 173500 | consumed samples: 16435200 | consumed tokens: 33659289600 | elapsed time per iteration (s): 0.08 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 4.536391E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.852 | TFLOPs: 11.88 | 7: iteration 64210/ 173500 | consumed samples: 16437760 | consumed tokens: 33664532480 | elapsed time per iteration (s): 0.08 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 4.534521E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.477 | TFLOPs: 11.89 | 7: iteration 64220/ 173500 | consumed samples: 16440320 | consumed tokens: 33669775360 | elapsed time per iteration (s): 0.08 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 4.539065E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.504 | TFLOPs: 11.84 | 7: iteration 64230/ 173500 | consumed samples: 16442880 | consumed tokens: 33675018240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.473E-04 | global batch size: 256 | lm loss: 4.549612E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.664 | TFLOPs: 11.86 | 7: iteration 64240/ 173500 | consumed samples: 16445440 | consumed tokens: 33680261120 | elapsed time per iteration (s): 0.08 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 4.550121E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.813 | TFLOPs: 12.06 | 7: iteration 64250/ 173500 | consumed samples: 16448000 | consumed tokens: 33685504000 | elapsed time per iteration (s): 0.08 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 4.545325E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.268 | TFLOPs: 12.03 | 7: iteration 64260/ 173500 | consumed samples: 16450560 | consumed tokens: 33690746880 | elapsed time per iteration (s): 0.08 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 4.539859E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.192 | TFLOPs: 11.95 | 7: iteration 64270/ 173500 | consumed samples: 16453120 | consumed tokens: 33695989760 | elapsed time per iteration (s): 0.08 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 4.524799E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.600 | TFLOPs: 11.88 | 7: iteration 64280/ 173500 | consumed samples: 16455680 | consumed tokens: 33701232640 | elapsed time per iteration (s): 0.08 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 4.537969E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.177 | TFLOPs: 12.01 | 7: iteration 64290/ 173500 | 
consumed samples: 16458240 | consumed tokens: 33706475520 | elapsed time per iteration (s): 0.08 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 4.537468E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.339 | TFLOPs: 11.98 | 7: iteration 64300/ 173500 | consumed samples: 16460800 | consumed tokens: 33711718400 | elapsed time per iteration (s): 0.08 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 4.548870E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.610 | TFLOPs: 12.03 | 7: iteration 64310/ 173500 | consumed samples: 16463360 | consumed tokens: 33716961280 | elapsed time per iteration (s): 0.08 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 4.535422E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.278 | TFLOPs: 12.01 | 7: iteration 64320/ 173500 | consumed samples: 16465920 | consumed tokens: 33722204160 | elapsed time per iteration (s): 0.08 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 4.538239E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.869 | TFLOPs: 11.95 | 7: iteration 64330/ 173500 | consumed samples: 16468480 | consumed tokens: 33727447040 | elapsed time per iteration (s): 0.08 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 4.541832E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.536 | TFLOPs: 11.90 | 7: iteration 64340/ 173500 | consumed samples: 16471040 | consumed tokens: 33732689920 | elapsed time per iteration (s): 0.08 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 4.536283E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3052.265 | TFLOPs: 11.35 | 7: iteration 64350/ 173500 | consumed samples: 16473600 | consumed tokens: 33737932800 | elapsed time per iteration (s): 0.08 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 4.541753E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.324 | TFLOPs: 11.44 | 7: iteration 64360/ 173500 | consumed samples: 16476160 | consumed tokens: 33743175680 | elapsed time per iteration (s): 0.08 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 4.539880E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.526 | TFLOPs: 12.01 | 7: iteration 64370/ 173500 | consumed samples: 16478720 | consumed tokens: 33748418560 | elapsed time per iteration (s): 0.09 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 4.536802E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.542 | TFLOPs: 11.08 | 7: iteration 64380/ 173500 | consumed samples: 16481280 | consumed tokens: 33753661440 | elapsed time per iteration (s): 0.09 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 4.532595E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.319 | TFLOPs: 10.54 | 7: iteration 64390/ 173500 | consumed samples: 16483840 | consumed tokens: 33758904320 | elapsed time per iteration (s): 0.08 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 4.540736E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.067 | TFLOPs: 11.86 | 7: iteration 64400/ 173500 | consumed samples: 16486400 | consumed tokens: 33764147200 | elapsed time per iteration (s): 0.09 | learning rate: 1.471E-04 | global batch 
size: 256 | lm loss: 4.545493E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2778.698 | TFLOPs: 10.34 | 7: iteration 64410/ 173500 | consumed samples: 16488960 | consumed tokens: 33769390080 | elapsed time per iteration (s): 0.08 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 4.536211E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.880 | TFLOPs: 11.27 | 7: iteration 64420/ 173500 | consumed samples: 16491520 | consumed tokens: 33774632960 | elapsed time per iteration (s): 0.08 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 4.534911E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.392 | TFLOPs: 11.61 | 7: iteration 64430/ 173500 | consumed samples: 16494080 | consumed tokens: 33779875840 | elapsed time per iteration (s): 0.09 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 4.517120E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.070 | TFLOPs: 11.19 | 7: iteration 64440/ 173500 | consumed samples: 16496640 | consumed tokens: 33785118720 | elapsed time per iteration (s): 0.08 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 4.541779E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.267 | TFLOPs: 11.59 | 7: iteration 64450/ 173500 | consumed samples: 16499200 | consumed tokens: 33790361600 | elapsed time per iteration (s): 0.08 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 4.526669E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.401 | TFLOPs: 11.93 | 7: iteration 64460/ 173500 | consumed samples: 16501760 | 
consumed tokens: 33795604480 | elapsed time per iteration (s): 0.09 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 4.537888E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2805.515 | TFLOPs: 10.44 | 7: iteration 64470/ 173500 | consumed samples: 16504320 | consumed tokens: 33800847360 | elapsed time per iteration (s): 0.09 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 4.548132E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2826.906 | TFLOPs: 10.51 | 7: iteration 64480/ 173500 | consumed samples: 16506880 | consumed tokens: 33806090240 | elapsed time per iteration (s): 0.08 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 4.534005E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.696 | TFLOPs: 11.67 | 7: iteration 64490/ 173500 | consumed samples: 16509440 | consumed tokens: 33811333120 | elapsed time per iteration (s): 0.08 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 4.550937E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.548 | TFLOPs: 11.91 | 7: iteration 64500/ 173500 | consumed samples: 16512000 | consumed tokens: 33816576000 | elapsed time per iteration (s): 0.13 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 4.537152E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2016.749 | TFLOPs: 7.50 | 7: iteration 64510/ 173500 | consumed samples: 16514560 | consumed tokens: 33821818880 | elapsed time per iteration (s): 0.12 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 4.552339E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2077.961 | TFLOPs: 7.73 | 7: iteration 64520/ 173500 | consumed samples: 16517120 | consumed tokens: 33827061760 | elapsed time per iteration (s): 0.13 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 4.548087E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.363 | TFLOPs: 7.46 | 7: iteration 64530/ 173500 | consumed samples: 16519680 | consumed tokens: 33832304640 | elapsed time per iteration (s): 0.12 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 4.538363E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2105.233 | TFLOPs: 7.83 | 7: iteration 64540/ 173500 | consumed samples: 16522240 | consumed tokens: 33837547520 | elapsed time per iteration (s): 0.13 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 4.526501E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.289 | TFLOPs: 7.46 | 7: iteration 64550/ 173500 | consumed samples: 16524800 | consumed tokens: 33842790400 | elapsed time per iteration (s): 0.09 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 4.530659E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2997.434 | TFLOPs: 11.15 | 7: iteration 64560/ 173500 | consumed samples: 16527360 | consumed tokens: 33848033280 | elapsed time per iteration (s): 0.09 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 4.542829E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.094 | TFLOPs: 11.20 | 7: iteration 64570/ 173500 | consumed samples: 16529920 | consumed tokens: 33853276160 | elapsed time per iteration (s): 0.08 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 4.528299E+00 
| grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.052 | TFLOPs: 11.71 | 7: iteration 64580/ 173500 | consumed samples: 16532480 | consumed tokens: 33858519040 | elapsed time per iteration (s): 0.08 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 4.542337E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.546 | TFLOPs: 12.03 | 7: iteration 64590/ 173500 | consumed samples: 16535040 | consumed tokens: 33863761920 | elapsed time per iteration (s): 0.08 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 4.542674E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.263 | TFLOPs: 11.99 | 7: iteration 64600/ 173500 | consumed samples: 16537600 | consumed tokens: 33869004800 | elapsed time per iteration (s): 0.08 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 4.547892E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.826 | TFLOPs: 11.90 | 7: iteration 64610/ 173500 | consumed samples: 16540160 | consumed tokens: 33874247680 | elapsed time per iteration (s): 0.08 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 4.548114E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.127 | TFLOPs: 12.01 | 7: iteration 64620/ 173500 | consumed samples: 16542720 | consumed tokens: 33879490560 | elapsed time per iteration (s): 0.08 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 4.544251E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.552 | TFLOPs: 12.04 | 7: iteration 64630/ 173500 | consumed samples: 16545280 | consumed tokens: 33884733440 | 
elapsed time per iteration (s): 0.08 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 4.547399E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.007 | TFLOPs: 12.00 | 7: iteration 64640/ 173500 | consumed samples: 16547840 | consumed tokens: 33889976320 | elapsed time per iteration (s): 0.08 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 4.537670E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.663 | TFLOPs: 12.02 | 7: iteration 64650/ 173500 | consumed samples: 16550400 | consumed tokens: 33895219200 | elapsed time per iteration (s): 0.08 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 4.546790E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.670 | TFLOPs: 12.00 | 7: iteration 64660/ 173500 | consumed samples: 16552960 | consumed tokens: 33900462080 | elapsed time per iteration (s): 0.08 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 4.552615E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.086 | TFLOPs: 12.02 | 7: iteration 64670/ 173500 | consumed samples: 16555520 | consumed tokens: 33905704960 | elapsed time per iteration (s): 0.08 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 4.543113E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.904 | TFLOPs: 12.01 | 7: iteration 64680/ 173500 | consumed samples: 16558080 | consumed tokens: 33910947840 | elapsed time per iteration (s): 0.08 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 4.559923E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.611 
| TFLOPs: 11.95 | 7: iteration 64690/ 173500 | consumed samples: 16560640 | consumed tokens: 33916190720 | elapsed time per iteration (s): 0.08 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 4.545575E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.497 | TFLOPs: 12.01 | 7: iteration 64700/ 173500 | consumed samples: 16563200 | consumed tokens: 33921433600 | elapsed time per iteration (s): 0.08 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 4.548794E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.308 | TFLOPs: 12.05 | 7: iteration 64710/ 173500 | consumed samples: 16565760 | consumed tokens: 33926676480 | elapsed time per iteration (s): 0.08 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 4.537302E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.402 | TFLOPs: 11.99 | 7: iteration 64720/ 173500 | consumed samples: 16568320 | consumed tokens: 33931919360 | elapsed time per iteration (s): 0.08 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 4.553074E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.204 | TFLOPs: 12.03 | 7: iteration 64730/ 173500 | consumed samples: 16570880 | consumed tokens: 33937162240 | elapsed time per iteration (s): 0.08 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 4.541315E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.880 | TFLOPs: 11.85 | 7: iteration 64740/ 173500 | consumed samples: 16573440 | consumed tokens: 33942405120 | elapsed time per iteration (s): 0.08 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 4.548597E+00 | grad norm: 0.347 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.848 | TFLOPs: 12.00 | 7: iteration 64750/ 173500 | consumed samples: 16576000 | consumed tokens: 33947648000 | elapsed time per iteration (s): 0.08 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 4.547992E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.200 | TFLOPs: 11.98 | 7: iteration 64760/ 173500 | consumed samples: 16578560 | consumed tokens: 33952890880 | elapsed time per iteration (s): 0.11 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 4.544100E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2357.372 | TFLOPs: 8.77 | 7: iteration 64770/ 173500 | consumed samples: 16581120 | consumed tokens: 33958133760 | elapsed time per iteration (s): 0.12 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 4.538161E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2214.473 | TFLOPs: 8.24 | 7: iteration 64780/ 173500 | consumed samples: 16583680 | consumed tokens: 33963376640 | elapsed time per iteration (s): 0.09 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 4.545550E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.022 | TFLOPs: 11.02 | 7: iteration 64790/ 173500 | consumed samples: 16586240 | consumed tokens: 33968619520 | elapsed time per iteration (s): 0.10 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 4.536017E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2640.349 | TFLOPs: 9.82 | 7: iteration 64800/ 173500 | consumed samples: 16588800 | consumed tokens: 33973862400 | elapsed time per iteration (s): 
0.08 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 4.531420E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.442 | TFLOPs: 11.39 | 7: iteration 64810/ 173500 | consumed samples: 16591360 | consumed tokens: 33979105280 | elapsed time per iteration (s): 0.12 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 4.538740E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2209.085 | TFLOPs: 8.22 | 7: iteration 64820/ 173500 | consumed samples: 16593920 | consumed tokens: 33984348160 | elapsed time per iteration (s): 0.11 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 4.534438E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2299.691 | TFLOPs: 8.55 | 7: iteration 64830/ 173500 | consumed samples: 16596480 | consumed tokens: 33989591040 | elapsed time per iteration (s): 0.08 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 4.541989E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.039 | TFLOPs: 11.95 | 7: iteration 64840/ 173500 | consumed samples: 16599040 | consumed tokens: 33994833920 | elapsed time per iteration (s): 0.08 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 4.551602E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.114 | TFLOPs: 11.98 | 7: iteration 64850/ 173500 | consumed samples: 16601600 | consumed tokens: 34000076800 | elapsed time per iteration (s): 0.08 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 4.542929E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.177 | TFLOPs: 12.01 | 7: iteration 
64860/ 173500 | consumed samples: 16604160 | consumed tokens: 34005319680 | elapsed time per iteration (s): 0.08 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 4.546138E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.320 | TFLOPs: 11.97 |
7: iteration 64870/ 173500 | consumed samples: 16606720 | consumed tokens: 34010562560 | elapsed time per iteration (s): 0.08 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 4.548224E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.032 | TFLOPs: 11.95 |
7: iteration 64880/ 173500 | consumed samples: 16609280 | consumed tokens: 34015805440 | elapsed time per iteration (s): 0.08 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 4.545683E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.818 | TFLOPs: 11.97 |
7: iteration 64890/ 173500 | consumed samples: 16611840 | consumed tokens: 34021048320 | elapsed time per iteration (s): 0.08 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 4.542957E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.731 | TFLOPs: 12.01 |
7: iteration 64900/ 173500 | consumed samples: 16614400 | consumed tokens: 34026291200 | elapsed time per iteration (s): 0.08 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 4.533517E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.879 | TFLOPs: 11.37 |
7: iteration 64910/ 173500 | consumed samples: 16616960 | consumed tokens: 34031534080 | elapsed time per iteration (s): 0.08 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 4.553247E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.071 | TFLOPs: 11.84 |
7: iteration 64920/ 173500 | consumed samples: 16619520 | consumed tokens: 34036776960 | elapsed time per iteration (s): 0.08 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 4.535749E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.520 | TFLOPs: 11.83 |
7: iteration 64930/ 173500 | consumed samples: 16622080 | consumed tokens: 34042019840 | elapsed time per iteration (s): 0.08 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 4.538794E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.283 | TFLOPs: 11.85 |
7: iteration 64940/ 173500 | consumed samples: 16624640 | consumed tokens: 34047262720 | elapsed time per iteration (s): 0.08 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 4.534447E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.238 | TFLOPs: 11.83 |
7: iteration 64950/ 173500 | consumed samples: 16627200 | consumed tokens: 34052505600 | elapsed time per iteration (s): 0.08 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 4.532483E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.930 | TFLOPs: 11.85 |
7: iteration 64960/ 173500 | consumed samples: 16629760 | consumed tokens: 34057748480 | elapsed time per iteration (s): 0.08 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 4.538422E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.622 | TFLOPs: 11.76 |
7: iteration 64970/ 173500 | consumed samples: 16632320 | consumed tokens: 34062991360 | elapsed time per iteration (s): 0.08 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 4.549103E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.167 | TFLOPs: 11.97 |
7: iteration 64980/ 173500 | consumed samples: 16634880 | consumed tokens: 34068234240 | elapsed time per iteration (s): 0.08 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 4.539276E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.419 | TFLOPs: 11.99 |
7: iteration 64990/ 173500 | consumed samples: 16637440 | consumed tokens: 34073477120 | elapsed time per iteration (s): 0.08 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 4.546075E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.148 | TFLOPs: 12.00 |
7: iteration 65000/ 173500 | consumed samples: 16640000 | consumed tokens: 34078720000 | elapsed time per iteration (s): 0.08 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 4.557777E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.780 | TFLOPs: 11.99 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 65000 | lm loss value: 4.399868E+00 | lm loss PPL: 8.144008E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 65000 to checkpoints_14m91b100m
0: [2023-03-17 01:50:35,895] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step65000 is begin to save!
0: [2023-03-17 01:50:35,898] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:50:35,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:50:35,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:50:35,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:50:35,927] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:50:35,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:50:35,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:50:35,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:50:35,933] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:50:35,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:50:35,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:50:35,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 01:50:35,937] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step65000/mp_rank_00_model_states.pt 0: [2023-03-17 01:50:35,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:50:35,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:50:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:50:35,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:50:35,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2023-03-17 01:50:35,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:50:35,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2023-03-17 01:50:35,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:50:35,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
0: [2023-03-17 01:50:35,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:50:35,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:50:35,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:50:35,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
0: [2023-03-17 01:50:35,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2023-03-17 01:50:35,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:50:35,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:50:35,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:50:35,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:50:35,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:50:35,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:50:35,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:50:35,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
0: [2023-03-17 01:50:35,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:50:35,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:50:35,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 01:50:35,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
6: [2023-03-17 01:50:35,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:50:35,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:50:35,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:50:35,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2023-03-17 01:50:35,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 01:50:35,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2023-03-17 01:50:35,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:50:35,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
6: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:50:35,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 01:50:35,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:50:35,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:50:35,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
5: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:50:35,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2023-03-17 01:50:35,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2023-03-17 01:50:35,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:50:35,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:50:35,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
3: [2023-03-17 01:50:35,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:50:35,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:50:35,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2023-03-17 01:50:35,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:50:35,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:50:35,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:50:35,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:50:35,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:50:35,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:50:35,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:50:35,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:50:35,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:50:35,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:50:35,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
2: [2023-03-17 01:50:35,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2023-03-17 01:50:35,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:50:35,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 6: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:50:35,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
2: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2023-03-17 01:50:35,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2023-03-17 01:50:35,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:50:35,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step65000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:50:35,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: successfully saved checkpoint at iteration 65000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.19 7: iteration 65010/ 173500 | consumed samples: 16642560 | consumed tokens: 34083962880 | elapsed time per iteration (s): 0.09 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 4.538599E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.872 | TFLOPs: 10.39 | 7: iteration 65020/ 173500 | consumed samples: 16645120 | consumed tokens: 34089205760 | elapsed time per iteration (s): 0.08 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 4.532570E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.127 | TFLOPs: 11.84 | 7: iteration 65030/ 173500 | consumed samples: 16647680 | consumed tokens: 34094448640 | elapsed time per iteration (s): 0.08 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 4.550224E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3034.348 | TFLOPs: 11.29 | 7: iteration 65040/ 173500 | consumed samples: 16650240 | consumed tokens: 34099691520 | elapsed time per iteration (s): 0.08 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 4.537056E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.688 | TFLOPs: 11.80 | 7: iteration 65050/ 173500 | consumed samples: 16652800 | consumed tokens: 34104934400 | elapsed time per iteration (s): 0.08 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 4.523386E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.896 | TFLOPs: 11.84 | 7: iteration 65060/ 173500 | consumed samples: 16655360 | consumed tokens: 34110177280 | elapsed time per iteration (s): 0.08 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 4.545039E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.844 | TFLOPs: 11.81 | 7: iteration 65070/ 173500 | consumed samples: 16657920 | consumed tokens: 34115420160 | elapsed time per iteration (s): 0.08 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 4.538004E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.518 | TFLOPs: 11.44 | 7: iteration 65080/ 173500 | consumed samples: 16660480 | consumed tokens: 34120663040 | elapsed time per iteration (s): 0.08 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 4.548390E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.386 | TFLOPs: 11.77 | 7: iteration 65090/ 173500 | consumed samples: 16663040 | consumed tokens: 34125905920 | elapsed time per iteration (s): 0.08 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 4.546575E+00 | grad 
norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3014.374 | TFLOPs: 11.21 | 7: iteration 65100/ 173500 | consumed samples: 16665600 | consumed tokens: 34131148800 | elapsed time per iteration (s): 0.08 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 4.545031E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.790 | TFLOPs: 11.76 | 7: iteration 65110/ 173500 | consumed samples: 16668160 | consumed tokens: 34136391680 | elapsed time per iteration (s): 0.08 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 4.542369E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.572 | TFLOPs: 11.60 | 7: iteration 65120/ 173500 | consumed samples: 16670720 | consumed tokens: 34141634560 | elapsed time per iteration (s): 0.08 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 4.543995E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.897 | TFLOPs: 11.68 | 7: iteration 65130/ 173500 | consumed samples: 16673280 | consumed tokens: 34146877440 | elapsed time per iteration (s): 0.08 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 4.540011E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.087 | TFLOPs: 11.78 | 7: iteration 65140/ 173500 | consumed samples: 16675840 | consumed tokens: 34152120320 | elapsed time per iteration (s): 0.08 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 4.535894E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.857 | TFLOPs: 11.84 | 7: iteration 65150/ 173500 | consumed samples: 16678400 | consumed tokens: 34157363200 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 4.539024E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.642 | TFLOPs: 11.79 | 7: iteration 65160/ 173500 | consumed samples: 16680960 | consumed tokens: 34162606080 | elapsed time per iteration (s): 0.08 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 4.553956E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.491 | TFLOPs: 11.76 | 7: iteration 65170/ 173500 | consumed samples: 16683520 | consumed tokens: 34167848960 | elapsed time per iteration (s): 0.08 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 4.552539E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.582 | TFLOPs: 11.83 | 7: iteration 65180/ 173500 | consumed samples: 16686080 | consumed tokens: 34173091840 | elapsed time per iteration (s): 0.08 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 4.535675E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.164 | TFLOPs: 11.84 | 7: iteration 65190/ 173500 | consumed samples: 16688640 | consumed tokens: 34178334720 | elapsed time per iteration (s): 0.08 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 4.540147E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.576 | TFLOPs: 11.82 | 7: iteration 65200/ 173500 | consumed samples: 16691200 | consumed tokens: 34183577600 | elapsed time per iteration (s): 0.08 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 4.540539E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.742 | TFLOPs: 
11.82 | 7: iteration 65210/ 173500 | consumed samples: 16693760 | consumed tokens: 34188820480 | elapsed time per iteration (s): 0.08 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 4.540473E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.531 | TFLOPs: 11.81 | 7: iteration 65220/ 173500 | consumed samples: 16696320 | consumed tokens: 34194063360 | elapsed time per iteration (s): 0.08 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 4.534126E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.075 | TFLOPs: 11.77 | 7: iteration 65230/ 173500 | consumed samples: 16698880 | consumed tokens: 34199306240 | elapsed time per iteration (s): 0.08 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 4.548004E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.242 | TFLOPs: 11.87 | 7: iteration 65240/ 173500 | consumed samples: 16701440 | consumed tokens: 34204549120 | elapsed time per iteration (s): 0.08 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 4.550322E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.657 | TFLOPs: 11.84 | 7: iteration 65250/ 173500 | consumed samples: 16704000 | consumed tokens: 34209792000 | elapsed time per iteration (s): 0.08 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 4.538944E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.412 | TFLOPs: 11.85 | 7: iteration 65260/ 173500 | consumed samples: 16706560 | consumed tokens: 34215034880 | elapsed time per iteration (s): 0.08 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 4.546299E+00 | grad norm: 0.313 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.852 | TFLOPs: 11.85 | 7: iteration 65270/ 173500 | consumed samples: 16709120 | consumed tokens: 34220277760 | elapsed time per iteration (s): 0.08 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 4.540271E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.850 | TFLOPs: 11.87 | 7: iteration 65280/ 173500 | consumed samples: 16711680 | consumed tokens: 34225520640 | elapsed time per iteration (s): 0.08 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 4.538120E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.852 | TFLOPs: 11.86 | 7: iteration 65290/ 173500 | consumed samples: 16714240 | consumed tokens: 34230763520 | elapsed time per iteration (s): 0.08 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 4.529895E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.147 | TFLOPs: 11.85 | 7: iteration 65300/ 173500 | consumed samples: 16716800 | consumed tokens: 34236006400 | elapsed time per iteration (s): 0.08 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 4.539529E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.967 | TFLOPs: 11.87 | 7: iteration 65310/ 173500 | consumed samples: 16719360 | consumed tokens: 34241249280 | elapsed time per iteration (s): 0.08 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 4.534183E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.350 | TFLOPs: 11.85 | 7: iteration 65320/ 173500 | consumed samples: 16721920 | consumed tokens: 34246492160 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.457E-04 | global batch size: 256 | lm loss: 4.534410E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.869 | TFLOPs: 11.86 | 7: iteration 65330/ 173500 | consumed samples: 16724480 | consumed tokens: 34251735040 | elapsed time per iteration (s): 0.08 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 4.547923E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.024 | TFLOPs: 11.84 | 7: iteration 65340/ 173500 | consumed samples: 16727040 | consumed tokens: 34256977920 | elapsed time per iteration (s): 0.08 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 4.538488E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.489 | TFLOPs: 11.84 | 7: iteration 65350/ 173500 | consumed samples: 16729600 | consumed tokens: 34262220800 | elapsed time per iteration (s): 0.08 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 4.552300E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.810 | TFLOPs: 11.83 | 7: iteration 65360/ 173500 | consumed samples: 16732160 | consumed tokens: 34267463680 | elapsed time per iteration (s): 0.08 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 4.542476E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.116 | TFLOPs: 11.84 | 7: iteration 65370/ 173500 | consumed samples: 16734720 | consumed tokens: 34272706560 | elapsed time per iteration (s): 0.08 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 4.540958E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.816 | TFLOPs: 11.86 | 7: iteration 65380/ 
173500 | consumed samples: 16737280 | consumed tokens: 34277949440 | elapsed time per iteration (s): 0.08 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 4.541859E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.783 | TFLOPs: 11.79 | 7: iteration 65390/ 173500 | consumed samples: 16739840 | consumed tokens: 34283192320 | elapsed time per iteration (s): 0.08 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 4.536246E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.898 | TFLOPs: 11.85 | 7: iteration 65400/ 173500 | consumed samples: 16742400 | consumed tokens: 34288435200 | elapsed time per iteration (s): 0.08 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 4.544925E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.372 | TFLOPs: 11.83 | 7: iteration 65410/ 173500 | consumed samples: 16744960 | consumed tokens: 34293678080 | elapsed time per iteration (s): 0.09 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 4.547782E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2925.073 | TFLOPs: 10.88 | 7: iteration 65420/ 173500 | consumed samples: 16747520 | consumed tokens: 34298920960 | elapsed time per iteration (s): 0.10 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 4.550675E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.514 | TFLOPs: 9.96 | 7: iteration 65430/ 173500 | consumed samples: 16750080 | consumed tokens: 34304163840 | elapsed time per iteration (s): 0.09 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 4.527002E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2782.214 | TFLOPs: 10.35 | 7: iteration 65440/ 173500 | consumed samples: 16752640 | consumed tokens: 34309406720 | elapsed time per iteration (s): 0.09 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 4.532421E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.176 | TFLOPs: 11.00 | 7: iteration 65450/ 173500 | consumed samples: 16755200 | consumed tokens: 34314649600 | elapsed time per iteration (s): 0.08 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 4.546432E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.970 | TFLOPs: 11.83 | 7: iteration 65460/ 173500 | consumed samples: 16757760 | consumed tokens: 34319892480 | elapsed time per iteration (s): 0.08 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 4.536999E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.814 | TFLOPs: 11.75 | 7: iteration 65470/ 173500 | consumed samples: 16760320 | consumed tokens: 34325135360 | elapsed time per iteration (s): 0.08 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 4.545742E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.456 | TFLOPs: 11.77 | 7: iteration 65480/ 173500 | consumed samples: 16762880 | consumed tokens: 34330378240 | elapsed time per iteration (s): 0.08 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 4.537915E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.575 | TFLOPs: 11.77 | 7: iteration 65490/ 173500 | consumed samples: 16765440 | consumed tokens: 34335621120 | elapsed time per iteration (s): 0.08 | learning rate: 
1.454E-04 | global batch size: 256 | lm loss: 4.540469E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.506 | TFLOPs: 11.87 | 7: iteration 65500/ 173500 | consumed samples: 16768000 | consumed tokens: 34340864000 | elapsed time per iteration (s): 0.08 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 4.538121E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.645 | TFLOPs: 11.83 | 7: iteration 65510/ 173500 | consumed samples: 16770560 | consumed tokens: 34346106880 | elapsed time per iteration (s): 0.08 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 4.546585E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.915 | TFLOPs: 11.79 | 7: iteration 65520/ 173500 | consumed samples: 16773120 | consumed tokens: 34351349760 | elapsed time per iteration (s): 0.08 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 4.540911E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.509 | TFLOPs: 11.77 | 7: iteration 65530/ 173500 | consumed samples: 16775680 | consumed tokens: 34356592640 | elapsed time per iteration (s): 0.08 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 4.548097E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.287 | TFLOPs: 11.87 | 7: iteration 65540/ 173500 | consumed samples: 16778240 | consumed tokens: 34361835520 | elapsed time per iteration (s): 0.08 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 4.534296E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.249 | TFLOPs: 11.87 | 7: iteration 65550/ 173500 | 
consumed samples: 16780800 | consumed tokens: 34367078400 | elapsed time per iteration (s): 0.08 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 4.549683E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.584 | TFLOPs: 11.82 | 7: iteration 65560/ 173500 | consumed samples: 16783360 | consumed tokens: 34372321280 | elapsed time per iteration (s): 0.08 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 4.539633E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.448 | TFLOPs: 11.76 | 7: iteration 65570/ 173500 | consumed samples: 16785920 | consumed tokens: 34377564160 | elapsed time per iteration (s): 0.08 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 4.539861E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.532 | TFLOPs: 11.61 | 7: iteration 65580/ 173500 | consumed samples: 16788480 | consumed tokens: 34382807040 | elapsed time per iteration (s): 0.08 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 4.535070E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.525 | TFLOPs: 11.86 | 7: iteration 65590/ 173500 | consumed samples: 16791040 | consumed tokens: 34388049920 | elapsed time per iteration (s): 0.08 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 4.531433E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.503 | TFLOPs: 11.80 | 7: iteration 65600/ 173500 | consumed samples: 16793600 | consumed tokens: 34393292800 | elapsed time per iteration (s): 0.08 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 4.539899E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3183.648 | TFLOPs: 11.84 | 7: iteration 65610/ 173500 | consumed samples: 16796160 | consumed tokens: 34398535680 | elapsed time per iteration (s): 0.08 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 4.551205E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.011 | TFLOPs: 11.85 | 7: iteration 65620/ 173500 | consumed samples: 16798720 | consumed tokens: 34403778560 | elapsed time per iteration (s): 0.08 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 4.536233E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.542 | TFLOPs: 11.65 | 7: iteration 65630/ 173500 | consumed samples: 16801280 | consumed tokens: 34409021440 | elapsed time per iteration (s): 0.08 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 4.540570E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.195 | TFLOPs: 11.88 | 7: iteration 65640/ 173500 | consumed samples: 16803840 | consumed tokens: 34414264320 | elapsed time per iteration (s): 0.08 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 4.555014E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.421 | TFLOPs: 11.84 | 7: iteration 65650/ 173500 | consumed samples: 16806400 | consumed tokens: 34419507200 | elapsed time per iteration (s): 0.08 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 4.541899E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.724 | TFLOPs: 11.54 | 7: iteration 65660/ 173500 | consumed samples: 16808960 | consumed tokens: 34424750080 | elapsed time per iteration (s): 0.08 | learning rate: 1.452E-04 | global batch 
size: 256 | lm loss: 4.536446E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.930 | TFLOPs: 11.82 | 7: iteration 65670/ 173500 | consumed samples: 16811520 | consumed tokens: 34429992960 | elapsed time per iteration (s): 0.08 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 4.541928E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.794 | TFLOPs: 11.82 | 7: iteration 65680/ 173500 | consumed samples: 16814080 | consumed tokens: 34435235840 | elapsed time per iteration (s): 0.08 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 4.539970E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.862 | TFLOPs: 11.76 | 7: iteration 65690/ 173500 | consumed samples: 16816640 | consumed tokens: 34440478720 | elapsed time per iteration (s): 0.08 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 4.541207E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.792 | TFLOPs: 11.84 | 7: iteration 65700/ 173500 | consumed samples: 16819200 | consumed tokens: 34445721600 | elapsed time per iteration (s): 0.10 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 4.542180E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2560.134 | TFLOPs: 9.52 | 7: iteration 65710/ 173500 | consumed samples: 16821760 | consumed tokens: 34450964480 | elapsed time per iteration (s): 0.08 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 4.550623E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.163 | TFLOPs: 11.73 | 7: iteration 65720/ 173500 | consumed samples: 16824320 | 
consumed tokens: 34456207360 | elapsed time per iteration (s): 0.08 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 4.539390E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.761 | TFLOPs: 11.82 | 7: iteration 65730/ 173500 | consumed samples: 16826880 | consumed tokens: 34461450240 | elapsed time per iteration (s): 0.08 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 4.535536E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.612 | TFLOPs: 11.83 | 7: iteration 65740/ 173500 | consumed samples: 16829440 | consumed tokens: 34466693120 | elapsed time per iteration (s): 0.08 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 4.539732E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.400 | TFLOPs: 11.78 | 7: iteration 65750/ 173500 | consumed samples: 16832000 | consumed tokens: 34471936000 | elapsed time per iteration (s): 0.08 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 4.534726E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.934 | TFLOPs: 11.87 | 7: iteration 65760/ 173500 | consumed samples: 16834560 | consumed tokens: 34477178880 | elapsed time per iteration (s): 0.08 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 4.529970E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.481 | TFLOPs: 11.80 | 7: iteration 65770/ 173500 | consumed samples: 16837120 | consumed tokens: 34482421760 | elapsed time per iteration (s): 0.08 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 4.538181E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3120.243 | TFLOPs: 11.61 | 7: iteration 65780/ 173500 | consumed samples: 16839680 | consumed tokens: 34487664640 | elapsed time per iteration (s): 0.08 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 4.531458E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.666 | TFLOPs: 11.86 | 7: iteration 65790/ 173500 | consumed samples: 16842240 | consumed tokens: 34492907520 | elapsed time per iteration (s): 0.11 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 4.541094E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2426.257 | TFLOPs: 9.02 | 7: iteration 65800/ 173500 | consumed samples: 16844800 | consumed tokens: 34498150400 | elapsed time per iteration (s): 0.08 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 4.542908E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.815 | TFLOPs: 11.85 | 7: iteration 65810/ 173500 | consumed samples: 16847360 | consumed tokens: 34503393280 | elapsed time per iteration (s): 0.08 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 4.536554E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.835 | TFLOPs: 11.83 | 7: iteration 65820/ 173500 | consumed samples: 16849920 | consumed tokens: 34508636160 | elapsed time per iteration (s): 0.08 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 4.536975E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.051 | TFLOPs: 11.84 | 7: iteration 65830/ 173500 | consumed samples: 16852480 | consumed tokens: 34513879040 | elapsed time per iteration (s): 0.08 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 
4.542283E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.751 | TFLOPs: 11.81 | 7: iteration 65840/ 173500 | consumed samples: 16855040 | consumed tokens: 34519121920 | elapsed time per iteration (s): 0.08 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 4.541022E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.018 | TFLOPs: 11.84 | 7: iteration 65850/ 173500 | consumed samples: 16857600 | consumed tokens: 34524364800 | elapsed time per iteration (s): 0.08 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 4.545641E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.045 | TFLOPs: 11.41 | 7: iteration 65860/ 173500 | consumed samples: 16860160 | consumed tokens: 34529607680 | elapsed time per iteration (s): 0.08 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 4.531697E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.527 | TFLOPs: 11.85 | 7: iteration 65870/ 173500 | consumed samples: 16862720 | consumed tokens: 34534850560 | elapsed time per iteration (s): 0.08 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 4.537559E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.832 | TFLOPs: 11.82 | 7: iteration 65880/ 173500 | consumed samples: 16865280 | consumed tokens: 34540093440 | elapsed time per iteration (s): 0.08 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 4.546733E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.628 | TFLOPs: 11.85 | 7: iteration 65890/ 173500 | consumed samples: 16867840 | consumed tokens: 
34545336320 | elapsed time per iteration (s): 0.08 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 4.545596E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.346 | TFLOPs: 11.83 | 7: iteration 65900/ 173500 | consumed samples: 16870400 | consumed tokens: 34550579200 | elapsed time per iteration (s): 0.08 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 4.537085E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.878 | TFLOPs: 11.84 | 7: iteration 65910/ 173500 | consumed samples: 16872960 | consumed tokens: 34555822080 | elapsed time per iteration (s): 0.08 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 4.554985E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.805 | TFLOPs: 11.57 | 7: iteration 65920/ 173500 | consumed samples: 16875520 | consumed tokens: 34561064960 | elapsed time per iteration (s): 0.08 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 4.540378E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.827 | TFLOPs: 11.76 | 7: iteration 65930/ 173500 | consumed samples: 16878080 | consumed tokens: 34566307840 | elapsed time per iteration (s): 0.08 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 4.526508E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.714 | TFLOPs: 11.57 | 7: iteration 65940/ 173500 | consumed samples: 16880640 | consumed tokens: 34571550720 | elapsed time per iteration (s): 0.08 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 4.528512E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3181.553 | TFLOPs: 11.83 | 7: iteration 65950/ 173500 | consumed samples: 16883200 | consumed tokens: 34576793600 | elapsed time per iteration (s): 0.08 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 4.539840E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.221 | TFLOPs: 11.59 | 7: iteration 65960/ 173500 | consumed samples: 16885760 | consumed tokens: 34582036480 | elapsed time per iteration (s): 0.08 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 4.547760E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.476 | TFLOPs: 11.84 | 7: iteration 65970/ 173500 | consumed samples: 16888320 | consumed tokens: 34587279360 | elapsed time per iteration (s): 0.08 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 4.547478E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.118 | TFLOPs: 11.50 | 7: iteration 65980/ 173500 | consumed samples: 16890880 | consumed tokens: 34592522240 | elapsed time per iteration (s): 0.08 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 4.532584E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.594 | TFLOPs: 11.75 | 7: iteration 65990/ 173500 | consumed samples: 16893440 | consumed tokens: 34597765120 | elapsed time per iteration (s): 0.08 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 4.531755E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.571 | TFLOPs: 11.75 | 0: [2023-03-17 01:51:57,655] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=0, lr=[0.00014466507355770288, 0.00014466507355770288, 0.00014466507355770288], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 
0.999)]
7: iteration 66000/ 173500 | consumed samples: 16896000 | consumed tokens: 34603008000 | elapsed time per iteration (s): 0.08 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 4.528037E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.195 | TFLOPs: 11.79 |
0: steps: 66000 loss: 4.4393 iter time (s): 0.083 samples/sec: 3101.815
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 66000 | lm loss value: 4.451278E+00 | lm loss PPL: 8.573642E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 66000 to checkpoints_14m91b100m
0: [2023-03-17 01:51:57,715] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step66000 is begin to save!
0: [2023-03-17 01:51:57,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:51:57,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:51:57,744] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/layer_03-model_00-model_states.pt...
0: [2023-03-17 01:51:57,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/layer_03-model_00-model_states.pt.
0: [2023-03-17 01:51:57,747] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/layer_04-model_00-model_states.pt...
0: [2023-03-17 01:51:57,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/layer_04-model_00-model_states.pt.
0: [2023-03-17 01:51:57,750] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:51:57,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:51:57,753] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:51:57,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:51:57,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:51:57,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:51:57,757] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step66000/mp_rank_00_model_states.pt 0: [2023-03-17 01:51:57,757] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:51:57,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:51:57,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:51:57,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:51:57,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
0: [2023-03-17 01:51:57,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:51:57,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:51:57,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:51:57,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:51:57,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2023-03-17 01:51:57,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:51:57,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:51:57,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 01:51:57,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2023-03-17 01:51:57,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:51:57,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:51:57,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:51:57,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:51:57,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 3: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:51:57,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 01:51:57,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:51:57,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
4: [2023-03-17 01:51:57,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:51:57,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2023-03-17 01:51:57,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2023-03-17 01:51:57,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:51:57,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2023-03-17 01:51:57,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:51:57,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:51:57,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 01:51:57,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2023-03-17 01:51:57,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:51:57,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:51:57,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:51:57,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:51:57,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:51:57,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:51:57,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 01:51:57,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2023-03-17 01:51:57,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:51:57,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:51:57,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:51:57,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:51:57,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2023-03-17 01:51:57,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:51:57,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
1: [2023-03-17 01:51:57,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:51:57,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:51:57,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:51:57,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2023-03-17 01:51:57,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:51:57,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:51:57,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 01:51:57,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:51:57,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:51:57,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:51:57,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 01:51:57,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
6: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:51:57,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 7: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2023-03-17 01:51:57,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2023-03-17 01:51:57,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2023-03-17 01:51:57,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:51:57,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:51:57,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:51:57,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step66000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:51:57,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2023-03-17 01:51:57,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: successfully saved checkpoint at iteration 66000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.37 7: iteration 66010/ 173500 | consumed samples: 16898560 | consumed tokens: 34608250880 | elapsed time per iteration (s): 0.09 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 4.534412E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.724 | TFLOPs: 10.09 | 7: iteration 66020/ 173500 | consumed samples: 16901120 | consumed tokens: 34613493760 | elapsed time per iteration (s): 0.08 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 4.528598E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.428 | TFLOPs: 11.80 | 7: iteration 66030/ 173500 | consumed samples: 16903680 | consumed tokens: 34618736640 | elapsed time per iteration (s): 0.08 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 4.531030E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.954 | TFLOPs: 11.81 | 7: iteration 66040/ 173500 | consumed samples: 16906240 | consumed tokens: 34623979520 | elapsed time per iteration (s): 0.08 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 
4.548125E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.907 | TFLOPs: 11.81 | 7: iteration 66050/ 173500 | consumed samples: 16908800 | consumed tokens: 34629222400 | elapsed time per iteration (s): 0.08 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 4.533067E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.108 | TFLOPs: 11.71 | 7: iteration 66060/ 173500 | consumed samples: 16911360 | consumed tokens: 34634465280 | elapsed time per iteration (s): 0.08 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 4.551895E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.178 | TFLOPs: 11.75 | 7: iteration 66070/ 173500 | consumed samples: 16913920 | consumed tokens: 34639708160 | elapsed time per iteration (s): 0.08 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 4.542680E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.444 | TFLOPs: 11.81 | 7: iteration 66080/ 173500 | consumed samples: 16916480 | consumed tokens: 34644951040 | elapsed time per iteration (s): 0.08 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 4.552010E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.767 | TFLOPs: 11.81 | 7: iteration 66090/ 173500 | consumed samples: 16919040 | consumed tokens: 34650193920 | elapsed time per iteration (s): 0.08 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 4.546920E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.182 | TFLOPs: 11.80 | 7: iteration 66100/ 173500 | consumed samples: 16921600 | consumed tokens: 
34655436800 | elapsed time per iteration (s): 0.08 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 4.537128E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.094 | TFLOPs: 11.79 | 7: iteration 66110/ 173500 | consumed samples: 16924160 | consumed tokens: 34660679680 | elapsed time per iteration (s): 0.08 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 4.546534E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.076 | TFLOPs: 11.81 | 7: iteration 66120/ 173500 | consumed samples: 16926720 | consumed tokens: 34665922560 | elapsed time per iteration (s): 0.08 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 4.545046E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.121 | TFLOPs: 11.59 | 7: iteration 66130/ 173500 | consumed samples: 16929280 | consumed tokens: 34671165440 | elapsed time per iteration (s): 0.09 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 4.550219E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.272 | TFLOPs: 10.49 | 7: iteration 66140/ 173500 | consumed samples: 16931840 | consumed tokens: 34676408320 | elapsed time per iteration (s): 0.10 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 4.541476E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2515.669 | TFLOPs: 9.36 | 7: iteration 66150/ 173500 | consumed samples: 16934400 | consumed tokens: 34681651200 | elapsed time per iteration (s): 0.11 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 4.546096E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2382.015 | TFLOPs: 8.86 | 7: iteration 66160/ 173500 | consumed samples: 16936960 | consumed tokens: 34686894080 | elapsed time per iteration (s): 0.10 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 4.532405E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2565.319 | TFLOPs: 9.54 | 7: iteration 66170/ 173500 | consumed samples: 16939520 | consumed tokens: 34692136960 | elapsed time per iteration (s): 0.11 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 4.530721E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2280.036 | TFLOPs: 8.48 | 7: iteration 66180/ 173500 | consumed samples: 16942080 | consumed tokens: 34697379840 | elapsed time per iteration (s): 0.10 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 4.547540E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.437 | TFLOPs: 9.30 | 7: iteration 66190/ 173500 | consumed samples: 16944640 | consumed tokens: 34702622720 | elapsed time per iteration (s): 0.10 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 4.547908E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2504.968 | TFLOPs: 9.32 | 7: iteration 66200/ 173500 | consumed samples: 16947200 | consumed tokens: 34707865600 | elapsed time per iteration (s): 0.10 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 4.551421E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2635.674 | TFLOPs: 9.80 | 7: iteration 66210/ 173500 | consumed samples: 16949760 | consumed tokens: 34713108480 | elapsed time per iteration (s): 0.10 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 4.544560E+00 | grad norm: 
0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2472.373 | TFLOPs: 9.20 | 7: iteration 66220/ 173500 | consumed samples: 16952320 | consumed tokens: 34718351360 | elapsed time per iteration (s): 0.10 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 4.543262E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2460.685 | TFLOPs: 9.15 | 7: iteration 66230/ 173500 | consumed samples: 16954880 | consumed tokens: 34723594240 | elapsed time per iteration (s): 0.10 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 4.548362E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2491.169 | TFLOPs: 9.27 | 7: iteration 66240/ 173500 | consumed samples: 16957440 | consumed tokens: 34728837120 | elapsed time per iteration (s): 0.10 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 4.546414E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2498.906 | TFLOPs: 9.29 | 7: iteration 66250/ 173500 | consumed samples: 16960000 | consumed tokens: 34734080000 | elapsed time per iteration (s): 0.10 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 4.535975E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2483.532 | TFLOPs: 9.24 | 7: iteration 66260/ 173500 | consumed samples: 16962560 | consumed tokens: 34739322880 | elapsed time per iteration (s): 0.10 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 4.539524E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2546.976 | TFLOPs: 9.47 | 7: iteration 66270/ 173500 | consumed samples: 16965120 | consumed tokens: 34744565760 | elapsed time per 
iteration (s): 0.10 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 4.533894E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2556.225 | TFLOPs: 9.51 | 7: iteration 66280/ 173500 | consumed samples: 16967680 | consumed tokens: 34749808640 | elapsed time per iteration (s): 0.10 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 4.548729E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2578.504 | TFLOPs: 9.59 | 7: iteration 66290/ 173500 | consumed samples: 16970240 | consumed tokens: 34755051520 | elapsed time per iteration (s): 0.10 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 4.541055E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.334 | TFLOPs: 9.24 | 7: iteration 66300/ 173500 | consumed samples: 16972800 | consumed tokens: 34760294400 | elapsed time per iteration (s): 0.10 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 4.544892E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2573.788 | TFLOPs: 9.57 | 7: iteration 66310/ 173500 | consumed samples: 16975360 | consumed tokens: 34765537280 | elapsed time per iteration (s): 0.10 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 4.526708E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2460.006 | TFLOPs: 9.15 | 7: iteration 66320/ 173500 | consumed samples: 16977920 | consumed tokens: 34770780160 | elapsed time per iteration (s): 0.10 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 4.546326E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2531.333 | TFLOPs: 9.42 | 7: 
iteration 66330/ 173500 | consumed samples: 16980480 | consumed tokens: 34776023040 | elapsed time per iteration (s): 0.11 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 4.539271E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.180 | TFLOPs: 8.95 | 7: iteration 66340/ 173500 | consumed samples: 16983040 | consumed tokens: 34781265920 | elapsed time per iteration (s): 0.10 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.544161E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2496.730 | TFLOPs: 9.29 | 7: iteration 66350/ 173500 | consumed samples: 16985600 | consumed tokens: 34786508800 | elapsed time per iteration (s): 0.10 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.543714E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2456.274 | TFLOPs: 9.14 | 7: iteration 66360/ 173500 | consumed samples: 16988160 | consumed tokens: 34791751680 | elapsed time per iteration (s): 0.10 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.533767E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2460.571 | TFLOPs: 9.15 | 7: iteration 66370/ 173500 | consumed samples: 16990720 | consumed tokens: 34796994560 | elapsed time per iteration (s): 0.10 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.537985E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.665 | TFLOPs: 9.16 | 7: iteration 66380/ 173500 | consumed samples: 16993280 | consumed tokens: 34802237440 | elapsed time per iteration (s): 0.10 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.542771E+00 | grad norm: 0.310 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2467.824 | TFLOPs: 9.18 | 7: iteration 66390/ 173500 | consumed samples: 16995840 | consumed tokens: 34807480320 | elapsed time per iteration (s): 0.11 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.528304E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.425 | TFLOPs: 9.02 | 7: iteration 66400/ 173500 | consumed samples: 16998400 | consumed tokens: 34812723200 | elapsed time per iteration (s): 0.10 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.543913E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2472.716 | TFLOPs: 9.20 | 7: iteration 66410/ 173500 | consumed samples: 17000960 | consumed tokens: 34817966080 | elapsed time per iteration (s): 0.10 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 4.542192E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2512.386 | TFLOPs: 9.34 | 7: iteration 66420/ 173500 | consumed samples: 17003520 | consumed tokens: 34823208960 | elapsed time per iteration (s): 0.10 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 4.552328E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2579.573 | TFLOPs: 9.59 | 7: iteration 66430/ 173500 | consumed samples: 17006080 | consumed tokens: 34828451840 | elapsed time per iteration (s): 0.10 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 4.529237E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2496.491 | TFLOPs: 9.29 | 7: iteration 66440/ 173500 | consumed samples: 17008640 | consumed tokens: 34833694720 | elapsed time per iteration (s): 0.10 | learning rate: 
1.440E-04 | global batch size: 256 | lm loss: 4.545773E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2503.758 | TFLOPs: 9.31 | 7: iteration 66450/ 173500 | consumed samples: 17011200 | consumed tokens: 34838937600 | elapsed time per iteration (s): 0.10 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 4.555353E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2561.870 | TFLOPs: 9.53 | 7: iteration 66460/ 173500 | consumed samples: 17013760 | consumed tokens: 34844180480 | elapsed time per iteration (s): 0.10 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 4.527869E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2524.421 | TFLOPs: 9.39 | 7: iteration 66470/ 173500 | consumed samples: 17016320 | consumed tokens: 34849423360 | elapsed time per iteration (s): 0.11 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 4.536362E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2400.344 | TFLOPs: 8.93 | 7: iteration 66480/ 173500 | consumed samples: 17018880 | consumed tokens: 34854666240 | elapsed time per iteration (s): 0.10 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 4.545172E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2477.477 | TFLOPs: 9.22 | 7: iteration 66490/ 173500 | consumed samples: 17021440 | consumed tokens: 34859909120 | elapsed time per iteration (s): 0.10 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 4.522539E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.015 | TFLOPs: 9.12 | 7: iteration 66500/ 173500 | consumed 
samples: 17024000 | consumed tokens: 34865152000 | elapsed time per iteration (s): 0.11 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 4.538367E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.592 | TFLOPs: 9.05 | 7: iteration 66510/ 173500 | consumed samples: 17026560 | consumed tokens: 34870394880 | elapsed time per iteration (s): 0.10 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 4.539469E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.561 | TFLOPs: 9.52 | 7: iteration 66520/ 173500 | consumed samples: 17029120 | consumed tokens: 34875637760 | elapsed time per iteration (s): 0.09 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 4.534900E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2747.472 | TFLOPs: 10.22 | 7: iteration 66530/ 173500 | consumed samples: 17031680 | consumed tokens: 34880880640 | elapsed time per iteration (s): 0.08 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 4.523704E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.578 | TFLOPs: 12.00 | 7: iteration 66540/ 173500 | consumed samples: 17034240 | consumed tokens: 34886123520 | elapsed time per iteration (s): 0.08 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 4.541823E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.167 | TFLOPs: 11.87 | 7: iteration 66550/ 173500 | consumed samples: 17036800 | consumed tokens: 34891366400 | elapsed time per iteration (s): 0.08 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 4.540324E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3195.259 | TFLOPs: 11.88 | 7: iteration 66560/ 173500 | consumed samples: 17039360 | consumed tokens: 34896609280 | elapsed time per iteration (s): 0.08 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 4.549808E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.364 | TFLOPs: 11.65 | 7: iteration 66570/ 173500 | consumed samples: 17041920 | consumed tokens: 34901852160 | elapsed time per iteration (s): 0.08 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 4.539585E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.118 | TFLOPs: 11.95 | 7: iteration 66580/ 173500 | consumed samples: 17044480 | consumed tokens: 34907095040 | elapsed time per iteration (s): 0.08 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 4.544581E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.762 | TFLOPs: 11.97 | 7: iteration 66590/ 173500 | consumed samples: 17047040 | consumed tokens: 34912337920 | elapsed time per iteration (s): 0.08 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 4.532435E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.051 | TFLOPs: 11.95 | 7: iteration 66600/ 173500 | consumed samples: 17049600 | consumed tokens: 34917580800 | elapsed time per iteration (s): 0.08 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 4.540246E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.336 | TFLOPs: 11.91 | 7: iteration 66610/ 173500 | consumed samples: 17052160 | consumed tokens: 34922823680 | elapsed time per iteration (s): 0.08 | learning rate: 1.437E-04 | global batch size: 256 
| lm loss: 4.554351E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.365 | TFLOPs: 11.92 | 7: iteration 66620/ 173500 | consumed samples: 17054720 | consumed tokens: 34928066560 | elapsed time per iteration (s): 0.08 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 4.544484E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.312 | TFLOPs: 11.57 | 7: iteration 66630/ 173500 | consumed samples: 17057280 | consumed tokens: 34933309440 | elapsed time per iteration (s): 0.08 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 4.553711E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.025 | TFLOPs: 11.34 | 7: iteration 66640/ 173500 | consumed samples: 17059840 | consumed tokens: 34938552320 | elapsed time per iteration (s): 0.08 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 4.530376E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3061.380 | TFLOPs: 11.39 | 7: iteration 66650/ 173500 | consumed samples: 17062400 | consumed tokens: 34943795200 | elapsed time per iteration (s): 0.09 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 4.532648E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2966.916 | TFLOPs: 11.04 | 7: iteration 66660/ 173500 | consumed samples: 17064960 | consumed tokens: 34949038080 | elapsed time per iteration (s): 0.08 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 4.536634E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.963 | TFLOPs: 11.57 | 7: iteration 66670/ 173500 | consumed samples: 17067520 | consumed 
tokens: 34954280960 | elapsed time per iteration (s): 0.08 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 4.538158E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3042.611 | TFLOPs: 11.32 | 7: iteration 66680/ 173500 | consumed samples: 17070080 | consumed tokens: 34959523840 | elapsed time per iteration (s): 0.08 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 4.535883E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.407 | TFLOPs: 11.88 | 7: iteration 66690/ 173500 | consumed samples: 17072640 | consumed tokens: 34964766720 | elapsed time per iteration (s): 0.08 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 4.537263E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.936 | TFLOPs: 11.89 | 7: iteration 66700/ 173500 | consumed samples: 17075200 | consumed tokens: 34970009600 | elapsed time per iteration (s): 0.08 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 4.548062E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.617 | TFLOPs: 11.81 | 7: iteration 66710/ 173500 | consumed samples: 17077760 | consumed tokens: 34975252480 | elapsed time per iteration (s): 0.08 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 4.548598E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.663 | TFLOPs: 11.87 | 7: iteration 66720/ 173500 | consumed samples: 17080320 | consumed tokens: 34980495360 | elapsed time per iteration (s): 0.08 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 4.544547E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3193.034 | TFLOPs: 11.88 | 7: iteration 66730/ 173500 | consumed samples: 17082880 | consumed tokens: 34985738240 | elapsed time per iteration (s): 0.08 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 4.540525E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.434 | TFLOPs: 11.88 | 7: iteration 66740/ 173500 | consumed samples: 17085440 | consumed tokens: 34990981120 | elapsed time per iteration (s): 0.08 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 4.544132E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.140 | TFLOPs: 11.88 | 7: iteration 66750/ 173500 | consumed samples: 17088000 | consumed tokens: 34996224000 | elapsed time per iteration (s): 0.08 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 4.548549E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.944 | TFLOPs: 11.87 | 7: iteration 66760/ 173500 | consumed samples: 17090560 | consumed tokens: 35001466880 | elapsed time per iteration (s): 0.08 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 4.534445E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.151 | TFLOPs: 11.69 | 7: iteration 66770/ 173500 | consumed samples: 17093120 | consumed tokens: 35006709760 | elapsed time per iteration (s): 0.08 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 4.548038E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.400 | TFLOPs: 11.88 | 7: iteration 66780/ 173500 | consumed samples: 17095680 | consumed tokens: 35011952640 | elapsed time per iteration (s): 0.08 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 4.537302E+00 | 
grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.993 | TFLOPs: 11.88 | 7: iteration 66790/ 173500 | consumed samples: 17098240 | consumed tokens: 35017195520 | elapsed time per iteration (s): 0.08 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 4.541608E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.696 | TFLOPs: 11.88 | 7: iteration 66800/ 173500 | consumed samples: 17100800 | consumed tokens: 35022438400 | elapsed time per iteration (s): 0.08 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 4.543180E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.124 | TFLOPs: 11.90 | 7: iteration 66810/ 173500 | consumed samples: 17103360 | consumed tokens: 35027681280 | elapsed time per iteration (s): 0.08 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 4.550228E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.017 | TFLOPs: 11.85 | 7: iteration 66820/ 173500 | consumed samples: 17105920 | consumed tokens: 35032924160 | elapsed time per iteration (s): 0.08 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 4.542915E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.919 | TFLOPs: 11.89 | 7: iteration 66830/ 173500 | consumed samples: 17108480 | consumed tokens: 35038167040 | elapsed time per iteration (s): 0.08 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 4.545293E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.370 | TFLOPs: 11.83 | 7: iteration 66840/ 173500 | consumed samples: 17111040 | consumed tokens: 35043409920 | elapsed 
time per iteration (s): 0.08 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 4.543746E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.907 | TFLOPs: 11.91 | 7: iteration 66850/ 173500 | consumed samples: 17113600 | consumed tokens: 35048652800 | elapsed time per iteration (s): 0.08 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 4.540118E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.108 | TFLOPs: 11.87 | 7: iteration 66860/ 173500 | consumed samples: 17116160 | consumed tokens: 35053895680 | elapsed time per iteration (s): 0.08 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 4.525902E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.975 | TFLOPs: 11.85 | 7: iteration 66870/ 173500 | consumed samples: 17118720 | consumed tokens: 35059138560 | elapsed time per iteration (s): 0.08 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 4.541904E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.839 | TFLOPs: 11.93 | 7: iteration 66880/ 173500 | consumed samples: 17121280 | consumed tokens: 35064381440 | elapsed time per iteration (s): 0.08 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 4.531193E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.477 | TFLOPs: 11.93 | 7: iteration 66890/ 173500 | consumed samples: 17123840 | consumed tokens: 35069624320 | elapsed time per iteration (s): 0.08 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 4.538979E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.398 | 
TFLOPs: 11.94 | 7: iteration 66900/ 173500 | consumed samples: 17126400 | consumed tokens: 35074867200 | elapsed time per iteration (s): 0.08 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 4.541203E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.817 | TFLOPs: 11.92 | 7: iteration 66910/ 173500 | consumed samples: 17128960 | consumed tokens: 35080110080 | elapsed time per iteration (s): 0.08 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 4.549065E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.310 | TFLOPs: 11.67 | 7: iteration 66920/ 173500 | consumed samples: 17131520 | consumed tokens: 35085352960 | elapsed time per iteration (s): 0.08 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 4.537883E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.030 | TFLOPs: 11.77 | 7: iteration 66930/ 173500 | consumed samples: 17134080 | consumed tokens: 35090595840 | elapsed time per iteration (s): 0.08 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 4.539223E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.526 | TFLOPs: 11.92 | 7: iteration 66940/ 173500 | consumed samples: 17136640 | consumed tokens: 35095838720 | elapsed time per iteration (s): 0.08 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 4.537846E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.563 | TFLOPs: 11.96 | 7: iteration 66950/ 173500 | consumed samples: 17139200 | consumed tokens: 35101081600 | elapsed time per iteration (s): 0.08 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 4.540977E+00 | grad norm: 0.291 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.629 | TFLOPs: 11.93 | 7: iteration 66960/ 173500 | consumed samples: 17141760 | consumed tokens: 35106324480 | elapsed time per iteration (s): 0.08 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 4.538818E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.829 | TFLOPs: 11.85 | 7: iteration 66970/ 173500 | consumed samples: 17144320 | consumed tokens: 35111567360 | elapsed time per iteration (s): 0.08 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 4.540612E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.704 | TFLOPs: 11.84 | 7: iteration 66980/ 173500 | consumed samples: 17146880 | consumed tokens: 35116810240 | elapsed time per iteration (s): 0.08 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 4.548100E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.585 | TFLOPs: 11.92 | 7: iteration 66990/ 173500 | consumed samples: 17149440 | consumed tokens: 35122053120 | elapsed time per iteration (s): 0.08 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 4.543832E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.422 | TFLOPs: 11.90 | 7: iteration 67000/ 173500 | consumed samples: 17152000 | consumed tokens: 35127296000 | elapsed time per iteration (s): 0.08 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 4.541651E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.999 | TFLOPs: 11.96 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss 
at iteration 67000 | lm loss value: 4.408345E+00 | lm loss PPL: 8.213340E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 67000 to checkpoints_14m91b100m 0: [2023-03-17 01:53:27,144] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step67000 is begin to save! 0: [2023-03-17 01:53:27,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:53:27,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:53:27,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:53:27,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:53:27,176] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:53:27,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:53:27,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:53:27,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:53:27,183] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 01:53:27,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:53:27,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:53:27,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:53:27,187] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step67000/mp_rank_00_model_states.pt 0: [2023-03-17 01:53:27,187] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:53:27,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:53:27,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:53:27,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:53:27,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:53:27,210] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,210] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2023-03-17 01:53:27,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:53:27,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
1: [2023-03-17 01:53:27,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:53:27,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 01:53:27,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:53:27,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2023-03-17 01:53:27,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:53:27,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2023-03-17 01:53:27,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
1: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:53:27,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2023-03-17 01:53:27,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:53:27,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:53:27,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2023-03-17 01:53:27,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:53:27,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
1: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:53:27,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:53:27,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:53:27,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:53:27,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
7: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:53:27,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:53:27,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:53:27,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:53:27,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2023-03-17 01:53:27,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:53:27,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:53:27,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:53:27,215] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
0: [2023-03-17 01:53:27,215] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:53:27,215] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:53:27,215] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2023-03-17 01:53:27,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:53:27,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:53:27,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:53:27,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2023-03-17 01:53:27,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:53:27,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:53:27,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:53:27,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:53:27,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2023-03-17 01:53:27,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:53:27,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:53:27,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2023-03-17 01:53:27,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:53:27,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:53:27,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:53:27,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
7: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2023-03-17 01:53:27,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2023-03-17 01:53:27,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:53:27,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step67000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:53:27,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: successfully saved checkpoint at iteration 67000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.67 7: iteration 67010/ 173500 | consumed samples: 17154560 | consumed tokens: 35132538880 | elapsed time per iteration (s): 0.09 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 4.541956E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.638 | TFLOPs: 10.42 | 7: iteration 67020/ 173500 | consumed samples: 17157120 | consumed tokens: 35137781760 | elapsed time per iteration (s): 0.08 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 4.542175E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.474 | TFLOPs: 11.87 | 7: iteration 67030/ 173500 | consumed samples: 17159680 | consumed tokens: 35143024640 | elapsed time per iteration (s): 0.08 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 4.540813E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.740 | TFLOPs: 11.92 | 7: iteration 67040/ 173500 | consumed samples: 17162240 | consumed tokens: 35148267520 | elapsed time per iteration (s): 0.08 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 4.546323E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.040 | TFLOPs: 11.87 | 7: iteration 67050/ 173500 | consumed samples: 17164800 | consumed tokens: 35153510400 | elapsed time per iteration (s): 0.08 | learning rate: 1.431E-04 | global 
batch size: 256 | lm loss: 4.532039E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.716 | TFLOPs: 11.87 | 7: iteration 67060/ 173500 | consumed samples: 17167360 | consumed tokens: 35158753280 | elapsed time per iteration (s): 0.08 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 4.553253E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.338 | TFLOPs: 11.90 | 7: iteration 67070/ 173500 | consumed samples: 17169920 | consumed tokens: 35163996160 | elapsed time per iteration (s): 0.08 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 4.560454E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.748 | TFLOPs: 11.63 | 7: iteration 67080/ 173500 | consumed samples: 17172480 | consumed tokens: 35169239040 | elapsed time per iteration (s): 0.08 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 4.530560E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.471 | TFLOPs: 11.59 | 7: iteration 67090/ 173500 | consumed samples: 17175040 | consumed tokens: 35174481920 | elapsed time per iteration (s): 0.08 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 4.526372E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.238 | TFLOPs: 11.85 | 7: iteration 67100/ 173500 | consumed samples: 17177600 | consumed tokens: 35179724800 | elapsed time per iteration (s): 0.08 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 4.531855E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.589 | TFLOPs: 11.78 | 7: iteration 67110/ 173500 | consumed samples: 17180160 
| consumed tokens: 35184967680 | elapsed time per iteration (s): 0.08 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 4.546261E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.644 | TFLOPs: 11.81 | 7: iteration 67120/ 173500 | consumed samples: 17182720 | consumed tokens: 35190210560 | elapsed time per iteration (s): 0.08 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 4.532063E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.288 | TFLOPs: 11.84 | 7: iteration 67130/ 173500 | consumed samples: 17185280 | consumed tokens: 35195453440 | elapsed time per iteration (s): 0.08 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 4.525739E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.545 | TFLOPs: 11.86 | 7: iteration 67140/ 173500 | consumed samples: 17187840 | consumed tokens: 35200696320 | elapsed time per iteration (s): 0.08 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 4.537862E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.831 | TFLOPs: 11.79 | 7: iteration 67150/ 173500 | consumed samples: 17190400 | consumed tokens: 35205939200 | elapsed time per iteration (s): 0.08 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 4.543194E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.894 | TFLOPs: 11.84 | 7: iteration 67160/ 173500 | consumed samples: 17192960 | consumed tokens: 35211182080 | elapsed time per iteration (s): 0.08 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 4.539164E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 3185.245 | TFLOPs: 11.85 | 7: iteration 67170/ 173500 | consumed samples: 17195520 | consumed tokens: 35216424960 | elapsed time per iteration (s): 0.08 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 4.539048E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.197 | TFLOPs: 11.71 | 7: iteration 67180/ 173500 | consumed samples: 17198080 | consumed tokens: 35221667840 | elapsed time per iteration (s): 0.08 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 4.552341E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.408 | TFLOPs: 11.86 | 7: iteration 67190/ 173500 | consumed samples: 17200640 | consumed tokens: 35226910720 | elapsed time per iteration (s): 0.08 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 4.539120E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.370 | TFLOPs: 11.86 | 7: iteration 67200/ 173500 | consumed samples: 17203200 | consumed tokens: 35232153600 | elapsed time per iteration (s): 0.08 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 4.535591E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.155 | TFLOPs: 11.80 | 7: iteration 67210/ 173500 | consumed samples: 17205760 | consumed tokens: 35237396480 | elapsed time per iteration (s): 0.08 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 4.526083E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.438 | TFLOPs: 11.85 | 7: iteration 67220/ 173500 | consumed samples: 17208320 | consumed tokens: 35242639360 | elapsed time per iteration (s): 0.08 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 
4.539710E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.956 | TFLOPs: 11.84 | 7: iteration 67230/ 173500 | consumed samples: 17210880 | consumed tokens: 35247882240 | elapsed time per iteration (s): 0.08 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 4.539129E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.894 | TFLOPs: 11.86 | 7: iteration 67240/ 173500 | consumed samples: 17213440 | consumed tokens: 35253125120 | elapsed time per iteration (s): 0.08 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 4.554205E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.794 | TFLOPs: 11.82 | 7: iteration 67250/ 173500 | consumed samples: 17216000 | consumed tokens: 35258368000 | elapsed time per iteration (s): 0.08 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 4.542063E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.579 | TFLOPs: 11.86 | 7: iteration 67260/ 173500 | consumed samples: 17218560 | consumed tokens: 35263610880 | elapsed time per iteration (s): 0.08 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 4.546362E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.126 | TFLOPs: 11.89 | 7: iteration 67270/ 173500 | consumed samples: 17221120 | consumed tokens: 35268853760 | elapsed time per iteration (s): 0.08 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 4.534167E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.318 | TFLOPs: 11.85 | 7: iteration 67280/ 173500 | consumed samples: 17223680 | consumed tokens: 
35274096640 | elapsed time per iteration (s): 0.08 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 4.537535E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.324 | TFLOPs: 11.81 | 7: iteration 67290/ 173500 | consumed samples: 17226240 | consumed tokens: 35279339520 | elapsed time per iteration (s): 0.08 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 4.537247E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.725 | TFLOPs: 11.80 | 7: iteration 67300/ 173500 | consumed samples: 17228800 | consumed tokens: 35284582400 | elapsed time per iteration (s): 0.08 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 4.537873E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.014 | TFLOPs: 11.91 | 7: iteration 67310/ 173500 | consumed samples: 17231360 | consumed tokens: 35289825280 | elapsed time per iteration (s): 0.08 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 4.543942E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.078 | TFLOPs: 11.88 | 7: iteration 67320/ 173500 | consumed samples: 17233920 | consumed tokens: 35295068160 | elapsed time per iteration (s): 0.08 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 4.542697E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.746 | TFLOPs: 11.85 | 7: iteration 67330/ 173500 | consumed samples: 17236480 | consumed tokens: 35300311040 | elapsed time per iteration (s): 0.08 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 4.529328E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3191.216 | TFLOPs: 11.87 | 7: iteration 67340/ 173500 | consumed samples: 17239040 | consumed tokens: 35305553920 | elapsed time per iteration (s): 0.08 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 4.533096E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.416 | TFLOPs: 11.83 | 7: iteration 67350/ 173500 | consumed samples: 17241600 | consumed tokens: 35310796800 | elapsed time per iteration (s): 0.08 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 4.544600E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.832 | TFLOPs: 11.85 | 7: iteration 67360/ 173500 | consumed samples: 17244160 | consumed tokens: 35316039680 | elapsed time per iteration (s): 0.08 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 4.523369E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.778 | TFLOPs: 11.82 | 7: iteration 67370/ 173500 | consumed samples: 17246720 | consumed tokens: 35321282560 | elapsed time per iteration (s): 0.08 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 4.544433E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.418 | TFLOPs: 11.86 | 7: iteration 67380/ 173500 | consumed samples: 17249280 | consumed tokens: 35326525440 | elapsed time per iteration (s): 0.08 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 4.526965E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.005 | TFLOPs: 11.88 | 7: iteration 67390/ 173500 | consumed samples: 17251840 | consumed tokens: 35331768320 | elapsed time per iteration (s): 0.08 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 4.531185E+00 | grad 
norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.744 | TFLOPs: 11.78 | 7: iteration 67400/ 173500 | consumed samples: 17254400 | consumed tokens: 35337011200 | elapsed time per iteration (s): 0.08 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 4.538665E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.411 | TFLOPs: 11.85 | 7: iteration 67410/ 173500 | consumed samples: 17256960 | consumed tokens: 35342254080 | elapsed time per iteration (s): 0.08 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 4.542461E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.925 | TFLOPs: 11.89 | 7: iteration 67420/ 173500 | consumed samples: 17259520 | consumed tokens: 35347496960 | elapsed time per iteration (s): 0.08 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 4.541098E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.365 | TFLOPs: 11.89 | 7: iteration 67430/ 173500 | consumed samples: 17262080 | consumed tokens: 35352739840 | elapsed time per iteration (s): 0.08 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 4.529992E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.053 | TFLOPs: 11.86 | 7: iteration 67440/ 173500 | consumed samples: 17264640 | consumed tokens: 35357982720 | elapsed time per iteration (s): 0.08 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 4.536153E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.941 | TFLOPs: 11.94 | 7: iteration 67450/ 173500 | consumed samples: 17267200 | consumed tokens: 35363225600 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 4.542797E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.947 | TFLOPs: 11.72 | 7: iteration 67460/ 173500 | consumed samples: 17269760 | consumed tokens: 35368468480 | elapsed time per iteration (s): 0.08 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 4.538461E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.646 | TFLOPs: 11.93 | 7: iteration 67470/ 173500 | consumed samples: 17272320 | consumed tokens: 35373711360 | elapsed time per iteration (s): 0.08 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 4.542717E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.968 | TFLOPs: 11.92 | 7: iteration 67480/ 173500 | consumed samples: 17274880 | consumed tokens: 35378954240 | elapsed time per iteration (s): 0.08 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 4.549569E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.733 | TFLOPs: 11.92 | 7: iteration 67490/ 173500 | consumed samples: 17277440 | consumed tokens: 35384197120 | elapsed time per iteration (s): 0.08 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 4.540070E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.696 | TFLOPs: 11.30 | 7: iteration 67500/ 173500 | consumed samples: 17280000 | consumed tokens: 35389440000 | elapsed time per iteration (s): 0.08 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 4.533186E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.863 | TFLOPs: 
11.87 | 7: iteration 67510/ 173500 | consumed samples: 17282560 | consumed tokens: 35394682880 | elapsed time per iteration (s): 0.08 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 4.528546E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.781 | TFLOPs: 11.86 | 7: iteration 67520/ 173500 | consumed samples: 17285120 | consumed tokens: 35399925760 | elapsed time per iteration (s): 0.08 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 4.534993E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.256 | TFLOPs: 11.80 | 7: iteration 67530/ 173500 | consumed samples: 17287680 | consumed tokens: 35405168640 | elapsed time per iteration (s): 0.08 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 4.530639E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.043 | TFLOPs: 11.87 | 7: iteration 67540/ 173500 | consumed samples: 17290240 | consumed tokens: 35410411520 | elapsed time per iteration (s): 0.08 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 4.532862E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.592 | TFLOPs: 11.58 | 7: iteration 67550/ 173500 | consumed samples: 17292800 | consumed tokens: 35415654400 | elapsed time per iteration (s): 0.08 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 4.534148E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.773 | TFLOPs: 11.92 | 7: iteration 67560/ 173500 | consumed samples: 17295360 | consumed tokens: 35420897280 | elapsed time per iteration (s): 0.08 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 4.545190E+00 | grad norm: 0.306 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.090 | TFLOPs: 11.93 | 7: iteration 67570/ 173500 | consumed samples: 17297920 | consumed tokens: 35426140160 | elapsed time per iteration (s): 0.08 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 4.531331E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.342 | TFLOPs: 11.92 | 7: iteration 67580/ 173500 | consumed samples: 17300480 | consumed tokens: 35431383040 | elapsed time per iteration (s): 0.08 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 4.544926E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.270 | TFLOPs: 11.68 | 7: iteration 67590/ 173500 | consumed samples: 17303040 | consumed tokens: 35436625920 | elapsed time per iteration (s): 0.08 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 4.529541E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.159 | TFLOPs: 11.90 | 7: iteration 67600/ 173500 | consumed samples: 17305600 | consumed tokens: 35441868800 | elapsed time per iteration (s): 0.08 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 4.535206E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.528 | TFLOPs: 11.79 | 7: iteration 67610/ 173500 | consumed samples: 17308160 | consumed tokens: 35447111680 | elapsed time per iteration (s): 0.08 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 4.550441E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.218 | TFLOPs: 11.90 | 7: iteration 67620/ 173500 | consumed samples: 17310720 | consumed tokens: 35452354560 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.422E-04 | global batch size: 256 | lm loss: 4.544906E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.124 | TFLOPs: 11.25 | 7: iteration 67630/ 173500 | consumed samples: 17313280 | consumed tokens: 35457597440 | elapsed time per iteration (s): 0.08 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 4.547161E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.348 | TFLOPs: 11.87 | 7: iteration 67640/ 173500 | consumed samples: 17315840 | consumed tokens: 35462840320 | elapsed time per iteration (s): 0.08 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 4.557559E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.026 | TFLOPs: 11.87 | 7: iteration 67650/ 173500 | consumed samples: 17318400 | consumed tokens: 35468083200 | elapsed time per iteration (s): 0.08 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 4.532156E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.439 | TFLOPs: 11.85 | 7: iteration 67660/ 173500 | consumed samples: 17320960 | consumed tokens: 35473326080 | elapsed time per iteration (s): 0.08 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 4.549126E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.885 | TFLOPs: 11.84 | 7: iteration 67670/ 173500 | consumed samples: 17323520 | consumed tokens: 35478568960 | elapsed time per iteration (s): 0.08 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 4.534148E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.593 | TFLOPs: 11.90 | 7: iteration 67680/ 
173500 | consumed samples: 17326080 | consumed tokens: 35483811840 | elapsed time per iteration (s): 0.08 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 4.545730E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.804 | TFLOPs: 11.90 | 7: iteration 67690/ 173500 | consumed samples: 17328640 | consumed tokens: 35489054720 | elapsed time per iteration (s): 0.08 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 4.539280E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.665 | TFLOPs: 11.91 | 7: iteration 67700/ 173500 | consumed samples: 17331200 | consumed tokens: 35494297600 | elapsed time per iteration (s): 0.08 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 4.528769E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.227 | TFLOPs: 11.91 | 7: iteration 67710/ 173500 | consumed samples: 17333760 | consumed tokens: 35499540480 | elapsed time per iteration (s): 0.08 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 4.534561E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.652 | TFLOPs: 11.92 | 7: iteration 67720/ 173500 | consumed samples: 17336320 | consumed tokens: 35504783360 | elapsed time per iteration (s): 0.08 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 4.542987E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.817 | TFLOPs: 11.90 | 7: iteration 67730/ 173500 | consumed samples: 17338880 | consumed tokens: 35510026240 | elapsed time per iteration (s): 0.08 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 4.545926E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3195.380 | TFLOPs: 11.89 | 7: iteration 67740/ 173500 | consumed samples: 17341440 | consumed tokens: 35515269120 | elapsed time per iteration (s): 0.08 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 4.539922E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.947 | TFLOPs: 11.84 | 7: iteration 67750/ 173500 | consumed samples: 17344000 | consumed tokens: 35520512000 | elapsed time per iteration (s): 0.08 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 4.537084E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.085 | TFLOPs: 11.88 | 7: iteration 67760/ 173500 | consumed samples: 17346560 | consumed tokens: 35525754880 | elapsed time per iteration (s): 0.08 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 4.530444E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.033 | TFLOPs: 11.89 | 7: iteration 67770/ 173500 | consumed samples: 17349120 | consumed tokens: 35530997760 | elapsed time per iteration (s): 0.11 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 4.546795E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2290.464 | TFLOPs: 8.52 | 7: iteration 67780/ 173500 | consumed samples: 17351680 | consumed tokens: 35536240640 | elapsed time per iteration (s): 0.08 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 4.532840E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.664 | TFLOPs: 11.85 | 7: iteration 67790/ 173500 | consumed samples: 17354240 | consumed tokens: 35541483520 | elapsed time per iteration (s): 0.08 | learning rate: 1.419E-04 
| global batch size: 256 | lm loss: 4.545628E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.528 | TFLOPs: 11.91 | 7: iteration 67800/ 173500 | consumed samples: 17356800 | consumed tokens: 35546726400 | elapsed time per iteration (s): 0.08 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 4.528878E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.912 | TFLOPs: 11.92 | 7: iteration 67810/ 173500 | consumed samples: 17359360 | consumed tokens: 35551969280 | elapsed time per iteration (s): 0.08 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 4.527979E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.282 | TFLOPs: 11.90 | 7: iteration 67820/ 173500 | consumed samples: 17361920 | consumed tokens: 35557212160 | elapsed time per iteration (s): 0.08 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 4.532887E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.055 | TFLOPs: 11.88 | 7: iteration 67830/ 173500 | consumed samples: 17364480 | consumed tokens: 35562455040 | elapsed time per iteration (s): 0.09 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 4.549058E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.759 | TFLOPs: 11.20 | 7: iteration 67840/ 173500 | consumed samples: 17367040 | consumed tokens: 35567697920 | elapsed time per iteration (s): 0.08 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 4.521936E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.727 | TFLOPs: 11.86 | 7: iteration 67850/ 173500 | consumed samples: 
17369600 | consumed tokens: 35572940800 | elapsed time per iteration (s): 0.08 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 4.538643E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.027 | TFLOPs: 11.85 | 7: iteration 67860/ 173500 | consumed samples: 17372160 | consumed tokens: 35578183680 | elapsed time per iteration (s): 0.08 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 4.533678E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.788 | TFLOPs: 11.89 | 7: iteration 67870/ 173500 | consumed samples: 17374720 | consumed tokens: 35583426560 | elapsed time per iteration (s): 0.08 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 4.533994E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.375 | TFLOPs: 11.90 | 7: iteration 67880/ 173500 | consumed samples: 17377280 | consumed tokens: 35588669440 | elapsed time per iteration (s): 0.08 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 4.551121E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.369 | TFLOPs: 11.87 | 7: iteration 67890/ 173500 | consumed samples: 17379840 | consumed tokens: 35593912320 | elapsed time per iteration (s): 0.08 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 4.548900E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.486 | TFLOPs: 11.84 | 7: iteration 67900/ 173500 | consumed samples: 17382400 | consumed tokens: 35599155200 | elapsed time per iteration (s): 0.08 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 4.535540E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3188.646 | TFLOPs: 11.86 | 7: iteration 67910/ 173500 | consumed samples: 17384960 | consumed tokens: 35604398080 | elapsed time per iteration (s): 0.08 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 4.543289E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.006 | TFLOPs: 11.84 | 7: iteration 67920/ 173500 | consumed samples: 17387520 | consumed tokens: 35609640960 | elapsed time per iteration (s): 0.08 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 4.531487E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.042 | TFLOPs: 11.84 | 7: iteration 67930/ 173500 | consumed samples: 17390080 | consumed tokens: 35614883840 | elapsed time per iteration (s): 0.08 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 4.531143E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.786 | TFLOPs: 11.87 | 7: iteration 67940/ 173500 | consumed samples: 17392640 | consumed tokens: 35620126720 | elapsed time per iteration (s): 0.08 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 4.531176E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.765 | TFLOPs: 11.89 | 7: iteration 67950/ 173500 | consumed samples: 17395200 | consumed tokens: 35625369600 | elapsed time per iteration (s): 0.08 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 4.549700E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.092 | TFLOPs: 11.85 | 7: iteration 67960/ 173500 | consumed samples: 17397760 | consumed tokens: 35630612480 | elapsed time per iteration (s): 0.08 | learning rate: 1.417E-04 | global batch size: 256 | 
lm loss: 4.529718E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.206 | TFLOPs: 11.88 |
7: iteration 67970/ 173500 | consumed samples: 17400320 | consumed tokens: 35635855360 | elapsed time per iteration (s): 0.08 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 4.537907E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.155 | TFLOPs: 11.81 |
7: iteration 67980/ 173500 | consumed samples: 17402880 | consumed tokens: 35641098240 | elapsed time per iteration (s): 0.08 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 4.543888E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.589 | TFLOPs: 11.88 |
7: iteration 67990/ 173500 | consumed samples: 17405440 | consumed tokens: 35646341120 | elapsed time per iteration (s): 0.08 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 4.514547E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.825 | TFLOPs: 11.81 |
0: [2023-03-17 01:54:48,006] [INFO] [logging.py:68:log_dist] [Rank 0] step=68000, skipped=0, lr=[0.00014160436454810027, 0.00014160436454810027, 0.00014160436454810027], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 68000/ 173500 | consumed samples: 17408000 | consumed tokens: 35651584000 | elapsed time per iteration (s): 0.08 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 4.544902E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.072 | TFLOPs: 11.86 |
0: steps: 68000 loss: 4.5470 iter time (s): 0.084 samples/sec: 3032.258
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 68000 | lm loss value: 4.435745E+00 | lm loss PPL: 8.441501E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 68000 to checkpoints_14m91b100m
0: [2023-03-17 01:54:48,065] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step68000 is begin to save!
0: [2023-03-17 01:54:48,068] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:54:48,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:54:48,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/layer_03-model_00-model_states.pt...
0: [2023-03-17 01:54:48,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/layer_03-model_00-model_states.pt.
0: [2023-03-17 01:54:48,098] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/layer_04-model_00-model_states.pt...
0: [2023-03-17 01:54:48,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/layer_04-model_00-model_states.pt.
0: [2023-03-17 01:54:48,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/layer_05-model_00-model_states.pt...
0: [2023-03-17 01:54:48,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/layer_05-model_00-model_states.pt.
0: [2023-03-17 01:54:48,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/layer_06-model_00-model_states.pt...
0: [2023-03-17 01:54:48,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/layer_06-model_00-model_states.pt.
0: [2023-03-17 01:54:48,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/layer_08-model_00-model_states.pt... 0: [2023-03-17 01:54:48,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:54:48,108] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step68000/mp_rank_00_model_states.pt 0: [2023-03-17 01:54:48,108] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:54:48,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:54:48,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:54:48,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:54:48,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2023-03-17 01:54:48,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:54:48,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:54:48,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:54:48,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2023-03-17 01:54:48,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:54:48,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2023-03-17 01:54:48,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
0: [2023-03-17 01:54:48,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:54:48,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:54:48,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
7: [2023-03-17 01:54:48,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:54:48,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:54:48,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2023-03-17 01:54:48,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:54:48,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2023-03-17 01:54:48,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:54:48,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:54:48,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 01:54:48,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2023-03-17 01:54:48,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 01:54:48,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2023-03-17 01:54:48,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:54:48,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:54:48,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:54:48,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:54:48,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:54:48,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 01:54:48,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:54:48,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:54:48,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:54:48,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 01:54:48,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:54:48,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:54:48,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2023-03-17 01:54:48,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:54:48,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2023-03-17 01:54:48,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:54:48,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:54:48,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:54:48,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:54:48,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:54:48,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
7: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:54:48,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:54:48,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:54:48,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2023-03-17 01:54:48,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:54:48,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
3: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:54:48,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:54:48,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:54:48,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
0: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:54:48,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2023-03-17 01:54:48,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
5: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2023-03-17 01:54:48,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:54:48,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2023-03-17 01:54:48,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:54:48,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step68000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:54:48,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: successfully saved checkpoint at iteration 68000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.11 7: iteration 68010/ 173500 | consumed samples: 17410560 | consumed tokens: 35656826880 | elapsed time per iteration (s): 0.09 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 4.531584E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2733.599 | TFLOPs: 10.17 | 7: iteration 68020/ 173500 | consumed samples: 17413120 | consumed tokens: 35662069760 | elapsed time per iteration (s): 0.08 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 4.551509E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.194 | TFLOPs: 11.87 | 7: iteration 68030/ 173500 | consumed samples: 17415680 | consumed tokens: 35667312640 | elapsed time per iteration (s): 0.08 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 4.550521E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.734 | TFLOPs: 11.88 | 7: iteration 68040/ 173500 | consumed samples: 17418240 | consumed tokens: 35672555520 | elapsed time per iteration (s): 0.08 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 4.530558E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.087 | TFLOPs: 11.91 | 7: iteration 68050/ 173500 | consumed samples: 17420800 | consumed tokens: 35677798400 | elapsed time per iteration (s): 0.08 | learning rate: 1.415E-04 | global 
batch size: 256 | lm loss: 4.541658E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.693 | TFLOPs: 11.89 | 7: iteration 68060/ 173500 | consumed samples: 17423360 | consumed tokens: 35683041280 | elapsed time per iteration (s): 0.08 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 4.539530E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.710 | TFLOPs: 11.84 | 7: iteration 68070/ 173500 | consumed samples: 17425920 | consumed tokens: 35688284160 | elapsed time per iteration (s): 0.08 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 4.540255E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.060 | TFLOPs: 11.86 | 7: iteration 68080/ 173500 | consumed samples: 17428480 | consumed tokens: 35693527040 | elapsed time per iteration (s): 0.08 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 4.533137E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.188 | TFLOPs: 11.91 | 7: iteration 68090/ 173500 | consumed samples: 17431040 | consumed tokens: 35698769920 | elapsed time per iteration (s): 0.08 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 4.537432E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.434 | TFLOPs: 11.89 | 7: iteration 68100/ 173500 | consumed samples: 17433600 | consumed tokens: 35704012800 | elapsed time per iteration (s): 0.08 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 4.540814E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.599 | TFLOPs: 11.88 | 7: iteration 68110/ 173500 | consumed samples: 17436160 
| consumed tokens: 35709255680 | elapsed time per iteration (s): 0.08 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 4.540297E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.141 | TFLOPs: 11.86 | 7: iteration 68120/ 173500 | consumed samples: 17438720 | consumed tokens: 35714498560 | elapsed time per iteration (s): 0.08 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 4.530637E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.593 | TFLOPs: 11.89 | 7: iteration 68130/ 173500 | consumed samples: 17441280 | consumed tokens: 35719741440 | elapsed time per iteration (s): 0.08 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 4.551405E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.682 | TFLOPs: 11.83 | 7: iteration 68140/ 173500 | consumed samples: 17443840 | consumed tokens: 35724984320 | elapsed time per iteration (s): 0.08 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 4.536391E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.252 | TFLOPs: 11.88 | 7: iteration 68150/ 173500 | consumed samples: 17446400 | consumed tokens: 35730227200 | elapsed time per iteration (s): 0.08 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 4.539300E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.207 | TFLOPs: 11.88 | 7: iteration 68160/ 173500 | consumed samples: 17448960 | consumed tokens: 35735470080 | elapsed time per iteration (s): 0.08 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 4.537859E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 3031.152 | TFLOPs: 11.27 | 7: iteration 68170/ 173500 | consumed samples: 17451520 | consumed tokens: 35740712960 | elapsed time per iteration (s): 0.08 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 4.530786E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.861 | TFLOPs: 11.86 | 7: iteration 68180/ 173500 | consumed samples: 17454080 | consumed tokens: 35745955840 | elapsed time per iteration (s): 0.08 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 4.541486E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.883 | TFLOPs: 11.90 | 7: iteration 68190/ 173500 | consumed samples: 17456640 | consumed tokens: 35751198720 | elapsed time per iteration (s): 0.08 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 4.540240E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.355 | TFLOPs: 11.89 | 7: iteration 68200/ 173500 | consumed samples: 17459200 | consumed tokens: 35756441600 | elapsed time per iteration (s): 0.08 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 4.536666E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.942 | TFLOPs: 11.88 | 7: iteration 68210/ 173500 | consumed samples: 17461760 | consumed tokens: 35761684480 | elapsed time per iteration (s): 0.12 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 4.540623E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2147.414 | TFLOPs: 7.99 | 7: iteration 68220/ 173500 | consumed samples: 17464320 | consumed tokens: 35766927360 | elapsed time per iteration (s): 0.08 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 
4.550484E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.732 | TFLOPs: 11.80 | 7: iteration 68230/ 173500 | consumed samples: 17466880 | consumed tokens: 35772170240 | elapsed time per iteration (s): 0.08 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 4.538329E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.618 | TFLOPs: 11.84 | 7: iteration 68240/ 173500 | consumed samples: 17469440 | consumed tokens: 35777413120 | elapsed time per iteration (s): 0.08 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 4.547462E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.007 | TFLOPs: 11.87 | 7: iteration 68250/ 173500 | consumed samples: 17472000 | consumed tokens: 35782656000 | elapsed time per iteration (s): 0.08 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 4.540223E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.968 | TFLOPs: 11.89 | 7: iteration 68260/ 173500 | consumed samples: 17474560 | consumed tokens: 35787898880 | elapsed time per iteration (s): 0.08 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 4.540463E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.538 | TFLOPs: 11.89 | 7: iteration 68270/ 173500 | consumed samples: 17477120 | consumed tokens: 35793141760 | elapsed time per iteration (s): 0.09 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 4.548106E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2922.799 | TFLOPs: 10.87 | 7: iteration 68280/ 173500 | consumed samples: 17479680 | consumed tokens: 
35798384640 | elapsed time per iteration (s): 0.08 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 4.541993E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.908 | TFLOPs: 11.90 | 7: iteration 68290/ 173500 | consumed samples: 17482240 | consumed tokens: 35803627520 | elapsed time per iteration (s): 0.08 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 4.547976E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.435 | TFLOPs: 11.86 | 7: iteration 68300/ 173500 | consumed samples: 17484800 | consumed tokens: 35808870400 | elapsed time per iteration (s): 0.08 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 4.528131E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.995 | TFLOPs: 11.89 | 7: iteration 68310/ 173500 | consumed samples: 17487360 | consumed tokens: 35814113280 | elapsed time per iteration (s): 0.08 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 4.533645E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.412 | TFLOPs: 11.87 | 7: iteration 68320/ 173500 | consumed samples: 17489920 | consumed tokens: 35819356160 | elapsed time per iteration (s): 0.08 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 4.535620E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.350 | TFLOPs: 11.88 | 7: iteration 68330/ 173500 | consumed samples: 17492480 | consumed tokens: 35824599040 | elapsed time per iteration (s): 0.08 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 4.545787E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3176.464 | TFLOPs: 11.82 | 7: iteration 68340/ 173500 | consumed samples: 17495040 | consumed tokens: 35829841920 | elapsed time per iteration (s): 0.08 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 4.542191E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.713 | TFLOPs: 11.85 | 7: iteration 68350/ 173500 | consumed samples: 17497600 | consumed tokens: 35835084800 | elapsed time per iteration (s): 0.08 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 4.538926E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.930 | TFLOPs: 11.90 | 7: iteration 68360/ 173500 | consumed samples: 17500160 | consumed tokens: 35840327680 | elapsed time per iteration (s): 0.12 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 4.555301E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2207.448 | TFLOPs: 8.21 | 7: iteration 68370/ 173500 | consumed samples: 17502720 | consumed tokens: 35845570560 | elapsed time per iteration (s): 0.13 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 4.534944E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1940.333 | TFLOPs: 7.22 | 7: iteration 68380/ 173500 | consumed samples: 17505280 | consumed tokens: 35850813440 | elapsed time per iteration (s): 0.13 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 4.539298E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.249 | TFLOPs: 7.30 | 7: iteration 68390/ 173500 | consumed samples: 17507840 | consumed tokens: 35856056320 | elapsed time per iteration (s): 0.13 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 4.540349E+00 | grad 
norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.297 | TFLOPs: 7.24 | 7: iteration 68400/ 173500 | consumed samples: 17510400 | consumed tokens: 35861299200 | elapsed time per iteration (s): 0.12 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 4.542506E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2109.104 | TFLOPs: 7.84 | 7: iteration 68410/ 173500 | consumed samples: 17512960 | consumed tokens: 35866542080 | elapsed time per iteration (s): 0.08 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 4.524927E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.433 | TFLOPs: 11.99 | 7: iteration 68420/ 173500 | consumed samples: 17515520 | consumed tokens: 35871784960 | elapsed time per iteration (s): 0.08 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 4.540202E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.082 | TFLOPs: 12.06 | 7: iteration 68430/ 173500 | consumed samples: 17518080 | consumed tokens: 35877027840 | elapsed time per iteration (s): 0.08 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 4.543622E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.472 | TFLOPs: 12.02 | 7: iteration 68440/ 173500 | consumed samples: 17520640 | consumed tokens: 35882270720 | elapsed time per iteration (s): 0.08 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 4.537096E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.997 | TFLOPs: 12.06 | 7: iteration 68450/ 173500 | consumed samples: 17523200 | consumed tokens: 35887513600 | elapsed time 
per iteration (s): 0.09 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 4.540110E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2954.606 | TFLOPs: 10.99 | 7: iteration 68460/ 173500 | consumed samples: 17525760 | consumed tokens: 35892756480 | elapsed time per iteration (s): 0.10 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 4.541428E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2596.469 | TFLOPs: 9.66 | 7: iteration 68470/ 173500 | consumed samples: 17528320 | consumed tokens: 35897999360 | elapsed time per iteration (s): 0.10 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 4.548153E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2462.537 | TFLOPs: 9.16 | 7: iteration 68480/ 173500 | consumed samples: 17530880 | consumed tokens: 35903242240 | elapsed time per iteration (s): 0.08 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 4.542751E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.277 | TFLOPs: 11.38 | 7: iteration 68490/ 173500 | consumed samples: 17533440 | consumed tokens: 35908485120 | elapsed time per iteration (s): 0.10 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 4.546389E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2567.482 | TFLOPs: 9.55 | 7: iteration 68500/ 173500 | consumed samples: 17536000 | consumed tokens: 35913728000 | elapsed time per iteration (s): 0.08 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 4.545751E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.720 | TFLOPs: 11.96 
| 7: iteration 68510/ 173500 | consumed samples: 17538560 | consumed tokens: 35918970880 | elapsed time per iteration (s): 0.08 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 4.544326E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.646 | TFLOPs: 12.01 | 7: iteration 68520/ 173500 | consumed samples: 17541120 | consumed tokens: 35924213760 | elapsed time per iteration (s): 0.08 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 4.541964E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.625 | TFLOPs: 11.51 | 7: iteration 68530/ 173500 | consumed samples: 17543680 | consumed tokens: 35929456640 | elapsed time per iteration (s): 0.08 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 4.550623E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.533 | TFLOPs: 11.80 | 7: iteration 68540/ 173500 | consumed samples: 17546240 | consumed tokens: 35934699520 | elapsed time per iteration (s): 0.08 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 4.527899E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.294 | TFLOPs: 12.03 | 7: iteration 68550/ 173500 | consumed samples: 17548800 | consumed tokens: 35939942400 | elapsed time per iteration (s): 0.08 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 4.531776E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.691 | TFLOPs: 12.03 | 7: iteration 68560/ 173500 | consumed samples: 17551360 | consumed tokens: 35945185280 | elapsed time per iteration (s): 0.08 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 4.538817E+00 | grad norm: 0.309 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.082 | TFLOPs: 12.05 | 7: iteration 68570/ 173500 | consumed samples: 17553920 | consumed tokens: 35950428160 | elapsed time per iteration (s): 0.08 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 4.527487E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.138 | TFLOPs: 12.02 | 7: iteration 68580/ 173500 | consumed samples: 17556480 | consumed tokens: 35955671040 | elapsed time per iteration (s): 0.08 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 4.521681E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.798 | TFLOPs: 11.93 | 7: iteration 68590/ 173500 | consumed samples: 17559040 | consumed tokens: 35960913920 | elapsed time per iteration (s): 0.08 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 4.528507E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.935 | TFLOPs: 11.91 | 7: iteration 68600/ 173500 | consumed samples: 17561600 | consumed tokens: 35966156800 | elapsed time per iteration (s): 0.08 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 4.541425E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.356 | TFLOPs: 12.00 | 7: iteration 68610/ 173500 | consumed samples: 17564160 | consumed tokens: 35971399680 | elapsed time per iteration (s): 0.08 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 4.532790E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.761 | TFLOPs: 12.02 | 7: iteration 68620/ 173500 | consumed samples: 17566720 | consumed tokens: 35976642560 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.406E-04 | global batch size: 256 | lm loss: 4.538322E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.435 | TFLOPs: 11.43 | 7: iteration 68630/ 173500 | consumed samples: 17569280 | consumed tokens: 35981885440 | elapsed time per iteration (s): 0.08 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 4.538351E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3254.889 | TFLOPs: 12.11 | 7: iteration 68640/ 173500 | consumed samples: 17571840 | consumed tokens: 35987128320 | elapsed time per iteration (s): 0.08 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 4.538202E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.679 | TFLOPs: 12.12 | 7: iteration 68650/ 173500 | consumed samples: 17574400 | consumed tokens: 35992371200 | elapsed time per iteration (s): 0.08 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 4.540097E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.101 | TFLOPs: 12.04 | 7: iteration 68660/ 173500 | consumed samples: 17576960 | consumed tokens: 35997614080 | elapsed time per iteration (s): 0.08 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 4.547199E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.704 | TFLOPs: 11.99 | 7: iteration 68670/ 173500 | consumed samples: 17579520 | consumed tokens: 36002856960 | elapsed time per iteration (s): 0.08 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 4.533027E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.554 | TFLOPs: 12.00 | 7: iteration 68680/ 
173500 | consumed samples: 17582080 | consumed tokens: 36008099840 | elapsed time per iteration (s): 0.08 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 4.550653E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.726 | TFLOPs: 11.92 | 7: iteration 68690/ 173500 | consumed samples: 17584640 | consumed tokens: 36013342720 | elapsed time per iteration (s): 0.08 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 4.526850E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.996 | TFLOPs: 11.98 | 7: iteration 68700/ 173500 | consumed samples: 17587200 | consumed tokens: 36018585600 | elapsed time per iteration (s): 0.08 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 4.532774E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.495 | TFLOPs: 12.03 | 7: iteration 68710/ 173500 | consumed samples: 17589760 | consumed tokens: 36023828480 | elapsed time per iteration (s): 0.08 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 4.533613E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.431 | TFLOPs: 12.04 | 7: iteration 68720/ 173500 | consumed samples: 17592320 | consumed tokens: 36029071360 | elapsed time per iteration (s): 0.09 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 4.531017E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.188 | TFLOPs: 11.06 | 7: iteration 68730/ 173500 | consumed samples: 17594880 | consumed tokens: 36034314240 | elapsed time per iteration (s): 0.09 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 4.532925E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2804.009 | TFLOPs: 10.43 | 7: iteration 68740/ 173500 | consumed samples: 17597440 | consumed tokens: 36039557120 | elapsed time per iteration (s): 0.09 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 4.534367E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.904 | TFLOPs: 10.04 | 7: iteration 68750/ 173500 | consumed samples: 17600000 | consumed tokens: 36044800000 | elapsed time per iteration (s): 0.09 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 4.543745E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.587 | TFLOPs: 10.54 | 7: iteration 68760/ 173500 | consumed samples: 17602560 | consumed tokens: 36050042880 | elapsed time per iteration (s): 0.09 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 4.542808E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2926.092 | TFLOPs: 10.88 | 7: iteration 68770/ 173500 | consumed samples: 17605120 | consumed tokens: 36055285760 | elapsed time per iteration (s): 0.08 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 4.548134E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.293 | TFLOPs: 12.06 | 7: iteration 68780/ 173500 | consumed samples: 17607680 | consumed tokens: 36060528640 | elapsed time per iteration (s): 0.08 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 4.537238E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.976 | TFLOPs: 12.04 | 7: iteration 68790/ 173500 | consumed samples: 17610240 | consumed tokens: 36065771520 | elapsed time per iteration (s): 0.08 | learning rate: 
1.404E-04 | global batch size: 256 | lm loss: 4.548435E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.529 | TFLOPs: 11.99 | 7: iteration 68800/ 173500 | consumed samples: 17612800 | consumed tokens: 36071014400 | elapsed time per iteration (s): 0.10 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 4.546801E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2617.130 | TFLOPs: 9.73 | 7: iteration 68810/ 173500 | consumed samples: 17615360 | consumed tokens: 36076257280 | elapsed time per iteration (s): 0.10 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 4.551521E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2572.829 | TFLOPs: 9.57 | 7: iteration 68820/ 173500 | consumed samples: 17617920 | consumed tokens: 36081500160 | elapsed time per iteration (s): 0.08 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 4.538919E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.731 | TFLOPs: 11.93 | 7: iteration 68830/ 173500 | consumed samples: 17620480 | consumed tokens: 36086743040 | elapsed time per iteration (s): 0.08 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 4.523359E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.556 | TFLOPs: 11.93 | 7: iteration 68840/ 173500 | consumed samples: 17623040 | consumed tokens: 36091985920 | elapsed time per iteration (s): 0.08 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 4.535048E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.762 | TFLOPs: 11.89 | 7: iteration 68850/ 173500 | consumed 
samples: 17625600 | consumed tokens: 36097228800 | elapsed time per iteration (s): 0.08 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 4.539327E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.884 | TFLOPs: 12.06 | 7: iteration 68860/ 173500 | consumed samples: 17628160 | consumed tokens: 36102471680 | elapsed time per iteration (s): 0.08 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 4.542171E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.396 | TFLOPs: 12.05 | 7: iteration 68870/ 173500 | consumed samples: 17630720 | consumed tokens: 36107714560 | elapsed time per iteration (s): 0.08 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 4.540326E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.731 | TFLOPs: 12.05 | 7: iteration 68880/ 173500 | consumed samples: 17633280 | consumed tokens: 36112957440 | elapsed time per iteration (s): 0.09 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 4.534326E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2720.236 | TFLOPs: 10.12 | 7: iteration 68890/ 173500 | consumed samples: 17635840 | consumed tokens: 36118200320 | elapsed time per iteration (s): 0.09 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 4.541323E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2787.638 | TFLOPs: 10.37 | 7: iteration 68900/ 173500 | consumed samples: 17638400 | consumed tokens: 36123443200 | elapsed time per iteration (s): 0.08 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 4.533665E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 3165.960 | TFLOPs: 11.78 | 7: iteration 68910/ 173500 | consumed samples: 17640960 | consumed tokens: 36128686080 | elapsed time per iteration (s): 0.08 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 4.546955E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.437 | TFLOPs: 11.96 | 7: iteration 68920/ 173500 | consumed samples: 17643520 | consumed tokens: 36133928960 | elapsed time per iteration (s): 0.08 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 4.543674E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.665 | TFLOPs: 11.99 | 7: iteration 68930/ 173500 | consumed samples: 17646080 | consumed tokens: 36139171840 | elapsed time per iteration (s): 0.09 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 4.522700E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.030 | TFLOPs: 10.58 | 7: iteration 68940/ 173500 | consumed samples: 17648640 | consumed tokens: 36144414720 | elapsed time per iteration (s): 0.08 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 4.532603E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.274 | TFLOPs: 11.71 | 7: iteration 68950/ 173500 | consumed samples: 17651200 | consumed tokens: 36149657600 | elapsed time per iteration (s): 0.08 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 4.526052E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.746 | TFLOPs: 11.72 | 7: iteration 68960/ 173500 | consumed samples: 17653760 | consumed tokens: 36154900480 | elapsed time per iteration (s): 0.08 | learning rate: 1.401E-04 | global batch size: 
256 | lm loss: 4.541638E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.085 | TFLOPs: 11.72 | 7: iteration 68970/ 173500 | consumed samples: 17656320 | consumed tokens: 36160143360 | elapsed time per iteration (s): 0.08 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 4.538080E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.668 | TFLOPs: 11.84 | 7: iteration 68980/ 173500 | consumed samples: 17658880 | consumed tokens: 36165386240 | elapsed time per iteration (s): 0.08 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 4.553735E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.748 | TFLOPs: 11.57 | 7: iteration 68990/ 173500 | consumed samples: 17661440 | consumed tokens: 36170629120 | elapsed time per iteration (s): 0.08 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 4.537524E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.415 | TFLOPs: 11.88 | 7: iteration 69000/ 173500 | consumed samples: 17664000 | consumed tokens: 36175872000 | elapsed time per iteration (s): 0.08 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 4.532994E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.717 | TFLOPs: 11.99 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 69000 | lm loss value: 4.367357E+00 | lm loss PPL: 7.883498E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 69000 to checkpoints_14m91b100m 0: [2023-03-17 01:56:12,927] [INFO] 
[logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step69000 is begin to save! 0: [2023-03-17 01:56:12,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:56:12,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:56:12,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:56:12,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:56:12,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:56:12,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:56:12,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:56:12,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:56:12,965] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:56:12,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:56:12,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 01:56:12,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:56:12,969] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step69000/mp_rank_00_model_states.pt 0: [2023-03-17 01:56:12,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:56:12,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:56:12,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:56:12,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:12,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:12,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 01:56:12,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2023-03-17 01:56:12,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:56:12,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:56:12,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:12,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:12,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
4: [2023-03-17 01:56:12,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:56:12,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:12,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:12,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:56:12,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:56:12,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 3: [2023-03-17 01:56:12,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
2: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:12,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:12,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2023-03-17 01:56:12,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:56:12,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
7: [2023-03-17 01:56:12,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
6: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:12,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2023-03-17 01:56:12,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:56:12,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:56:12,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
1: [2023-03-17 01:56:12,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:12,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2023-03-17 01:56:12,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:12,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:56:12,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2023-03-17 01:56:12,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:12,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:12,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:12,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
4: [2023-03-17 01:56:12,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:12,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2023-03-17 01:56:12,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:56:12,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:56:12,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:56:12,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 01:56:12,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:56:12,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:56:12,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:12,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2023-03-17 01:56:12,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:56:12,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 01:56:12,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2023-03-17 01:56:12,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:12,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:12,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 6: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:56:12,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:56:12,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2023-03-17 01:56:12,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:56:13,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:13,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:56:13,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2023-03-17 01:56:13,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:13,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:13,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:56:13,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:56:13,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:56:13,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2023-03-17 01:56:13,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 01:56:13,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2023-03-17 01:56:13,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
2: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 1: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
3: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 01:56:13,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step69000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2023-03-17 01:56:13,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
0: successfully saved checkpoint at iteration 69000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.28 7: iteration 69010/ 173500 | consumed samples: 17666560 | consumed tokens: 36181114880 | elapsed time per iteration (s): 0.10 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 4.532594E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2661.954 | TFLOPs: 9.90 | 7: iteration 69020/ 173500 | consumed samples: 17669120 | consumed tokens: 36186357760 | elapsed time per iteration (s): 0.09 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 4.540354E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.275 | TFLOPs: 11.19 | 7: iteration 69030/ 173500 | consumed samples: 17671680 | consumed tokens: 36191600640 | elapsed time per iteration (s): 0.09 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 4.552020E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.678 | TFLOPs: 11.11 | 7: iteration 69040/ 173500 | consumed samples: 17674240 | consumed tokens: 36196843520 | elapsed time per iteration (s): 0.09 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 4.535936E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.780 | TFLOPs: 10.92 | 7: iteration 69050/ 173500 | consumed samples: 17676800 | consumed tokens: 36202086400 | elapsed time per iteration (s): 0.08 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 4.539467E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.088 | TFLOPs: 11.94 | 7: iteration 69060/ 173500 | consumed samples: 17679360 | consumed tokens: 36207329280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.400E-04 | global batch size: 256 | lm loss: 4.538289E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.516 | TFLOPs: 12.02 | 7: iteration 69070/ 173500 | consumed samples: 17681920 | consumed tokens: 36212572160 | elapsed time per iteration (s): 0.08 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 4.541866E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.271 | TFLOPs: 11.68 | 7: iteration 69080/ 173500 | consumed samples: 17684480 | consumed tokens: 36217815040 | elapsed time per iteration (s): 0.08 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 4.536075E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.512 | TFLOPs: 11.95 | 7: iteration 69090/ 173500 | consumed samples: 17687040 | consumed tokens: 36223057920 | elapsed time per iteration (s): 0.08 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 4.539579E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.782 | TFLOPs: 11.64 | 7: iteration 69100/ 173500 | consumed samples: 17689600 | consumed tokens: 36228300800 | elapsed time per iteration (s): 0.08 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 4.541980E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.613 | TFLOPs: 11.86 | 7: iteration 69110/ 173500 | consumed samples: 17692160 | consumed tokens: 36233543680 | elapsed time per iteration (s): 0.08 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 4.539531E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.912 | TFLOPs: 12.02 | 7: iteration 69120/ 
173500 | consumed samples: 17694720 | consumed tokens: 36238786560 | elapsed time per iteration (s): 0.08 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 4.538423E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.214 | TFLOPs: 11.87 | 7: iteration 69130/ 173500 | consumed samples: 17697280 | consumed tokens: 36244029440 | elapsed time per iteration (s): 0.08 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 4.527209E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.326 | TFLOPs: 11.94 | 7: iteration 69140/ 173500 | consumed samples: 17699840 | consumed tokens: 36249272320 | elapsed time per iteration (s): 0.09 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 4.525636E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2842.394 | TFLOPs: 10.57 | 7: iteration 69150/ 173500 | consumed samples: 17702400 | consumed tokens: 36254515200 | elapsed time per iteration (s): 0.08 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 4.541454E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.852 | TFLOPs: 11.91 | 7: iteration 69160/ 173500 | consumed samples: 17704960 | consumed tokens: 36259758080 | elapsed time per iteration (s): 0.08 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 4.525107E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.949 | TFLOPs: 11.38 | 7: iteration 69170/ 173500 | consumed samples: 17707520 | consumed tokens: 36265000960 | elapsed time per iteration (s): 0.08 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 4.532858E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3088.298 | TFLOPs: 11.49 | 7: iteration 69180/ 173500 | consumed samples: 17710080 | consumed tokens: 36270243840 | elapsed time per iteration (s): 0.09 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 4.527459E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2809.456 | TFLOPs: 10.45 | 7: iteration 69190/ 173500 | consumed samples: 17712640 | consumed tokens: 36275486720 | elapsed time per iteration (s): 0.08 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 4.530119E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.394 | TFLOPs: 11.90 | 7: iteration 69200/ 173500 | consumed samples: 17715200 | consumed tokens: 36280729600 | elapsed time per iteration (s): 0.08 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 4.524248E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.174 | TFLOPs: 11.46 | 7: iteration 69210/ 173500 | consumed samples: 17717760 | consumed tokens: 36285972480 | elapsed time per iteration (s): 0.08 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 4.552694E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.599 | TFLOPs: 11.66 | 7: iteration 69220/ 173500 | consumed samples: 17720320 | consumed tokens: 36291215360 | elapsed time per iteration (s): 0.08 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 4.543528E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.755 | TFLOPs: 11.44 | 7: iteration 69230/ 173500 | consumed samples: 17722880 | consumed tokens: 36296458240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.397E-04 | global batch size: 256 | lm loss: 4.542667E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.510 | TFLOPs: 11.96 | 7: iteration 69240/ 173500 | consumed samples: 17725440 | consumed tokens: 36301701120 | elapsed time per iteration (s): 0.08 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 4.534248E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.769 | TFLOPs: 12.00 | 7: iteration 69250/ 173500 | consumed samples: 17728000 | consumed tokens: 36306944000 | elapsed time per iteration (s): 0.08 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 4.533147E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.180 | TFLOPs: 12.03 | 7: iteration 69260/ 173500 | consumed samples: 17730560 | consumed tokens: 36312186880 | elapsed time per iteration (s): 0.08 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 4.542973E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.566 | TFLOPs: 12.06 | 7: iteration 69270/ 173500 | consumed samples: 17733120 | consumed tokens: 36317429760 | elapsed time per iteration (s): 0.08 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 4.540359E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.690 | TFLOPs: 11.38 | 7: iteration 69280/ 173500 | consumed samples: 17735680 | consumed tokens: 36322672640 | elapsed time per iteration (s): 0.08 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 4.541131E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.292 | TFLOPs: 11.47 | 7: iteration 69290/ 173500 | 
consumed samples: 17738240 | consumed tokens: 36327915520 | elapsed time per iteration (s): 0.08 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 4.546265E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.474 | TFLOPs: 12.05 | 7: iteration 69300/ 173500 | consumed samples: 17740800 | consumed tokens: 36333158400 | elapsed time per iteration (s): 0.09 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 4.518524E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.038 | TFLOPs: 10.64 | 7: iteration 69310/ 173500 | consumed samples: 17743360 | consumed tokens: 36338401280 | elapsed time per iteration (s): 0.08 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 4.533154E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.503 | TFLOPs: 12.01 | 7: iteration 69320/ 173500 | consumed samples: 17745920 | consumed tokens: 36343644160 | elapsed time per iteration (s): 0.08 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 4.541183E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.884 | TFLOPs: 12.05 | 7: iteration 69330/ 173500 | consumed samples: 17748480 | consumed tokens: 36348887040 | elapsed time per iteration (s): 0.08 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 4.529873E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.149 | TFLOPs: 11.73 | 7: iteration 69340/ 173500 | consumed samples: 17751040 | consumed tokens: 36354129920 | elapsed time per iteration (s): 0.08 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 4.547135E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3257.233 | TFLOPs: 12.12 | 7: iteration 69350/ 173500 | consumed samples: 17753600 | consumed tokens: 36359372800 | elapsed time per iteration (s): 0.08 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 4.543972E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.280 | TFLOPs: 11.70 | 7: iteration 69360/ 173500 | consumed samples: 17756160 | consumed tokens: 36364615680 | elapsed time per iteration (s): 0.08 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 4.543362E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.252 | TFLOPs: 12.12 | 7: iteration 69370/ 173500 | consumed samples: 17758720 | consumed tokens: 36369858560 | elapsed time per iteration (s): 0.08 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 4.526987E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3255.789 | TFLOPs: 12.11 | 7: iteration 69380/ 173500 | consumed samples: 17761280 | consumed tokens: 36375101440 | elapsed time per iteration (s): 0.08 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 4.535237E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.176 | TFLOPs: 11.77 | 7: iteration 69390/ 173500 | consumed samples: 17763840 | consumed tokens: 36380344320 | elapsed time per iteration (s): 0.09 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 4.533995E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2927.915 | TFLOPs: 10.89 | 7: iteration 69400/ 173500 | consumed samples: 17766400 | consumed tokens: 36385587200 | elapsed time per iteration (s): 0.08 | learning rate: 1.394E-04 | global batch 
size: 256 | lm loss: 4.540411E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.961 | TFLOPs: 11.75 | 7: iteration 69410/ 173500 | consumed samples: 17768960 | consumed tokens: 36390830080 | elapsed time per iteration (s): 0.10 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 4.546223E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2638.549 | TFLOPs: 9.81 | 7: iteration 69420/ 173500 | consumed samples: 17771520 | consumed tokens: 36396072960 | elapsed time per iteration (s): 0.08 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 4.545182E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3254.861 | TFLOPs: 12.11 | 7: iteration 69430/ 173500 | consumed samples: 17774080 | consumed tokens: 36401315840 | elapsed time per iteration (s): 0.08 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 4.538660E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.834 | TFLOPs: 11.64 | 7: iteration 69440/ 173500 | consumed samples: 17776640 | consumed tokens: 36406558720 | elapsed time per iteration (s): 0.08 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 4.539906E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.646 | TFLOPs: 11.49 | 7: iteration 69450/ 173500 | consumed samples: 17779200 | consumed tokens: 36411801600 | elapsed time per iteration (s): 0.08 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 4.535553E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.518 | TFLOPs: 12.03 | 7: iteration 69460/ 173500 | consumed samples: 17781760 | 
consumed tokens: 36417044480 | elapsed time per iteration (s): 0.08 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 4.532348E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.519 | TFLOPs: 12.02 | 7: iteration 69470/ 173500 | consumed samples: 17784320 | consumed tokens: 36422287360 | elapsed time per iteration (s): 0.08 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 4.542313E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.069 | TFLOPs: 11.99 | 7: iteration 69480/ 173500 | consumed samples: 17786880 | consumed tokens: 36427530240 | elapsed time per iteration (s): 0.08 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 4.545031E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.376 | TFLOPs: 12.04 | 7: iteration 69490/ 173500 | consumed samples: 17789440 | consumed tokens: 36432773120 | elapsed time per iteration (s): 0.08 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 4.534933E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.753 | TFLOPs: 12.03 | 7: iteration 69500/ 173500 | consumed samples: 17792000 | consumed tokens: 36438016000 | elapsed time per iteration (s): 0.08 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 4.541444E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.480 | TFLOPs: 12.01 | 7: iteration 69510/ 173500 | consumed samples: 17794560 | consumed tokens: 36443258880 | elapsed time per iteration (s): 0.08 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 4.536820E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3255.148 | TFLOPs: 12.11 | 7: iteration 69520/ 173500 | consumed samples: 17797120 | consumed tokens: 36448501760 | elapsed time per iteration (s): 0.08 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 4.531100E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3249.612 | TFLOPs: 12.09 | 7: iteration 69530/ 173500 | consumed samples: 17799680 | consumed tokens: 36453744640 | elapsed time per iteration (s): 0.08 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 4.534755E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3261.694 | TFLOPs: 12.13 | 7: iteration 69540/ 173500 | consumed samples: 17802240 | consumed tokens: 36458987520 | elapsed time per iteration (s): 0.08 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 4.526020E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.999 | TFLOPs: 12.08 | 7: iteration 69550/ 173500 | consumed samples: 17804800 | consumed tokens: 36464230400 | elapsed time per iteration (s): 0.08 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 4.527633E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.730 | TFLOPs: 12.02 | 7: iteration 69560/ 173500 | consumed samples: 17807360 | consumed tokens: 36469473280 | elapsed time per iteration (s): 0.08 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 4.531543E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3248.294 | TFLOPs: 12.08 | 7: iteration 69570/ 173500 | consumed samples: 17809920 | consumed tokens: 36474716160 | elapsed time per iteration (s): 0.08 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 
4.541556E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.907 | TFLOPs: 12.02 | 7: iteration 69580/ 173500 | consumed samples: 17812480 | consumed tokens: 36479959040 | elapsed time per iteration (s): 0.08 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 4.529152E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.189 | TFLOPs: 11.99 | 7: iteration 69590/ 173500 | consumed samples: 17815040 | consumed tokens: 36485201920 | elapsed time per iteration (s): 0.08 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 4.534489E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3257.734 | TFLOPs: 12.12 | 7: iteration 69600/ 173500 | consumed samples: 17817600 | consumed tokens: 36490444800 | elapsed time per iteration (s): 0.08 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 4.530447E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3249.070 | TFLOPs: 12.09 | 7: iteration 69610/ 173500 | consumed samples: 17820160 | consumed tokens: 36495687680 | elapsed time per iteration (s): 0.08 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 4.528770E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3252.673 | TFLOPs: 12.10 | 7: iteration 69620/ 173500 | consumed samples: 17822720 | consumed tokens: 36500930560 | elapsed time per iteration (s): 0.08 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 4.536973E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3248.772 | TFLOPs: 12.08 | 7: iteration 69630/ 173500 | consumed samples: 17825280 | consumed tokens: 
36506173440 | elapsed time per iteration (s): 0.08 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 4.542717E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.068 | TFLOPs: 12.07 | 7: iteration 69640/ 173500 | consumed samples: 17827840 | consumed tokens: 36511416320 | elapsed time per iteration (s): 0.08 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 4.536570E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3248.485 | TFLOPs: 12.08 | 7: iteration 69650/ 173500 | consumed samples: 17830400 | consumed tokens: 36516659200 | elapsed time per iteration (s): 0.08 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 4.526534E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.167 | TFLOPs: 12.07 | 7: iteration 69660/ 173500 | consumed samples: 17832960 | consumed tokens: 36521902080 | elapsed time per iteration (s): 0.08 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 4.547828E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3257.800 | TFLOPs: 12.12 | 7: iteration 69670/ 173500 | consumed samples: 17835520 | consumed tokens: 36527144960 | elapsed time per iteration (s): 0.08 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 4.544692E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.857 | TFLOPs: 12.07 | 7: iteration 69680/ 173500 | consumed samples: 17838080 | consumed tokens: 36532387840 | elapsed time per iteration (s): 0.08 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 4.540694E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3243.356 | TFLOPs: 12.06 | 7: iteration 69690/ 173500 | consumed samples: 17840640 | consumed tokens: 36537630720 | elapsed time per iteration (s): 0.08 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 4.535368E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.103 | TFLOPs: 12.07 | 7: iteration 69700/ 173500 | consumed samples: 17843200 | consumed tokens: 36542873600 | elapsed time per iteration (s): 0.08 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 4.533113E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.028 | TFLOPs: 12.00 | 7: iteration 69710/ 173500 | consumed samples: 17845760 | consumed tokens: 36548116480 | elapsed time per iteration (s): 0.08 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 4.545647E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.042 | TFLOPs: 11.56 | 7: iteration 69720/ 173500 | consumed samples: 17848320 | consumed tokens: 36553359360 | elapsed time per iteration (s): 0.08 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 4.538339E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.667 | TFLOPs: 11.96 | 7: iteration 69730/ 173500 | consumed samples: 17850880 | consumed tokens: 36558602240 | elapsed time per iteration (s): 0.08 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 4.542108E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.039 | TFLOPs: 12.02 | 7: iteration 69740/ 173500 | consumed samples: 17853440 | consumed tokens: 36563845120 | elapsed time per iteration (s): 0.08 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 4.539738E+00 | grad 
norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.559 | TFLOPs: 12.04 | 7: iteration 69750/ 173500 | consumed samples: 17856000 | consumed tokens: 36569088000 | elapsed time per iteration (s): 0.10 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 4.528802E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2642.315 | TFLOPs: 9.83 | 7: iteration 69760/ 173500 | consumed samples: 17858560 | consumed tokens: 36574330880 | elapsed time per iteration (s): 0.09 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 4.545110E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.398 | TFLOPs: 10.23 | 7: iteration 69770/ 173500 | consumed samples: 17861120 | consumed tokens: 36579573760 | elapsed time per iteration (s): 0.08 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 4.536562E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.463 | TFLOPs: 11.27 | 7: iteration 69780/ 173500 | consumed samples: 17863680 | consumed tokens: 36584816640 | elapsed time per iteration (s): 0.08 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 4.527928E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.016 | TFLOPs: 11.93 | 7: iteration 69790/ 173500 | consumed samples: 17866240 | consumed tokens: 36590059520 | elapsed time per iteration (s): 0.08 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 4.549232E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.046 | TFLOPs: 12.01 | 7: iteration 69800/ 173500 | consumed samples: 17868800 | consumed tokens: 36595302400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 4.527522E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.622 | TFLOPs: 12.05 | 7: iteration 69810/ 173500 | consumed samples: 17871360 | consumed tokens: 36600545280 | elapsed time per iteration (s): 0.08 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 4.537341E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3254.684 | TFLOPs: 12.11 | 7: iteration 69820/ 173500 | consumed samples: 17873920 | consumed tokens: 36605788160 | elapsed time per iteration (s): 0.08 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 4.522029E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.209 | TFLOPs: 12.08 | 7: iteration 69830/ 173500 | consumed samples: 17876480 | consumed tokens: 36611031040 | elapsed time per iteration (s): 0.08 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 4.554185E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.706 | TFLOPs: 12.05 | 7: iteration 69840/ 173500 | consumed samples: 17879040 | consumed tokens: 36616273920 | elapsed time per iteration (s): 0.08 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 4.535618E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.763 | TFLOPs: 12.05 | 7: iteration 69850/ 173500 | consumed samples: 17881600 | consumed tokens: 36621516800 | elapsed time per iteration (s): 0.08 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 4.547227E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.383 | TFLOPs: 
11.70 | 7: iteration 69860/ 173500 | consumed samples: 17884160 | consumed tokens: 36626759680 | elapsed time per iteration (s): 0.08 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 4.540442E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.087 | TFLOPs: 12.06 | 7: iteration 69870/ 173500 | consumed samples: 17886720 | consumed tokens: 36632002560 | elapsed time per iteration (s): 0.08 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 4.525200E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.187 | TFLOPs: 12.12 | 7: iteration 69880/ 173500 | consumed samples: 17889280 | consumed tokens: 36637245440 | elapsed time per iteration (s): 0.08 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 4.548135E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.709 | TFLOPs: 12.00 | 7: iteration 69890/ 173500 | consumed samples: 17891840 | consumed tokens: 36642488320 | elapsed time per iteration (s): 0.08 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 4.538390E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3254.860 | TFLOPs: 12.11 | 7: iteration 69900/ 173500 | consumed samples: 17894400 | consumed tokens: 36647731200 | elapsed time per iteration (s): 0.08 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 4.542075E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.573 | TFLOPs: 12.12 | 7: iteration 69910/ 173500 | consumed samples: 17896960 | consumed tokens: 36652974080 | elapsed time per iteration (s): 0.08 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 4.530529E+00 | grad norm: 0.295 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.178 | TFLOPs: 12.04 | 7: iteration 69920/ 173500 | consumed samples: 17899520 | consumed tokens: 36658216960 | elapsed time per iteration (s): 0.08 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 4.531263E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.914 | TFLOPs: 12.05 | 7: iteration 69930/ 173500 | consumed samples: 17902080 | consumed tokens: 36663459840 | elapsed time per iteration (s): 0.08 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 4.537186E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.202 | TFLOPs: 12.06 | 7: iteration 69940/ 173500 | consumed samples: 17904640 | consumed tokens: 36668702720 | elapsed time per iteration (s): 0.08 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 4.529127E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3251.681 | TFLOPs: 12.09 | 7: iteration 69950/ 173500 | consumed samples: 17907200 | consumed tokens: 36673945600 | elapsed time per iteration (s): 0.08 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 4.527715E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.560 | TFLOPs: 12.07 | 7: iteration 69960/ 173500 | consumed samples: 17909760 | consumed tokens: 36679188480 | elapsed time per iteration (s): 0.08 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 4.532811E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.503 | TFLOPs: 12.12 | 7: iteration 69970/ 173500 | consumed samples: 17912320 | consumed tokens: 36684431360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.385E-04 | global batch size: 256 | lm loss: 4.530033E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.768 | TFLOPs: 12.12 | 7: iteration 69980/ 173500 | consumed samples: 17914880 | consumed tokens: 36689674240 | elapsed time per iteration (s): 0.08 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 4.538210E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3255.131 | TFLOPs: 12.11 | 7: iteration 69990/ 173500 | consumed samples: 17917440 | consumed tokens: 36694917120 | elapsed time per iteration (s): 0.08 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 4.521737E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3248.725 | TFLOPs: 12.08 | 0: [2023-03-17 01:57:33,838] [INFO] [logging.py:68:log_dist] [Rank 0] step=70000, skipped=0, lr=[0.0001385013705497804, 0.0001385013705497804, 0.0001385013705497804], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 70000/ 173500 | consumed samples: 17920000 | consumed tokens: 36700160000 | elapsed time per iteration (s): 0.08 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 4.539947E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.058 | TFLOPs: 12.02 | 0: steps: 70000 loss: 4.5489 iter time (s): 0.082 samples/sec: 3115.631 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 70000 | lm loss value: 4.417347E+00 | lm loss PPL: 8.287616E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 70000 to checkpoints_14m91b100m 0: [2023-03-17 01:57:33,898] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step70000 is begin to save! 0: [2023-03-17 01:57:33,901] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/layer_01-model_00-model_states.pt... 0: [2023-03-17 01:57:33,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/layer_01-model_00-model_states.pt. 0: [2023-03-17 01:57:33,927] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/layer_03-model_00-model_states.pt... 0: [2023-03-17 01:57:33,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/layer_03-model_00-model_states.pt. 0: [2023-03-17 01:57:33,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/layer_04-model_00-model_states.pt... 0: [2023-03-17 01:57:33,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/layer_04-model_00-model_states.pt. 0: [2023-03-17 01:57:33,933] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/layer_05-model_00-model_states.pt... 0: [2023-03-17 01:57:33,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/layer_05-model_00-model_states.pt. 0: [2023-03-17 01:57:33,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/layer_06-model_00-model_states.pt... 0: [2023-03-17 01:57:33,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/layer_06-model_00-model_states.pt. 0: [2023-03-17 01:57:33,939] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 01:57:33,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/layer_08-model_00-model_states.pt. 0: [2023-03-17 01:57:33,940] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step70000/mp_rank_00_model_states.pt 0: [2023-03-17 01:57:33,940] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/mp_rank_00_model_states.pt... 0: [2023-03-17 01:57:33,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/mp_rank_00_model_states.pt. 0: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
4: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:57:33,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:57:33,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:57:33,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:57:33,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:57:33,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:57:33,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 01:57:33,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:57:33,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:57:33,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 01:57:33,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 4: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
3: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:57:33,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:57:33,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:57:33,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:57:33,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 2: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 01:57:33,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:57:33,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 01:57:33,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:57:33,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:57:33,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:57:33,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:57:33,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:57:33,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
2: [2023-03-17 01:57:33,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 01:57:33,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:57:33,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
4: [2023-03-17 01:57:33,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2023-03-17 01:57:33,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:57:33,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 01:57:33,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:57:33,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 01:57:33,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 2: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:57:33,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:57:33,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:57:33,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 01:57:33,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 01:57:33,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:57:33,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:57:33,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:57:33,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:57:33,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:57:33,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 01:57:33,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 01:57:33,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:57:33,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 01:57:33,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 01:57:33,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:57:33,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 01:57:33,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:57:33,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:57:33,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:57:33,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:57:33,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 01:57:33,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 01:57:33,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:57:33,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:57:33,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 01:57:33,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:57:33,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:57:33,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:57:33,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:57:33,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 01:57:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:57:33,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 01:57:33,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
2: [2023-03-17 01:57:33,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:57:33,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:57:33,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:57:33,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
5: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 01:57:33,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:57:33,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2023-03-17 01:57:33,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2023-03-17 01:57:33,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:57:33,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2023-03-17 01:57:33,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now!
0: successfully saved checkpoint at iteration 70000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 79.56
7: iteration 70010/ 173500 | consumed samples: 17922560 | consumed tokens: 36705402880 | elapsed time per iteration (s): 0.09 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 4.543528E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2766.209 | TFLOPs: 10.29 |
7: iteration 70020/ 173500 | consumed samples: 17925120 | consumed tokens: 36710645760 | elapsed time per iteration (s): 0.08 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 4.551676E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.380 | TFLOPs: 11.77 |
7: iteration 70030/ 173500 | consumed samples: 17927680 | consumed tokens: 36715888640 | elapsed time per iteration (s): 0.08 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 4.523737E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.553 | TFLOPs: 11.93 |
7: iteration 70040/ 173500 | consumed samples: 17930240 | consumed tokens: 36721131520 | elapsed time per iteration (s): 0.08 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 4.539759E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.982 | TFLOPs: 11.93 |
7: iteration 70050/ 173500 | consumed samples: 17932800 | consumed tokens: 36726374400 | elapsed time per iteration (s): 0.08 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 4.535331E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.214 | TFLOPs: 11.93 |
7: iteration 70060/ 173500 | consumed samples: 17935360 | consumed tokens: 36731617280 | elapsed time per iteration (s): 0.08 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 4.534653E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.239 | TFLOPs: 11.92 |
7: iteration 70070/ 173500 | consumed samples: 17937920 | consumed tokens: 36736860160 | elapsed time per iteration (s): 0.08 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 4.534988E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.161 | TFLOPs: 11.90 |
7: iteration 70080/ 173500 | consumed samples: 17940480 | consumed tokens: 36742103040 | elapsed time per iteration (s): 0.08 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 4.533536E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.590 | TFLOPs: 11.64 |
7: iteration 70090/ 173500 | consumed samples: 17943040 | consumed tokens: 36747345920 | elapsed time per iteration (s): 0.08 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 4.528763E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.868 | TFLOPs: 12.08 |
7: iteration 70100/ 173500 | consumed samples: 17945600 | consumed tokens: 36752588800 | elapsed time per iteration (s): 0.08 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 4.529716E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.483 | TFLOPs: 12.07 |
7: iteration 70110/ 173500 | consumed samples: 17948160 | consumed tokens: 36757831680 | elapsed time per iteration (s): 0.09 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 4.525148E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2826.158 | TFLOPs: 10.51 |
7: iteration 70120/ 173500 | consumed samples: 17950720 | consumed tokens: 36763074560 | elapsed time per iteration (s): 0.08 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 4.535748E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.206 | TFLOPs: 11.89 |
7: iteration 70130/ 173500 | consumed samples: 17953280 | consumed tokens: 36768317440 | elapsed time per iteration (s): 0.08 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 4.541341E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.606 | TFLOPs: 11.98 |
7: iteration 70140/ 173500 | consumed samples: 17955840 | consumed tokens: 36773560320 | elapsed time per iteration (s): 0.08 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 4.537790E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.839 | TFLOPs: 11.91 |
7: iteration 70150/ 173500 | consumed samples: 17958400 | consumed tokens: 36778803200 | elapsed time per iteration (s): 0.08 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 4.526252E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.030 | TFLOPs: 11.98 |
7: iteration 70160/ 173500 | consumed samples: 17960960 | consumed tokens: 36784046080 | elapsed time per iteration (s): 0.08 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 4.529304E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.125 | TFLOPs: 11.98 |
7: iteration 70170/ 173500 | consumed samples: 17963520 | consumed tokens: 36789288960 | elapsed time per iteration (s): 0.08 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 4.534011E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.724 | TFLOPs: 12.06 |
7: iteration 70180/ 173500 | consumed samples: 17966080 | consumed tokens: 36794531840 | elapsed time per iteration (s): 0.08 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 4.531815E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3263.679 | TFLOPs: 12.14 |
7: iteration 70190/ 173500 | consumed samples: 17968640 | consumed tokens: 36799774720 | elapsed time per iteration (s): 0.08 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 4.539583E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.180 | TFLOPs: 12.00 |
7: iteration 70200/ 173500 | consumed samples: 17971200 | consumed tokens: 36805017600 | elapsed time per iteration (s): 0.08 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 4.520710E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.285 | TFLOPs: 11.96 |
7: iteration 70210/ 173500 | consumed samples: 17973760 | consumed tokens: 36810260480 | elapsed time per iteration (s): 0.08 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 4.541891E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.542 | TFLOPs: 11.73 |
7: iteration 70220/ 173500 | consumed samples: 17976320 | consumed tokens: 36815503360 | elapsed time per iteration (s): 0.08 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 4.539185E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.470 | TFLOPs: 11.71 |
7: iteration 70230/ 173500 | consumed samples: 17978880 | consumed tokens: 36820746240 | elapsed time per iteration (s): 0.08 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 4.539627E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.104 | TFLOPs: 12.03 |
7: iteration 70240/ 173500 | consumed samples: 17981440 | consumed tokens: 36825989120 | elapsed time per iteration (s): 0.08 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 4.551220E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.292 | TFLOPs: 12.02 |
7: iteration 70250/ 173500 | consumed samples: 17984000 | consumed tokens: 36831232000 | elapsed time per iteration (s): 0.08 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 4.541703E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.704 | TFLOPs: 12.00 |
7: iteration 70260/ 173500 | consumed samples: 17986560 | consumed tokens: 36836474880 | elapsed time per iteration (s): 0.08 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 4.515228E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.932 | TFLOPs: 12.02 |
7: iteration 70270/ 173500 | consumed samples: 17989120 | consumed tokens: 36841717760 | elapsed time per iteration (s): 0.08 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 4.533911E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.871 | TFLOPs: 11.98 |
7: iteration 70280/ 173500 | consumed samples: 17991680 | consumed tokens: 36846960640 | elapsed time per iteration (s): 0.08 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 4.544513E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.949 | TFLOPs: 11.47 |
7: iteration 70290/ 173500 | consumed samples: 17994240 | consumed tokens: 36852203520 | elapsed time per iteration (s): 0.08 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 4.533460E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.546 | TFLOPs: 11.90 |
7: iteration 70300/ 173500 | consumed samples: 17996800 | consumed tokens: 36857446400 | elapsed time per iteration (s): 0.08 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 4.529694E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.547 | TFLOPs: 12.03 |
7: iteration 70310/ 173500 | consumed samples: 17999360 | consumed tokens: 36862689280 | elapsed time per iteration (s): 0.08 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 4.535396E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.924 | TFLOPs: 12.02 |
7: iteration 70320/ 173500 | consumed samples: 18001920 | consumed tokens: 36867932160 | elapsed time per iteration (s): 0.08 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 4.529108E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.956 | TFLOPs: 12.03 |
7: iteration 70330/ 173500 | consumed samples: 18004480 | consumed tokens: 36873175040 | elapsed time per iteration (s): 0.08 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 4.532456E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per
second: 3230.781 | TFLOPs: 12.02 | 7: iteration 70340/ 173500 | consumed samples: 18007040 | consumed tokens: 36878417920 | elapsed time per iteration (s): 0.08 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 4.527634E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.247 | TFLOPs: 12.04 | 7: iteration 70350/ 173500 | consumed samples: 18009600 | consumed tokens: 36883660800 | elapsed time per iteration (s): 0.08 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 4.534594E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.423 | TFLOPs: 12.03 | 7: iteration 70360/ 173500 | consumed samples: 18012160 | consumed tokens: 36888903680 | elapsed time per iteration (s): 0.08 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 4.549684E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.825 | TFLOPs: 12.04 | 7: iteration 70370/ 173500 | consumed samples: 18014720 | consumed tokens: 36894146560 | elapsed time per iteration (s): 0.08 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 4.536320E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.688 | TFLOPs: 12.03 | 7: iteration 70380/ 173500 | consumed samples: 18017280 | consumed tokens: 36899389440 | elapsed time per iteration (s): 0.08 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 4.535523E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.791 | TFLOPs: 12.04 | 7: iteration 70390/ 173500 | consumed samples: 18019840 | consumed tokens: 36904632320 | elapsed time per iteration (s): 0.08 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 4.541478E+00 | grad 
norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.543 | TFLOPs: 11.97 | 7: iteration 70400/ 173500 | consumed samples: 18022400 | consumed tokens: 36909875200 | elapsed time per iteration (s): 0.08 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 4.546075E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.193 | TFLOPs: 12.01 | 7: iteration 70410/ 173500 | consumed samples: 18024960 | consumed tokens: 36915118080 | elapsed time per iteration (s): 0.08 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 4.535475E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.264 | TFLOPs: 12.04 | 7: iteration 70420/ 173500 | consumed samples: 18027520 | consumed tokens: 36920360960 | elapsed time per iteration (s): 0.08 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 4.537976E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.463 | TFLOPs: 12.02 | 7: iteration 70430/ 173500 | consumed samples: 18030080 | consumed tokens: 36925603840 | elapsed time per iteration (s): 0.08 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 4.530342E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.188 | TFLOPs: 12.04 | 7: iteration 70440/ 173500 | consumed samples: 18032640 | consumed tokens: 36930846720 | elapsed time per iteration (s): 0.08 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 4.536581E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.640 | TFLOPs: 11.97 | 7: iteration 70450/ 173500 | consumed samples: 18035200 | consumed tokens: 36936089600 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 4.543551E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.648 | TFLOPs: 11.85 | 7: iteration 70460/ 173500 | consumed samples: 18037760 | consumed tokens: 36941332480 | elapsed time per iteration (s): 0.08 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 4.529696E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.652 | TFLOPs: 12.01 | 7: iteration 70470/ 173500 | consumed samples: 18040320 | consumed tokens: 36946575360 | elapsed time per iteration (s): 0.08 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 4.545415E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.589 | TFLOPs: 11.86 | 7: iteration 70480/ 173500 | consumed samples: 18042880 | consumed tokens: 36951818240 | elapsed time per iteration (s): 0.08 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 4.530744E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.694 | TFLOPs: 12.04 | 7: iteration 70490/ 173500 | consumed samples: 18045440 | consumed tokens: 36957061120 | elapsed time per iteration (s): 0.08 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 4.535576E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.027 | TFLOPs: 11.96 | 7: iteration 70500/ 173500 | consumed samples: 18048000 | consumed tokens: 36962304000 | elapsed time per iteration (s): 0.08 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 4.536349E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.827 | TFLOPs: 
11.95 | 7: iteration 70510/ 173500 | consumed samples: 18050560 | consumed tokens: 36967546880 | elapsed time per iteration (s): 0.08 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 4.525174E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.645 | TFLOPs: 12.01 | 7: iteration 70520/ 173500 | consumed samples: 18053120 | consumed tokens: 36972789760 | elapsed time per iteration (s): 0.08 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 4.529334E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.105 | TFLOPs: 11.71 | 7: iteration 70530/ 173500 | consumed samples: 18055680 | consumed tokens: 36978032640 | elapsed time per iteration (s): 0.08 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 4.523307E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.704 | TFLOPs: 12.01 | 7: iteration 70540/ 173500 | consumed samples: 18058240 | consumed tokens: 36983275520 | elapsed time per iteration (s): 0.08 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 4.532975E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.482 | TFLOPs: 11.73 | 7: iteration 70550/ 173500 | consumed samples: 18060800 | consumed tokens: 36988518400 | elapsed time per iteration (s): 0.08 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 4.527292E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.011 | TFLOPs: 12.01 | 7: iteration 70560/ 173500 | consumed samples: 18063360 | consumed tokens: 36993761280 | elapsed time per iteration (s): 0.08 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 4.532931E+00 | grad norm: 0.331 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.806 | TFLOPs: 12.02 | 7: iteration 70570/ 173500 | consumed samples: 18065920 | consumed tokens: 36999004160 | elapsed time per iteration (s): 0.08 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 4.537175E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.503 | TFLOPs: 11.98 | 7: iteration 70580/ 173500 | consumed samples: 18068480 | consumed tokens: 37004247040 | elapsed time per iteration (s): 0.08 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 4.538725E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.811 | TFLOPs: 11.94 | 7: iteration 70590/ 173500 | consumed samples: 18071040 | consumed tokens: 37009489920 | elapsed time per iteration (s): 0.08 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 4.534775E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.127 | TFLOPs: 11.93 | 7: iteration 70600/ 173500 | consumed samples: 18073600 | consumed tokens: 37014732800 | elapsed time per iteration (s): 0.08 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 4.535390E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.339 | TFLOPs: 11.93 | 7: iteration 70610/ 173500 | consumed samples: 18076160 | consumed tokens: 37019975680 | elapsed time per iteration (s): 0.08 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 4.527377E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.965 | TFLOPs: 11.90 | 7: iteration 70620/ 173500 | consumed samples: 18078720 | consumed tokens: 37025218560 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.375E-04 | global batch size: 256 | lm loss: 4.546128E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.939 | TFLOPs: 11.92 | 7: iteration 70630/ 173500 | consumed samples: 18081280 | consumed tokens: 37030461440 | elapsed time per iteration (s): 0.08 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 4.525987E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.062 | TFLOPs: 11.87 | 7: iteration 70640/ 173500 | consumed samples: 18083840 | consumed tokens: 37035704320 | elapsed time per iteration (s): 0.08 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 4.531633E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.217 | TFLOPs: 11.93 | 7: iteration 70650/ 173500 | consumed samples: 18086400 | consumed tokens: 37040947200 | elapsed time per iteration (s): 0.08 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 4.527540E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.969 | TFLOPs: 11.91 | 7: iteration 70660/ 173500 | consumed samples: 18088960 | consumed tokens: 37046190080 | elapsed time per iteration (s): 0.08 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 4.529855E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.237 | TFLOPs: 11.90 | 7: iteration 70670/ 173500 | consumed samples: 18091520 | consumed tokens: 37051432960 | elapsed time per iteration (s): 0.08 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 4.541862E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.173 | TFLOPs: 11.88 | 7: iteration 70680/ 
173500 | consumed samples: 18094080 | consumed tokens: 37056675840 | elapsed time per iteration (s): 0.08 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 4.534364E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.543 | TFLOPs: 11.95 | 7: iteration 70690/ 173500 | consumed samples: 18096640 | consumed tokens: 37061918720 | elapsed time per iteration (s): 0.08 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 4.536847E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.814 | TFLOPs: 11.98 | 7: iteration 70700/ 173500 | consumed samples: 18099200 | consumed tokens: 37067161600 | elapsed time per iteration (s): 0.08 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 4.525969E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.322 | TFLOPs: 11.87 | 7: iteration 70710/ 173500 | consumed samples: 18101760 | consumed tokens: 37072404480 | elapsed time per iteration (s): 0.08 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 4.531818E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.471 | TFLOPs: 11.92 | 7: iteration 70720/ 173500 | consumed samples: 18104320 | consumed tokens: 37077647360 | elapsed time per iteration (s): 0.08 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 4.535304E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.923 | TFLOPs: 11.90 | 7: iteration 70730/ 173500 | consumed samples: 18106880 | consumed tokens: 37082890240 | elapsed time per iteration (s): 0.08 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 4.537057E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3215.617 | TFLOPs: 11.96 | 7: iteration 70740/ 173500 | consumed samples: 18109440 | consumed tokens: 37088133120 | elapsed time per iteration (s): 0.08 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 4.530814E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.762 | TFLOPs: 11.67 | 7: iteration 70750/ 173500 | consumed samples: 18112000 | consumed tokens: 37093376000 | elapsed time per iteration (s): 0.08 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 4.537159E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.440 | TFLOPs: 11.90 | 7: iteration 70760/ 173500 | consumed samples: 18114560 | consumed tokens: 37098618880 | elapsed time per iteration (s): 0.08 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 4.541293E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.874 | TFLOPs: 11.96 | 7: iteration 70770/ 173500 | consumed samples: 18117120 | consumed tokens: 37103861760 | elapsed time per iteration (s): 0.08 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 4.529466E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.543 | TFLOPs: 11.96 | 7: iteration 70780/ 173500 | consumed samples: 18119680 | consumed tokens: 37109104640 | elapsed time per iteration (s): 0.08 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 4.527966E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.028 | TFLOPs: 11.96 | 7: iteration 70790/ 173500 | consumed samples: 18122240 | consumed tokens: 37114347520 | elapsed time per iteration (s): 0.08 | learning rate: 
1.373E-04 | global batch size: 256 | lm loss: 4.534948E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.321 | TFLOPs: 11.93 | 7: iteration 70800/ 173500 | consumed samples: 18124800 | consumed tokens: 37119590400 | elapsed time per iteration (s): 0.08 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 4.532851E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.996 | TFLOPs: 11.98 | 7: iteration 70810/ 173500 | consumed samples: 18127360 | consumed tokens: 37124833280 | elapsed time per iteration (s): 0.08 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 4.530262E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.586 | TFLOPs: 11.99 | 7: iteration 70820/ 173500 | consumed samples: 18129920 | consumed tokens: 37130076160 | elapsed time per iteration (s): 0.08 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 4.544595E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.403 | TFLOPs: 11.92 | 7: iteration 70830/ 173500 | consumed samples: 18132480 | consumed tokens: 37135319040 | elapsed time per iteration (s): 0.08 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 4.537632E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.988 | TFLOPs: 11.98 | 7: iteration 70840/ 173500 | consumed samples: 18135040 | consumed tokens: 37140561920 | elapsed time per iteration (s): 0.08 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 4.540406E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.677 | TFLOPs: 11.98 | 7: iteration 70850/ 173500 | 
consumed samples: 18137600 | consumed tokens: 37145804800 | elapsed time per iteration (s): 0.08 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 4.527833E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.303 | TFLOPs: 12.03 | 7: iteration 70860/ 173500 | consumed samples: 18140160 | consumed tokens: 37151047680 | elapsed time per iteration (s): 0.08 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 4.541255E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.418 | TFLOPs: 12.05 | 7: iteration 70870/ 173500 | consumed samples: 18142720 | consumed tokens: 37156290560 | elapsed time per iteration (s): 0.08 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 4.525939E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.649 | TFLOPs: 12.03 | 7: iteration 70880/ 173500 | consumed samples: 18145280 | consumed tokens: 37161533440 | elapsed time per iteration (s): 0.08 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 4.537580E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.959 | TFLOPs: 12.05 | 7: iteration 70890/ 173500 | consumed samples: 18147840 | consumed tokens: 37166776320 | elapsed time per iteration (s): 0.08 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 4.534894E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.417 | TFLOPs: 11.94 | 7: iteration 70900/ 173500 | consumed samples: 18150400 | consumed tokens: 37172019200 | elapsed time per iteration (s): 0.08 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 4.531002E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3220.166 | TFLOPs: 11.98 | 7: iteration 70910/ 173500 | consumed samples: 18152960 | consumed tokens: 37177262080 | elapsed time per iteration (s): 0.08 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 4.540107E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.190 | TFLOPs: 12.04 | 7: iteration 70920/ 173500 | consumed samples: 18155520 | consumed tokens: 37182504960 | elapsed time per iteration (s): 0.08 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 4.538428E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.706 | TFLOPs: 11.93 | 7: iteration 70930/ 173500 | consumed samples: 18158080 | consumed tokens: 37187747840 | elapsed time per iteration (s): 0.08 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 4.532273E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.696 | TFLOPs: 12.04 | 7: iteration 70940/ 173500 | consumed samples: 18160640 | consumed tokens: 37192990720 | elapsed time per iteration (s): 0.08 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 4.543564E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.287 | TFLOPs: 12.07 | 7: iteration 70950/ 173500 | consumed samples: 18163200 | consumed tokens: 37198233600 | elapsed time per iteration (s): 0.08 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 4.540998E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.922 | TFLOPs: 12.01 | 7: iteration 70960/ 173500 | consumed samples: 18165760 | consumed tokens: 37203476480 | elapsed time per iteration (s): 0.08 | learning rate: 1.370E-04 | global batch 
size: 256 | lm loss: 4.542318E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.249 | TFLOPs: 11.99 |
7: iteration 70970/ 173500 | consumed samples: 18168320 | consumed tokens: 37208719360 | elapsed time per iteration (s): 0.08 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 4.533895E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.754 | TFLOPs: 11.90 |
7: iteration 70980/ 173500 | consumed samples: 18170880 | consumed tokens: 37213962240 | elapsed time per iteration (s): 0.08 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 4.534434E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.714 | TFLOPs: 12.05 |
7: iteration 70990/ 173500 | consumed samples: 18173440 | consumed tokens: 37219205120 | elapsed time per iteration (s): 0.08 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 4.546870E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.801 | TFLOPs: 12.02 |
7: iteration 71000/ 173500 | consumed samples: 18176000 | consumed tokens: 37224448000 | elapsed time per iteration (s): 0.08 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 4.542665E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.223 | TFLOPs: 12.03 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 71000 | lm loss value: 4.421870E+00 | lm loss PPL: 8.325184E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 71000 to checkpoints_14m91b100m
0: [2023-03-17 01:58:53,779] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step71000 is begin to save!
0: [2023-03-17 01:58:53,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/layer_01-model_00-model_states.pt...
0: [2023-03-17 01:58:53,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/layer_01-model_00-model_states.pt.
0: [2023-03-17 01:58:53,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/layer_03-model_00-model_states.pt...
0: [2023-03-17 01:58:53,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/layer_03-model_00-model_states.pt.
0: [2023-03-17 01:58:53,812] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/layer_04-model_00-model_states.pt...
0: [2023-03-17 01:58:53,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/layer_04-model_00-model_states.pt.
0: [2023-03-17 01:58:53,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/layer_05-model_00-model_states.pt...
0: [2023-03-17 01:58:53,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/layer_05-model_00-model_states.pt.
0: [2023-03-17 01:58:53,819] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/layer_06-model_00-model_states.pt...
0: [2023-03-17 01:58:53,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/layer_06-model_00-model_states.pt.
0: [2023-03-17 01:58:53,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/layer_08-model_00-model_states.pt...
0: [2023-03-17 01:58:53,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/layer_08-model_00-model_states.pt.
0: [2023-03-17 01:58:53,823] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step71000/mp_rank_00_model_states.pt
0: [2023-03-17 01:58:53,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/mp_rank_00_model_states.pt...
0: [2023-03-17 01:58:53,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/mp_rank_00_model_states.pt.
0: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
6: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt...
6: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt...
6: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt...
6: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt...
6: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
2: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
2: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
2: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
2: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt...
2: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
5: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt...
5: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt...
5: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt...
5: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt...
5: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt...
4: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
4: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
4: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt...
4: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt...
4: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
3: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt...
3: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt...
3: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt...
3: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt...
3: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt...
1: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt...
1: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt...
1: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt...
1: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
2: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 1: [2023-03-17 01:58:53,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 0: [2023-03-17 01:58:53,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:58:53,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:58:53,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2023-03-17 01:58:53,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 4: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
3: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
5: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2023-03-17 01:58:53,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:58:53,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 01:58:53,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:58:53,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:58:53,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
0: [2023-03-17 01:58:53,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:58:53,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:58:53,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:58:53,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
5: [2023-03-17 01:58:53,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 01:58:53,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:58:53,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:58:53,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
2: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:58:53,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 01:58:53,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:58:53,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2023-03-17 01:58:53,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:58:53,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 01:58:53,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
6: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2023-03-17 01:58:53,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:58:53,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:58:53,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:58:53,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:58:53,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2023-03-17 01:58:53,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:58:53,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
4: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:58:53,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2023-03-17 01:58:53,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 01:58:53,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:58:53,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:58:53,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 01:58:53,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:58:53,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 01:58:53,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 4: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
4: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 2: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 7: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 01:58:53,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 01:58:53,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2023-03-17 01:58:53,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 01:58:53,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step71000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 01:58:53,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
0: successfully saved checkpoint at iteration 71000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.40 7: iteration 71010/ 173500 | consumed samples: 18178560 | consumed tokens: 37229690880 | elapsed time per iteration (s): 0.09 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 4.538813E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.077 | TFLOPs: 10.37 | 7: iteration 71020/ 173500 | consumed samples: 18181120 | consumed tokens: 37234933760 | elapsed time per iteration (s): 0.08 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 4.538784E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.229 | TFLOPs: 12.03 | 7: iteration 71030/ 173500 | consumed samples: 18183680 | consumed tokens: 37240176640 | elapsed time per iteration (s): 0.08 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 4.538047E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.549 | TFLOPs: 12.02 | 7: iteration 71040/ 173500 | consumed samples: 18186240 | consumed tokens: 37245419520 | elapsed time per iteration (s): 0.08 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 4.539875E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.774 | TFLOPs: 11.99 | 7: iteration 71050/ 173500 | consumed samples: 18188800 | consumed tokens: 37250662400 | elapsed time per iteration (s): 0.08 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 4.526495E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.033 | TFLOPs: 12.00 | 7: iteration 71060/ 173500 | consumed samples: 18191360 | consumed tokens: 37255905280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.368E-04 | global batch size: 256 | lm loss: 4.534711E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.047 | TFLOPs: 12.03 | 7: iteration 71070/ 173500 | consumed samples: 18193920 | consumed tokens: 37261148160 | elapsed time per iteration (s): 0.08 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 4.541195E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.871 | TFLOPs: 12.06 | 7: iteration 71080/ 173500 | consumed samples: 18196480 | consumed tokens: 37266391040 | elapsed time per iteration (s): 0.08 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 4.546810E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.560 | TFLOPs: 11.93 | 7: iteration 71090/ 173500 | consumed samples: 18199040 | consumed tokens: 37271633920 | elapsed time per iteration (s): 0.08 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 4.542023E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.701 | TFLOPs: 11.90 | 7: iteration 71100/ 173500 | consumed samples: 18201600 | consumed tokens: 37276876800 | elapsed time per iteration (s): 0.08 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 4.534005E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.446 | TFLOPs: 11.89 | 7: iteration 71110/ 173500 | consumed samples: 18204160 | consumed tokens: 37282119680 | elapsed time per iteration (s): 0.08 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 4.521793E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.599 | TFLOPs: 11.90 | 7: iteration 71120/ 
173500 | consumed samples: 18206720 | consumed tokens: 37287362560 | elapsed time per iteration (s): 0.08 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 4.549179E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.531 | TFLOPs: 11.90 | 7: iteration 71130/ 173500 | consumed samples: 18209280 | consumed tokens: 37292605440 | elapsed time per iteration (s): 0.08 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 4.533978E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.515 | TFLOPs: 11.86 | 7: iteration 71140/ 173500 | consumed samples: 18211840 | consumed tokens: 37297848320 | elapsed time per iteration (s): 0.08 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 4.523591E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.996 | TFLOPs: 11.93 | 7: iteration 71150/ 173500 | consumed samples: 18214400 | consumed tokens: 37303091200 | elapsed time per iteration (s): 0.08 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 4.534256E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.161 | TFLOPs: 11.92 | 7: iteration 71160/ 173500 | consumed samples: 18216960 | consumed tokens: 37308334080 | elapsed time per iteration (s): 0.08 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 4.529330E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.989 | TFLOPs: 11.89 | 7: iteration 71170/ 173500 | consumed samples: 18219520 | consumed tokens: 37313576960 | elapsed time per iteration (s): 0.08 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 4.541867E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3205.469 | TFLOPs: 11.92 | 7: iteration 71180/ 173500 | consumed samples: 18222080 | consumed tokens: 37318819840 | elapsed time per iteration (s): 0.08 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 4.536928E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.112 | TFLOPs: 11.88 | 7: iteration 71190/ 173500 | consumed samples: 18224640 | consumed tokens: 37324062720 | elapsed time per iteration (s): 0.08 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 4.533326E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.206 | TFLOPs: 11.93 | 7: iteration 71200/ 173500 | consumed samples: 18227200 | consumed tokens: 37329305600 | elapsed time per iteration (s): 0.08 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 4.531616E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.727 | TFLOPs: 11.89 | 7: iteration 71210/ 173500 | consumed samples: 18229760 | consumed tokens: 37334548480 | elapsed time per iteration (s): 0.08 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 4.544365E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.996 | TFLOPs: 11.95 | 7: iteration 71220/ 173500 | consumed samples: 18232320 | consumed tokens: 37339791360 | elapsed time per iteration (s): 0.08 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 4.523284E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.085 | TFLOPs: 12.01 | 7: iteration 71230/ 173500 | consumed samples: 18234880 | consumed tokens: 37345034240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.366E-04 | global batch size: 256 | lm loss: 4.526624E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.474 | TFLOPs: 12.06 | 7: iteration 71240/ 173500 | consumed samples: 18237440 | consumed tokens: 37350277120 | elapsed time per iteration (s): 0.08 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 4.552819E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.882 | TFLOPs: 11.98 | 7: iteration 71250/ 173500 | consumed samples: 18240000 | consumed tokens: 37355520000 | elapsed time per iteration (s): 0.08 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 4.538782E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.823 | TFLOPs: 11.83 | 7: iteration 71260/ 173500 | consumed samples: 18242560 | consumed tokens: 37360762880 | elapsed time per iteration (s): 0.08 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 4.531675E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.736 | TFLOPs: 11.94 | 7: iteration 71270/ 173500 | consumed samples: 18245120 | consumed tokens: 37366005760 | elapsed time per iteration (s): 0.08 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 4.542648E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.296 | TFLOPs: 11.86 | 7: iteration 71280/ 173500 | consumed samples: 18247680 | consumed tokens: 37371248640 | elapsed time per iteration (s): 0.08 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 4.524660E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.448 | TFLOPs: 11.93 | 7: iteration 71290/ 173500 | 
consumed samples: 18250240 | consumed tokens: 37376491520 | elapsed time per iteration (s): 0.08 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 4.535369E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.744 | TFLOPs: 11.93 | 7: iteration 71300/ 173500 | consumed samples: 18252800 | consumed tokens: 37381734400 | elapsed time per iteration (s): 0.08 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 4.549302E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.002 | TFLOPs: 11.95 | 7: iteration 71310/ 173500 | consumed samples: 18255360 | consumed tokens: 37386977280 | elapsed time per iteration (s): 0.08 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 4.542023E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.583 | TFLOPs: 11.95 | 7: iteration 71320/ 173500 | consumed samples: 18257920 | consumed tokens: 37392220160 | elapsed time per iteration (s): 0.08 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 4.530438E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.214 | TFLOPs: 11.94 | 7: iteration 71330/ 173500 | consumed samples: 18260480 | consumed tokens: 37397463040 | elapsed time per iteration (s): 0.08 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 4.534212E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.887 | TFLOPs: 11.65 | 7: iteration 71340/ 173500 | consumed samples: 18263040 | consumed tokens: 37402705920 | elapsed time per iteration (s): 0.08 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 4.528299E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3210.807 | TFLOPs: 11.94 | 7: iteration 71350/ 173500 | consumed samples: 18265600 | consumed tokens: 37407948800 | elapsed time per iteration (s): 0.08 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 4.542729E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.747 | TFLOPs: 11.92 | 7: iteration 71360/ 173500 | consumed samples: 18268160 | consumed tokens: 37413191680 | elapsed time per iteration (s): 0.08 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 4.536736E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.739 | TFLOPs: 11.49 | 7: iteration 71370/ 173500 | consumed samples: 18270720 | consumed tokens: 37418434560 | elapsed time per iteration (s): 0.08 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 4.535085E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.244 | TFLOPs: 11.86 | 7: iteration 71380/ 173500 | consumed samples: 18273280 | consumed tokens: 37423677440 | elapsed time per iteration (s): 0.08 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 4.536010E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.113 | TFLOPs: 11.91 | 7: iteration 71390/ 173500 | consumed samples: 18275840 | consumed tokens: 37428920320 | elapsed time per iteration (s): 0.08 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 4.541433E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.085 | TFLOPs: 11.92 | 7: iteration 71400/ 173500 | consumed samples: 18278400 | consumed tokens: 37434163200 | elapsed time per iteration (s): 0.08 | learning rate: 1.363E-04 | global batch 
size: 256 | lm loss: 4.546166E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.093 | TFLOPs: 11.93 | 7: iteration 71410/ 173500 | consumed samples: 18280960 | consumed tokens: 37439406080 | elapsed time per iteration (s): 0.08 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 4.532580E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.935 | TFLOPs: 11.90 | 7: iteration 71420/ 173500 | consumed samples: 18283520 | consumed tokens: 37444648960 | elapsed time per iteration (s): 0.08 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 4.549585E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.868 | TFLOPs: 11.94 | 7: iteration 71430/ 173500 | consumed samples: 18286080 | consumed tokens: 37449891840 | elapsed time per iteration (s): 0.08 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 4.533642E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.700 | TFLOPs: 11.89 | 7: iteration 71440/ 173500 | consumed samples: 18288640 | consumed tokens: 37455134720 | elapsed time per iteration (s): 0.08 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 4.535762E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.989 | TFLOPs: 11.89 | 7: iteration 71450/ 173500 | consumed samples: 18291200 | consumed tokens: 37460377600 | elapsed time per iteration (s): 0.08 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 4.541792E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.869 | TFLOPs: 11.94 | 7: iteration 71460/ 173500 | consumed samples: 18293760 | 
consumed tokens: 37465620480 | elapsed time per iteration (s): 0.08 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 4.526673E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.036 | TFLOPs: 11.82 | 7: iteration 71470/ 173500 | consumed samples: 18296320 | consumed tokens: 37470863360 | elapsed time per iteration (s): 0.08 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 4.538214E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.640 | TFLOPs: 11.88 | 7: iteration 71480/ 173500 | consumed samples: 18298880 | consumed tokens: 37476106240 | elapsed time per iteration (s): 0.08 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 4.539911E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.630 | TFLOPs: 11.90 | 7: iteration 71490/ 173500 | consumed samples: 18301440 | consumed tokens: 37481349120 | elapsed time per iteration (s): 0.08 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 4.537962E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.158 | TFLOPs: 11.88 | 7: iteration 71500/ 173500 | consumed samples: 18304000 | consumed tokens: 37486592000 | elapsed time per iteration (s): 0.08 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 4.540817E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.427 | TFLOPs: 11.90 | 7: iteration 71510/ 173500 | consumed samples: 18306560 | consumed tokens: 37491834880 | elapsed time per iteration (s): 0.08 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 4.540904E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3196.902 | TFLOPs: 11.89 | 7: iteration 71520/ 173500 | consumed samples: 18309120 | consumed tokens: 37497077760 | elapsed time per iteration (s): 0.08 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 4.544547E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.351 | TFLOPs: 11.84 | 7: iteration 71530/ 173500 | consumed samples: 18311680 | consumed tokens: 37502320640 | elapsed time per iteration (s): 0.08 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 4.530552E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.079 | TFLOPs: 11.92 | 7: iteration 71540/ 173500 | consumed samples: 18314240 | consumed tokens: 37507563520 | elapsed time per iteration (s): 0.08 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 4.537420E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.117 | TFLOPs: 11.94 | 7: iteration 71550/ 173500 | consumed samples: 18316800 | consumed tokens: 37512806400 | elapsed time per iteration (s): 0.08 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 4.542910E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.371 | TFLOPs: 11.61 | 7: iteration 71560/ 173500 | consumed samples: 18319360 | consumed tokens: 37518049280 | elapsed time per iteration (s): 0.08 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 4.546689E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.823 | TFLOPs: 11.82 | 7: iteration 71570/ 173500 | consumed samples: 18321920 | consumed tokens: 37523292160 | elapsed time per iteration (s): 0.08 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 
4.530410E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.312 | TFLOPs: 11.75 | 7: iteration 71580/ 173500 | consumed samples: 18324480 | consumed tokens: 37528535040 | elapsed time per iteration (s): 0.10 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 4.546404E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2625.580 | TFLOPs: 9.77 | 7: iteration 71590/ 173500 | consumed samples: 18327040 | consumed tokens: 37533777920 | elapsed time per iteration (s): 0.08 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 4.519187E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.332 | TFLOPs: 11.91 | 7: iteration 71600/ 173500 | consumed samples: 18329600 | consumed tokens: 37539020800 | elapsed time per iteration (s): 0.08 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 4.542366E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.013 | TFLOPs: 11.88 | 7: iteration 71610/ 173500 | consumed samples: 18332160 | consumed tokens: 37544263680 | elapsed time per iteration (s): 0.08 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 4.539218E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.769 | TFLOPs: 11.90 | 7: iteration 71620/ 173500 | consumed samples: 18334720 | consumed tokens: 37549506560 | elapsed time per iteration (s): 0.08 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 4.532020E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.280 | TFLOPs: 11.34 | 7: iteration 71630/ 173500 | consumed samples: 18337280 | consumed tokens: 
37554749440 | elapsed time per iteration (s): 0.08 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 4.544037E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.959 | TFLOPs: 11.86 | 7: iteration 71640/ 173500 | consumed samples: 18339840 | consumed tokens: 37559992320 | elapsed time per iteration (s): 0.08 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 4.531630E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.266 | TFLOPs: 11.92 | 7: iteration 71650/ 173500 | consumed samples: 18342400 | consumed tokens: 37565235200 | elapsed time per iteration (s): 0.08 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 4.528658E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.593 | TFLOPs: 11.61 | 7: iteration 71660/ 173500 | consumed samples: 18344960 | consumed tokens: 37570478080 | elapsed time per iteration (s): 0.08 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 4.533677E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.507 | TFLOPs: 11.65 | 7: iteration 71670/ 173500 | consumed samples: 18347520 | consumed tokens: 37575720960 | elapsed time per iteration (s): 0.08 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 4.527815E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.036 | TFLOPs: 11.94 | 7: iteration 71680/ 173500 | consumed samples: 18350080 | consumed tokens: 37580963840 | elapsed time per iteration (s): 0.08 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 4.525422E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3218.333 | TFLOPs: 11.97 | 7: iteration 71690/ 173500 | consumed samples: 18352640 | consumed tokens: 37586206720 | elapsed time per iteration (s): 0.08 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 4.528994E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.573 | TFLOPs: 11.95 | 7: iteration 71700/ 173500 | consumed samples: 18355200 | consumed tokens: 37591449600 | elapsed time per iteration (s): 0.08 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 4.532428E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.248 | TFLOPs: 12.04 | 7: iteration 71710/ 173500 | consumed samples: 18357760 | consumed tokens: 37596692480 | elapsed time per iteration (s): 0.08 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 4.545557E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3265.380 | TFLOPs: 12.15 | 7: iteration 71720/ 173500 | consumed samples: 18360320 | consumed tokens: 37601935360 | elapsed time per iteration (s): 0.08 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 4.533607E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3253.776 | TFLOPs: 12.10 | 7: iteration 71730/ 173500 | consumed samples: 18362880 | consumed tokens: 37607178240 | elapsed time per iteration (s): 0.08 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 4.514744E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.177 | TFLOPs: 12.00 | 7: iteration 71740/ 173500 | consumed samples: 18365440 | consumed tokens: 37612421120 | elapsed time per iteration (s): 0.08 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 4.535314E+00 | grad 
norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.624 | TFLOPs: 11.98 | 7: iteration 71750/ 173500 | consumed samples: 18368000 | consumed tokens: 37617664000 | elapsed time per iteration (s): 0.08 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 4.528103E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.662 | TFLOPs: 12.04 | 7: iteration 71760/ 173500 | consumed samples: 18370560 | consumed tokens: 37622906880 | elapsed time per iteration (s): 0.09 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 4.535731E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.157 | TFLOPs: 11.16 | 7: iteration 71770/ 173500 | consumed samples: 18373120 | consumed tokens: 37628149760 | elapsed time per iteration (s): 0.08 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 4.535979E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.386 | TFLOPs: 11.72 | 7: iteration 71780/ 173500 | consumed samples: 18375680 | consumed tokens: 37633392640 | elapsed time per iteration (s): 0.08 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 4.523507E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.146 | TFLOPs: 11.87 | 7: iteration 71790/ 173500 | consumed samples: 18378240 | consumed tokens: 37638635520 | elapsed time per iteration (s): 0.08 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 4.537708E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.051 | TFLOPs: 11.69 | 7: iteration 71800/ 173500 | consumed samples: 18380800 | consumed tokens: 37643878400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 4.536253E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.160 | TFLOPs: 11.74 | 7: iteration 71810/ 173500 | consumed samples: 18383360 | consumed tokens: 37649121280 | elapsed time per iteration (s): 0.08 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 4.536954E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.436 | TFLOPs: 12.01 | 7: iteration 71820/ 173500 | consumed samples: 18385920 | consumed tokens: 37654364160 | elapsed time per iteration (s): 0.08 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 4.538980E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.607 | TFLOPs: 11.92 | 7: iteration 71830/ 173500 | consumed samples: 18388480 | consumed tokens: 37659607040 | elapsed time per iteration (s): 0.08 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 4.549903E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.812 | TFLOPs: 12.01 | 7: iteration 71840/ 173500 | consumed samples: 18391040 | consumed tokens: 37664849920 | elapsed time per iteration (s): 0.08 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 4.533640E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3079.903 | TFLOPs: 11.46 | 7: iteration 71850/ 173500 | consumed samples: 18393600 | consumed tokens: 37670092800 | elapsed time per iteration (s): 0.08 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 4.546722E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.269 | TFLOPs: 
11.70 | 7: iteration 71860/ 173500 | consumed samples: 18396160 | consumed tokens: 37675335680 | elapsed time per iteration (s): 0.08 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 4.538966E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.015 | TFLOPs: 11.99 | 7: iteration 71870/ 173500 | consumed samples: 18398720 | consumed tokens: 37680578560 | elapsed time per iteration (s): 0.08 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 4.534664E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.161 | TFLOPs: 12.06 | 7: iteration 71880/ 173500 | consumed samples: 18401280 | consumed tokens: 37685821440 | elapsed time per iteration (s): 0.08 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 4.529799E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.804 | TFLOPs: 12.02 | 7: iteration 71890/ 173500 | consumed samples: 18403840 | consumed tokens: 37691064320 | elapsed time per iteration (s): 0.08 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 4.533837E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.266 | TFLOPs: 12.04 | 7: iteration 71900/ 173500 | consumed samples: 18406400 | consumed tokens: 37696307200 | elapsed time per iteration (s): 0.08 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 4.533961E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.607 | TFLOPs: 11.98 | 7: iteration 71910/ 173500 | consumed samples: 18408960 | consumed tokens: 37701550080 | elapsed time per iteration (s): 0.08 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 4.540373E+00 | grad norm: 0.305 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.068 | TFLOPs: 12.03 | 7: iteration 71920/ 173500 | consumed samples: 18411520 | consumed tokens: 37706792960 | elapsed time per iteration (s): 0.08 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 4.525747E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.324 | TFLOPs: 12.01 | 7: iteration 71930/ 173500 | consumed samples: 18414080 | consumed tokens: 37712035840 | elapsed time per iteration (s): 0.08 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 4.537910E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.418 | TFLOPs: 11.26 | 7: iteration 71940/ 173500 | consumed samples: 18416640 | consumed tokens: 37717278720 | elapsed time per iteration (s): 0.08 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 4.536196E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.265 | TFLOPs: 11.63 | 7: iteration 71950/ 173500 | consumed samples: 18419200 | consumed tokens: 37722521600 | elapsed time per iteration (s): 0.08 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 4.524019E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.474 | TFLOPs: 11.88 | 7: iteration 71960/ 173500 | consumed samples: 18421760 | consumed tokens: 37727764480 | elapsed time per iteration (s): 0.13 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 4.532101E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.543 | TFLOPs: 7.37 | 7: iteration 71970/ 173500 | consumed samples: 18424320 | consumed tokens: 37733007360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.354E-04 | global batch size: 256 | lm loss: 4.539734E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.220 | TFLOPs: 11.91 | 7: iteration 71980/ 173500 | consumed samples: 18426880 | consumed tokens: 37738250240 | elapsed time per iteration (s): 0.08 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 4.524785E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.426 | TFLOPs: 11.63 | 7: iteration 71990/ 173500 | consumed samples: 18429440 | consumed tokens: 37743493120 | elapsed time per iteration (s): 0.08 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 4.532473E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.165 | TFLOPs: 11.84 | 0: [2023-03-17 02:00:14,704] [INFO] [logging.py:68:log_dist] [Rank 0] step=72000, skipped=0, lr=[0.0001353602432066091, 0.0001353602432066091, 0.0001353602432066091], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 72000/ 173500 | consumed samples: 18432000 | consumed tokens: 37748736000 | elapsed time per iteration (s): 0.08 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 4.533930E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.513 | TFLOPs: 11.80 | 0: steps: 72000 loss: 4.5341 iter time (s): 0.080 samples/sec: 3212.607 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 72000 | lm loss value: 4.371984E+00 | lm loss PPL: 7.920061E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 72000 to checkpoints_14m91b100m 0: [2023-03-17 02:00:14,762] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step72000 is begin to save! 0: [2023-03-17 02:00:14,765] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:00:14,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:00:14,791] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:00:14,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:00:14,794] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:00:14,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:00:14,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:00:14,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:00:14,800] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:00:14,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:00:14,803] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:00:14,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:00:14,804] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step72000/mp_rank_00_model_states.pt 0: [2023-03-17 02:00:14,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:00:14,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:00:14,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:00:14,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:00:14,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:00:14,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:00:14,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2023-03-17 02:00:14,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:00:14,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2023-03-17 02:00:14,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:00:14,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:00:14,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2023-03-17 02:00:14,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 5: [2023-03-17 02:00:14,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:00:14,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:00:14,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:00:14,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:00:14,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
2: [2023-03-17 02:00:14,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:00:14,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 02:00:14,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:00:14,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 02:00:14,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:00:14,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:00:14,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2023-03-17 02:00:14,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:00:14,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
7: [2023-03-17 02:00:14,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:00:14,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
5: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:00:14,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:00:14,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2023-03-17 02:00:14,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2023-03-17 02:00:14,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:00:14,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:00:14,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:00:14,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2023-03-17 02:00:14,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
2: [2023-03-17 02:00:14,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:00:14,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:00:14,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2023-03-17 02:00:14,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
7: [2023-03-17 02:00:14,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2023-03-17 02:00:14,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:00:14,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:00:14,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:00:14,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 2: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 4: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
6: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 6: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2023-03-17 02:00:14,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:00:14,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 02:00:14,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2023-03-17 02:00:14,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2023-03-17 02:00:14,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
7: [2023-03-17 02:00:14,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2023-03-17 02:00:14,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:00:14,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:00:14,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2023-03-17 02:00:14,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:00:14,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step72000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:00:14,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
0: successfully saved checkpoint at iteration 72000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.28 7: iteration 72010/ 173500 | consumed samples: 18434560 | consumed tokens: 37753978880 | elapsed time per iteration (s): 0.10 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 4.529513E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2694.633 | TFLOPs: 10.02 | 7: iteration 72020/ 173500 | consumed samples: 18437120 | consumed tokens: 37759221760 | elapsed time per iteration (s): 0.08 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 4.535881E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.054 | TFLOPs: 11.88 | 7: iteration 72030/ 173500 | consumed samples: 18439680 | consumed tokens: 37764464640 | elapsed time per iteration (s): 0.08 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 4.525276E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.360 | TFLOPs: 11.90 | 7: iteration 72040/ 173500 | consumed samples: 18442240 | consumed tokens: 37769707520 | elapsed time per iteration (s): 0.08 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 4.525989E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.845 | TFLOPs: 11.38 | 7: iteration 72050/ 173500 | consumed samples: 18444800 | consumed tokens: 37774950400 | elapsed time per iteration (s): 0.08 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 4.532505E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.897 | TFLOPs: 11.45 | 7: iteration 72060/ 173500 | consumed samples: 18447360 | consumed tokens: 37780193280 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.353E-04 | global batch size: 256 | lm loss: 4.526444E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2933.061 | TFLOPs: 10.91 | 7: iteration 72070/ 173500 | consumed samples: 18449920 | consumed tokens: 37785436160 | elapsed time per iteration (s): 0.08 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 4.535055E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.164 | TFLOPs: 11.61 | 7: iteration 72080/ 173500 | consumed samples: 18452480 | consumed tokens: 37790679040 | elapsed time per iteration (s): 0.08 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 4.541432E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.716 | TFLOPs: 11.83 | 7: iteration 72090/ 173500 | consumed samples: 18455040 | consumed tokens: 37795921920 | elapsed time per iteration (s): 0.08 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 4.530716E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.106 | TFLOPs: 11.86 | 7: iteration 72100/ 173500 | consumed samples: 18457600 | consumed tokens: 37801164800 | elapsed time per iteration (s): 0.08 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 4.543471E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.232 | TFLOPs: 11.81 | 7: iteration 72110/ 173500 | consumed samples: 18460160 | consumed tokens: 37806407680 | elapsed time per iteration (s): 0.08 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 4.544490E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.199 | TFLOPs: 11.88 | 7: iteration 72120/ 
173500 | consumed samples: 18462720 | consumed tokens: 37811650560 | elapsed time per iteration (s): 0.08 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 4.548870E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.665 | TFLOPs: 11.59 | 7: iteration 72130/ 173500 | consumed samples: 18465280 | consumed tokens: 37816893440 | elapsed time per iteration (s): 0.08 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 4.539764E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.303 | TFLOPs: 11.83 | 7: iteration 72140/ 173500 | consumed samples: 18467840 | consumed tokens: 37822136320 | elapsed time per iteration (s): 0.08 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 4.547712E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.541 | TFLOPs: 11.86 | 7: iteration 72150/ 173500 | consumed samples: 18470400 | consumed tokens: 37827379200 | elapsed time per iteration (s): 0.08 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 4.541458E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.024 | TFLOPs: 11.82 | 7: iteration 72160/ 173500 | consumed samples: 18472960 | consumed tokens: 37832622080 | elapsed time per iteration (s): 0.08 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 4.533860E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.408 | TFLOPs: 11.81 | 7: iteration 72170/ 173500 | consumed samples: 18475520 | consumed tokens: 37837864960 | elapsed time per iteration (s): 0.08 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 4.539443E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3113.384 | TFLOPs: 11.58 | 7: iteration 72180/ 173500 | consumed samples: 18478080 | consumed tokens: 37843107840 | elapsed time per iteration (s): 0.08 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 4.543285E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.422 | TFLOPs: 11.84 | 7: iteration 72190/ 173500 | consumed samples: 18480640 | consumed tokens: 37848350720 | elapsed time per iteration (s): 0.09 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 4.545116E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2952.602 | TFLOPs: 10.98 | 7: iteration 72200/ 173500 | consumed samples: 18483200 | consumed tokens: 37853593600 | elapsed time per iteration (s): 0.08 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 4.544193E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.389 | TFLOPs: 11.33 | 7: iteration 72210/ 173500 | consumed samples: 18485760 | consumed tokens: 37858836480 | elapsed time per iteration (s): 0.08 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 4.545263E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.490 | TFLOPs: 11.80 | 7: iteration 72220/ 173500 | consumed samples: 18488320 | consumed tokens: 37864079360 | elapsed time per iteration (s): 0.08 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 4.538878E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.557 | TFLOPs: 11.85 | 7: iteration 72230/ 173500 | consumed samples: 18490880 | consumed tokens: 37869322240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.350E-04 | global batch size: 256 | lm loss: 4.520386E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.277 | TFLOPs: 11.75 | 7: iteration 72240/ 173500 | consumed samples: 18493440 | consumed tokens: 37874565120 | elapsed time per iteration (s): 0.08 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 4.533987E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.915 | TFLOPs: 11.87 | 7: iteration 72250/ 173500 | consumed samples: 18496000 | consumed tokens: 37879808000 | elapsed time per iteration (s): 0.08 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 4.539281E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.305 | TFLOPs: 11.82 | 7: iteration 72260/ 173500 | consumed samples: 18498560 | consumed tokens: 37885050880 | elapsed time per iteration (s): 0.08 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 4.531581E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.679 | TFLOPs: 11.78 | 7: iteration 72270/ 173500 | consumed samples: 18501120 | consumed tokens: 37890293760 | elapsed time per iteration (s): 0.08 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 4.521537E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.614 | TFLOPs: 11.88 | 7: iteration 72280/ 173500 | consumed samples: 18503680 | consumed tokens: 37895536640 | elapsed time per iteration (s): 0.08 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 4.556035E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.829 | TFLOPs: 11.85 | 7: iteration 72290/ 173500 | 
consumed samples: 18506240 | consumed tokens: 37900779520 | elapsed time per iteration (s): 0.08 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 4.531952E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.378 | TFLOPs: 11.85 | 7: iteration 72300/ 173500 | consumed samples: 18508800 | consumed tokens: 37906022400 | elapsed time per iteration (s): 0.08 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 4.531385E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.035 | TFLOPs: 11.84 | 7: iteration 72310/ 173500 | consumed samples: 18511360 | consumed tokens: 37911265280 | elapsed time per iteration (s): 0.08 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 4.537909E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.881 | TFLOPs: 11.81 | 7: iteration 72320/ 173500 | consumed samples: 18513920 | consumed tokens: 37916508160 | elapsed time per iteration (s): 0.08 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 4.535508E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.538 | TFLOPs: 11.82 | 7: iteration 72330/ 173500 | consumed samples: 18516480 | consumed tokens: 37921751040 | elapsed time per iteration (s): 0.08 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 4.544350E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.454 | TFLOPs: 11.77 | 7: iteration 72340/ 173500 | consumed samples: 18519040 | consumed tokens: 37926993920 | elapsed time per iteration (s): 0.08 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 4.545842E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3185.648 | TFLOPs: 11.85 | 7: iteration 72350/ 173500 | consumed samples: 18521600 | consumed tokens: 37932236800 | elapsed time per iteration (s): 0.08 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 4.531533E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.673 | TFLOPs: 11.86 | 7: iteration 72360/ 173500 | consumed samples: 18524160 | consumed tokens: 37937479680 | elapsed time per iteration (s): 0.08 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 4.529786E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.174 | TFLOPs: 11.81 | 7: iteration 72370/ 173500 | consumed samples: 18526720 | consumed tokens: 37942722560 | elapsed time per iteration (s): 0.08 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 4.527086E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.271 | TFLOPs: 11.87 | 7: iteration 72380/ 173500 | consumed samples: 18529280 | consumed tokens: 37947965440 | elapsed time per iteration (s): 0.08 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 4.540218E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.309 | TFLOPs: 11.63 | 7: iteration 72390/ 173500 | consumed samples: 18531840 | consumed tokens: 37953208320 | elapsed time per iteration (s): 0.08 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 4.527611E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.329 | TFLOPs: 11.86 | 7: iteration 72400/ 173500 | consumed samples: 18534400 | consumed tokens: 37958451200 | elapsed time per iteration (s): 0.08 | learning rate: 1.347E-04 | global batch 
size: 256 | lm loss: 4.534549E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.652 | TFLOPs: 11.90 | 7: iteration 72410/ 173500 | consumed samples: 18536960 | consumed tokens: 37963694080 | elapsed time per iteration (s): 0.08 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 4.546790E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.789 | TFLOPs: 11.85 | 7: iteration 72420/ 173500 | consumed samples: 18539520 | consumed tokens: 37968936960 | elapsed time per iteration (s): 0.08 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 4.538504E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.704 | TFLOPs: 11.85 | 7: iteration 72430/ 173500 | consumed samples: 18542080 | consumed tokens: 37974179840 | elapsed time per iteration (s): 0.08 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 4.531658E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.098 | TFLOPs: 11.84 | 7: iteration 72440/ 173500 | consumed samples: 18544640 | consumed tokens: 37979422720 | elapsed time per iteration (s): 0.08 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 4.536048E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.838 | TFLOPs: 11.82 | 7: iteration 72450/ 173500 | consumed samples: 18547200 | consumed tokens: 37984665600 | elapsed time per iteration (s): 0.08 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 4.526561E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.277 | TFLOPs: 11.85 | 7: iteration 72460/ 173500 | consumed samples: 18549760 | 
consumed tokens: 37989908480 | elapsed time per iteration (s): 0.08 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 4.541349E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.255 | TFLOPs: 11.84 | 7: iteration 72470/ 173500 | consumed samples: 18552320 | consumed tokens: 37995151360 | elapsed time per iteration (s): 0.08 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 4.541559E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.813 | TFLOPs: 11.87 | 7: iteration 72480/ 173500 | consumed samples: 18554880 | consumed tokens: 38000394240 | elapsed time per iteration (s): 0.08 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 4.529236E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.287 | TFLOPs: 11.86 | 7: iteration 72490/ 173500 | consumed samples: 18557440 | consumed tokens: 38005637120 | elapsed time per iteration (s): 0.10 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 4.538826E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.155 | TFLOPs: 9.92 | 7: iteration 72500/ 173500 | consumed samples: 18560000 | consumed tokens: 38010880000 | elapsed time per iteration (s): 0.08 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 4.552079E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.249 | TFLOPs: 11.81 | 7: iteration 72510/ 173500 | consumed samples: 18562560 | consumed tokens: 38016122880 | elapsed time per iteration (s): 0.08 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 4.548857E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3203.375 | TFLOPs: 11.92 | 7: iteration 72520/ 173500 | consumed samples: 18565120 | consumed tokens: 38021365760 | elapsed time per iteration (s): 0.08 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 4.528569E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.656 | TFLOPs: 11.90 | 7: iteration 72530/ 173500 | consumed samples: 18567680 | consumed tokens: 38026608640 | elapsed time per iteration (s): 0.08 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 4.526773E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.595 | TFLOPs: 11.82 | 7: iteration 72540/ 173500 | consumed samples: 18570240 | consumed tokens: 38031851520 | elapsed time per iteration (s): 0.08 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 4.532394E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.825 | TFLOPs: 11.92 | 7: iteration 72550/ 173500 | consumed samples: 18572800 | consumed tokens: 38037094400 | elapsed time per iteration (s): 0.08 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 4.527123E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.763 | TFLOPs: 11.91 | 7: iteration 72560/ 173500 | consumed samples: 18575360 | consumed tokens: 38042337280 | elapsed time per iteration (s): 0.08 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 4.537711E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.142 | TFLOPs: 11.86 | 7: iteration 72570/ 173500 | consumed samples: 18577920 | consumed tokens: 38047580160 | elapsed time per iteration (s): 0.08 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 
4.535136E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.259 | TFLOPs: 11.83 | 7: iteration 72580/ 173500 | consumed samples: 18580480 | consumed tokens: 38052823040 | elapsed time per iteration (s): 0.11 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 4.531532E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2296.108 | TFLOPs: 8.54 | 7: iteration 72590/ 173500 | consumed samples: 18583040 | consumed tokens: 38058065920 | elapsed time per iteration (s): 0.10 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 4.530867E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2497.773 | TFLOPs: 9.29 | 7: iteration 72600/ 173500 | consumed samples: 18585600 | consumed tokens: 38063308800 | elapsed time per iteration (s): 0.08 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 4.517884E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.213 | TFLOPs: 11.83 | 7: iteration 72610/ 173500 | consumed samples: 18588160 | consumed tokens: 38068551680 | elapsed time per iteration (s): 0.08 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 4.538333E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.649 | TFLOPs: 11.85 | 7: iteration 72620/ 173500 | consumed samples: 18590720 | consumed tokens: 38073794560 | elapsed time per iteration (s): 0.08 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 4.540952E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.618 | TFLOPs: 11.92 | 7: iteration 72630/ 173500 | consumed samples: 18593280 | consumed tokens: 
38079037440 | elapsed time per iteration (s): 0.08 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 4.525684E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.656 | TFLOPs: 11.89 | 7: iteration 72640/ 173500 | consumed samples: 18595840 | consumed tokens: 38084280320 | elapsed time per iteration (s): 0.08 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 4.521273E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.533 | TFLOPs: 11.89 | 7: iteration 72650/ 173500 | consumed samples: 18598400 | consumed tokens: 38089523200 | elapsed time per iteration (s): 0.08 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 4.534208E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.476 | TFLOPs: 11.44 | 7: iteration 72660/ 173500 | consumed samples: 18600960 | consumed tokens: 38094766080 | elapsed time per iteration (s): 0.08 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 4.542413E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.382 | TFLOPs: 11.93 | 7: iteration 72670/ 173500 | consumed samples: 18603520 | consumed tokens: 38100008960 | elapsed time per iteration (s): 0.08 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 4.515707E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.592 | TFLOPs: 11.95 | 7: iteration 72680/ 173500 | consumed samples: 18606080 | consumed tokens: 38105251840 | elapsed time per iteration (s): 0.08 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 4.528165E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3197.316 | TFLOPs: 11.89 | 7: iteration 72690/ 173500 | consumed samples: 18608640 | consumed tokens: 38110494720 | elapsed time per iteration (s): 0.09 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 4.547142E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2976.945 | TFLOPs: 11.07 | 7: iteration 72700/ 173500 | consumed samples: 18611200 | consumed tokens: 38115737600 | elapsed time per iteration (s): 0.08 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 4.539004E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.132 | TFLOPs: 11.66 | 7: iteration 72710/ 173500 | consumed samples: 18613760 | consumed tokens: 38120980480 | elapsed time per iteration (s): 0.08 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 4.514155E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.356 | TFLOPs: 11.96 | 7: iteration 72720/ 173500 | consumed samples: 18616320 | consumed tokens: 38126223360 | elapsed time per iteration (s): 0.08 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 4.521313E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.585 | TFLOPs: 11.91 | 7: iteration 72730/ 173500 | consumed samples: 18618880 | consumed tokens: 38131466240 | elapsed time per iteration (s): 0.08 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 4.536252E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.038 | TFLOPs: 11.87 | 7: iteration 72740/ 173500 | consumed samples: 18621440 | consumed tokens: 38136709120 | elapsed time per iteration (s): 0.08 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 4.530515E+00 | grad 
norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.977 | TFLOPs: 11.76 | 7: iteration 72750/ 173500 | consumed samples: 18624000 | consumed tokens: 38141952000 | elapsed time per iteration (s): 0.08 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 4.544182E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.282 | TFLOPs: 11.77 | 7: iteration 72760/ 173500 | consumed samples: 18626560 | consumed tokens: 38147194880 | elapsed time per iteration (s): 0.08 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 4.537116E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.236 | TFLOPs: 11.78 | 7: iteration 72770/ 173500 | consumed samples: 18629120 | consumed tokens: 38152437760 | elapsed time per iteration (s): 0.09 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 4.523752E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3005.080 | TFLOPs: 11.18 | 7: iteration 72780/ 173500 | consumed samples: 18631680 | consumed tokens: 38157680640 | elapsed time per iteration (s): 0.08 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 4.541846E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.472 | TFLOPs: 11.84 | 7: iteration 72790/ 173500 | consumed samples: 18634240 | consumed tokens: 38162923520 | elapsed time per iteration (s): 0.08 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 4.537740E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.838 | TFLOPs: 11.87 | 7: iteration 72800/ 173500 | consumed samples: 18636800 | consumed tokens: 38168166400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 4.531625E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.228 | TFLOPs: 11.87 | 7: iteration 72810/ 173500 | consumed samples: 18639360 | consumed tokens: 38173409280 | elapsed time per iteration (s): 0.08 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 4.543383E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.583 | TFLOPs: 11.87 | 7: iteration 72820/ 173500 | consumed samples: 18641920 | consumed tokens: 38178652160 | elapsed time per iteration (s): 0.08 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 4.531923E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.074 | TFLOPs: 11.88 | 7: iteration 72830/ 173500 | consumed samples: 18644480 | consumed tokens: 38183895040 | elapsed time per iteration (s): 0.08 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 4.514014E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.023 | TFLOPs: 11.88 | 7: iteration 72840/ 173500 | consumed samples: 18647040 | consumed tokens: 38189137920 | elapsed time per iteration (s): 0.08 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 4.537072E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.756 | TFLOPs: 11.85 | 7: iteration 72850/ 173500 | consumed samples: 18649600 | consumed tokens: 38194380800 | elapsed time per iteration (s): 0.08 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 4.530545E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.877 | TFLOPs: 
11.91 | 7: iteration 72860/ 173500 | consumed samples: 18652160 | consumed tokens: 38199623680 | elapsed time per iteration (s): 0.08 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 4.536898E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.880 | TFLOPs: 12.02 | 7: iteration 72870/ 173500 | consumed samples: 18654720 | consumed tokens: 38204866560 | elapsed time per iteration (s): 0.08 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 4.544524E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.479 | TFLOPs: 11.96 | 7: iteration 72880/ 173500 | consumed samples: 18657280 | consumed tokens: 38210109440 | elapsed time per iteration (s): 0.08 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 4.536245E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.705 | TFLOPs: 11.99 | 7: iteration 72890/ 173500 | consumed samples: 18659840 | consumed tokens: 38215352320 | elapsed time per iteration (s): 0.08 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 4.540305E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.064 | TFLOPs: 12.00 | 7: iteration 72900/ 173500 | consumed samples: 18662400 | consumed tokens: 38220595200 | elapsed time per iteration (s): 0.08 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 4.541887E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.722 | TFLOPs: 11.98 | 7: iteration 72910/ 173500 | consumed samples: 18664960 | consumed tokens: 38225838080 | elapsed time per iteration (s): 0.08 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 4.529449E+00 | grad norm: 0.318 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.174 | TFLOPs: 12.05 | 7: iteration 72920/ 173500 | consumed samples: 18667520 | consumed tokens: 38231080960 | elapsed time per iteration (s): 0.08 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 4.526789E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.092 | TFLOPs: 12.03 | 7: iteration 72930/ 173500 | consumed samples: 18670080 | consumed tokens: 38236323840 | elapsed time per iteration (s): 0.08 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 4.525862E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.855 | TFLOPs: 11.92 | 7: iteration 72940/ 173500 | consumed samples: 18672640 | consumed tokens: 38241566720 | elapsed time per iteration (s): 0.08 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 4.537057E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.051 | TFLOPs: 11.99 | 7: iteration 72950/ 173500 | consumed samples: 18675200 | consumed tokens: 38246809600 | elapsed time per iteration (s): 0.08 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 4.523789E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.120 | TFLOPs: 11.67 | 7: iteration 72960/ 173500 | consumed samples: 18677760 | consumed tokens: 38252052480 | elapsed time per iteration (s): 0.08 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 4.526983E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.657 | TFLOPs: 11.95 | 7: iteration 72970/ 173500 | consumed samples: 18680320 | consumed tokens: 38257295360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.338E-04 | global batch size: 256 | lm loss: 4.534024E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.701 | TFLOPs: 11.64 |
7: iteration 72980/ 173500 | consumed samples: 18682880 | consumed tokens: 38262538240 | elapsed time per iteration (s): 0.08 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 4.532304E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.673 | TFLOPs: 11.97 |
7: iteration 72990/ 173500 | consumed samples: 18685440 | consumed tokens: 38267781120 | elapsed time per iteration (s): 0.08 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 4.539160E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.564 | TFLOPs: 11.96 |
7: iteration 73000/ 173500 | consumed samples: 18688000 | consumed tokens: 38273024000 | elapsed time per iteration (s): 0.08 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 4.551343E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.067 | TFLOPs: 11.69 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 73000 | lm loss value: 4.415477E+00 | lm loss PPL: 8.272131E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 73000 to checkpoints_14m91b100m
0: [2023-03-17 02:01:36,291] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step73000 is begin to save!
0: [2023-03-17 02:01:36,294] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/layer_01-model_00-model_states.pt...
0: [2023-03-17 02:01:36,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/layer_01-model_00-model_states.pt.
0: [2023-03-17 02:01:36,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/layer_03-model_00-model_states.pt...
0: [2023-03-17 02:01:36,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/layer_03-model_00-model_states.pt.
0: [2023-03-17 02:01:36,323] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/layer_04-model_00-model_states.pt...
0: [2023-03-17 02:01:36,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/layer_04-model_00-model_states.pt.
0: [2023-03-17 02:01:36,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/layer_05-model_00-model_states.pt...
0: [2023-03-17 02:01:36,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/layer_05-model_00-model_states.pt.
0: [2023-03-17 02:01:36,329] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/layer_06-model_00-model_states.pt...
0: [2023-03-17 02:01:36,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/layer_06-model_00-model_states.pt.
0: [2023-03-17 02:01:36,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/layer_08-model_00-model_states.pt...
0: [2023-03-17 02:01:36,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/layer_08-model_00-model_states.pt.
0: [2023-03-17 02:01:36,333] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step73000/mp_rank_00_model_states.pt
0: [2023-03-17 02:01:36,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/mp_rank_00_model_states.pt...
0: [2023-03-17 02:01:36,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/mp_rank_00_model_states.pt.
0: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:01:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:01:36,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:01:36,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:01:36,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:01:36,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2023-03-17 02:01:36,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:01:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:01:36,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2023-03-17 02:01:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:01:36,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2023-03-17 02:01:36,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:01:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:01:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:01:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:01:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:01:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:01:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2023-03-17 02:01:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2023-03-17 02:01:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:01:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:01:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:01:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:01:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:01:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:01:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
3: [2023-03-17 02:01:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2023-03-17 02:01:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:01:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
2: [2023-03-17 02:01:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:01:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:01:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:01:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:01:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:01:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:01:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:01:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 02:01:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2023-03-17 02:01:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:01:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:01:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:01:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:01:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:01:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:01:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2023-03-17 02:01:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:01:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:01:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2023-03-17 02:01:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2023-03-17 02:01:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2023-03-17 02:01:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:01:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:01:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:01:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:01:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:01:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2023-03-17 02:01:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:01:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:01:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:01:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:01:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:01:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2023-03-17 02:01:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:01:36,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:01:36,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:01:36,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:01:36,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 02:01:36,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:01:36,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:01:36,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 02:01:36,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2023-03-17 02:01:36,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
3: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
1: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:01:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:01:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
5: [2023-03-17 02:01:36,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:01:36,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step73000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:01:36,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: successfully saved checkpoint at iteration 73000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.10 7: iteration 73010/ 173500 | consumed samples: 18690560 | consumed tokens: 38278266880 | elapsed time per iteration (s): 0.09 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 4.540433E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.980 | TFLOPs: 10.38 | 7: iteration 73020/ 173500 | consumed samples: 18693120 | consumed tokens: 38283509760 | elapsed time per iteration (s): 0.08 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 4.527060E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.562 | TFLOPs: 11.95 | 7: iteration 73030/ 173500 | consumed samples: 18695680 | consumed tokens: 38288752640 | elapsed time per iteration (s): 0.08 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 4.532771E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.411 | TFLOPs: 11.92 | 7: iteration 73040/ 173500 | consumed samples: 18698240 | consumed tokens: 38293995520 | elapsed time per iteration (s): 0.08 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 4.535785E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.868 | 
TFLOPs: 11.96 | 7: iteration 73050/ 173500 | consumed samples: 18700800 | consumed tokens: 38299238400 | elapsed time per iteration (s): 0.08 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 4.542504E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.201 | TFLOPs: 11.93 | 7: iteration 73060/ 173500 | consumed samples: 18703360 | consumed tokens: 38304481280 | elapsed time per iteration (s): 0.13 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 4.528984E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2042.609 | TFLOPs: 7.60 | 7: iteration 73070/ 173500 | consumed samples: 18705920 | consumed tokens: 38309724160 | elapsed time per iteration (s): 0.13 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 4.530371E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1980.441 | TFLOPs: 7.37 | 7: iteration 73080/ 173500 | consumed samples: 18708480 | consumed tokens: 38314967040 | elapsed time per iteration (s): 0.09 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 4.526245E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2788.148 | TFLOPs: 10.37 | 7: iteration 73090/ 173500 | consumed samples: 18711040 | consumed tokens: 38320209920 | elapsed time per iteration (s): 0.08 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 4.531115E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.152 | TFLOPs: 11.57 | 7: iteration 73100/ 173500 | consumed samples: 18713600 | consumed tokens: 38325452800 | elapsed time per iteration (s): 0.08 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 4.529622E+00 | grad norm: 0.342 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.795 | TFLOPs: 11.57 | 7: iteration 73110/ 173500 | consumed samples: 18716160 | consumed tokens: 38330695680 | elapsed time per iteration (s): 0.08 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 4.529228E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.540 | TFLOPs: 11.82 | 7: iteration 73120/ 173500 | consumed samples: 18718720 | consumed tokens: 38335938560 | elapsed time per iteration (s): 0.08 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 4.535415E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.489 | TFLOPs: 11.84 | 7: iteration 73130/ 173500 | consumed samples: 18721280 | consumed tokens: 38341181440 | elapsed time per iteration (s): 0.08 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 4.537754E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3020.592 | TFLOPs: 11.24 | 7: iteration 73140/ 173500 | consumed samples: 18723840 | consumed tokens: 38346424320 | elapsed time per iteration (s): 0.08 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 4.531727E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.491 | TFLOPs: 11.82 | 7: iteration 73150/ 173500 | consumed samples: 18726400 | consumed tokens: 38351667200 | elapsed time per iteration (s): 0.08 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 4.534782E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.341 | TFLOPs: 11.71 | 7: iteration 73160/ 173500 | consumed samples: 18728960 | consumed tokens: 38356910080 | elapsed time per iteration (s): 
0.08 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 4.536305E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.283 | TFLOPs: 11.79 | 7: iteration 73170/ 173500 | consumed samples: 18731520 | consumed tokens: 38362152960 | elapsed time per iteration (s): 0.08 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 4.534911E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.786 | TFLOPs: 11.77 | 7: iteration 73180/ 173500 | consumed samples: 18734080 | consumed tokens: 38367395840 | elapsed time per iteration (s): 0.08 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 4.528302E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.288 | TFLOPs: 11.77 | 7: iteration 73190/ 173500 | consumed samples: 18736640 | consumed tokens: 38372638720 | elapsed time per iteration (s): 0.08 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 4.547993E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.214 | TFLOPs: 11.50 | 7: iteration 73200/ 173500 | consumed samples: 18739200 | consumed tokens: 38377881600 | elapsed time per iteration (s): 0.08 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 4.528689E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.788 | TFLOPs: 11.81 | 7: iteration 73210/ 173500 | consumed samples: 18741760 | consumed tokens: 38383124480 | elapsed time per iteration (s): 0.08 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 4.540129E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.923 | TFLOPs: 11.78 | 7: iteration 
73220/ 173500 | consumed samples: 18744320 | consumed tokens: 38388367360 | elapsed time per iteration (s): 0.09 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 4.537629E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2824.460 | TFLOPs: 10.51 | 7: iteration 73230/ 173500 | consumed samples: 18746880 | consumed tokens: 38393610240 | elapsed time per iteration (s): 0.10 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 4.533436E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2664.111 | TFLOPs: 9.91 | 7: iteration 73240/ 173500 | consumed samples: 18749440 | consumed tokens: 38398853120 | elapsed time per iteration (s): 0.08 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 4.529191E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.846 | TFLOPs: 11.84 | 7: iteration 73250/ 173500 | consumed samples: 18752000 | consumed tokens: 38404096000 | elapsed time per iteration (s): 0.08 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 4.528728E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.468 | TFLOPs: 11.85 | 7: iteration 73260/ 173500 | consumed samples: 18754560 | consumed tokens: 38409338880 | elapsed time per iteration (s): 0.08 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 4.523906E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.825 | TFLOPs: 11.73 | 7: iteration 73270/ 173500 | consumed samples: 18757120 | consumed tokens: 38414581760 | elapsed time per iteration (s): 0.08 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 4.530252E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3180.952 | TFLOPs: 11.83 | 7: iteration 73280/ 173500 | consumed samples: 18759680 | consumed tokens: 38419824640 | elapsed time per iteration (s): 0.08 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 4.538249E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.397 | TFLOPs: 11.80 | 7: iteration 73290/ 173500 | consumed samples: 18762240 | consumed tokens: 38425067520 | elapsed time per iteration (s): 0.08 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 4.544857E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.362 | TFLOPs: 11.72 | 7: iteration 73300/ 173500 | consumed samples: 18764800 | consumed tokens: 38430310400 | elapsed time per iteration (s): 0.08 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 4.543739E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.122 | TFLOPs: 11.84 | 7: iteration 73310/ 173500 | consumed samples: 18767360 | consumed tokens: 38435553280 | elapsed time per iteration (s): 0.08 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 4.546353E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.717 | TFLOPs: 11.81 | 7: iteration 73320/ 173500 | consumed samples: 18769920 | consumed tokens: 38440796160 | elapsed time per iteration (s): 0.08 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 4.532008E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.771 | TFLOPs: 11.78 | 7: iteration 73330/ 173500 | consumed samples: 18772480 | consumed tokens: 38446039040 | elapsed time per iteration (s): 0.08 | learning rate: 
1.333E-04 | global batch size: 256 | lm loss: 4.518432E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.938 | TFLOPs: 11.84 | 7: iteration 73340/ 173500 | consumed samples: 18775040 | consumed tokens: 38451281920 | elapsed time per iteration (s): 0.08 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 4.531866E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.428 | TFLOPs: 11.88 | 7: iteration 73350/ 173500 | consumed samples: 18777600 | consumed tokens: 38456524800 | elapsed time per iteration (s): 0.08 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 4.542626E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.526 | TFLOPs: 11.84 | 7: iteration 73360/ 173500 | consumed samples: 18780160 | consumed tokens: 38461767680 | elapsed time per iteration (s): 0.08 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 4.527785E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.265 | TFLOPs: 11.81 | 7: iteration 73370/ 173500 | consumed samples: 18782720 | consumed tokens: 38467010560 | elapsed time per iteration (s): 0.08 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 4.544363E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.787 | TFLOPs: 11.78 | 7: iteration 73380/ 173500 | consumed samples: 18785280 | consumed tokens: 38472253440 | elapsed time per iteration (s): 0.08 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 4.523091E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.730 | TFLOPs: 11.54 | 7: iteration 73390/ 173500 | 
consumed samples: 18787840 | consumed tokens: 38477496320 | elapsed time per iteration (s): 0.08 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 4.540097E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.044 | TFLOPs: 11.79 | 7: iteration 73400/ 173500 | consumed samples: 18790400 | consumed tokens: 38482739200 | elapsed time per iteration (s): 0.08 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 4.534296E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.072 | TFLOPs: 11.83 | 7: iteration 73410/ 173500 | consumed samples: 18792960 | consumed tokens: 38487982080 | elapsed time per iteration (s): 0.08 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 4.542643E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.613 | TFLOPs: 11.80 | 7: iteration 73420/ 173500 | consumed samples: 18795520 | consumed tokens: 38493224960 | elapsed time per iteration (s): 0.08 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 4.537505E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.245 | TFLOPs: 11.52 | 7: iteration 73430/ 173500 | consumed samples: 18798080 | consumed tokens: 38498467840 | elapsed time per iteration (s): 0.08 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 4.531114E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.871 | TFLOPs: 11.87 | 7: iteration 73440/ 173500 | consumed samples: 18800640 | consumed tokens: 38503710720 | elapsed time per iteration (s): 0.08 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 4.541457E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3187.929 | TFLOPs: 11.86 | 7: iteration 73450/ 173500 | consumed samples: 18803200 | consumed tokens: 38508953600 | elapsed time per iteration (s): 0.08 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 4.529103E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.516 | TFLOPs: 11.85 | 7: iteration 73460/ 173500 | consumed samples: 18805760 | consumed tokens: 38514196480 | elapsed time per iteration (s): 0.09 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 4.532997E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2893.560 | TFLOPs: 10.76 | 7: iteration 73470/ 173500 | consumed samples: 18808320 | consumed tokens: 38519439360 | elapsed time per iteration (s): 0.08 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 4.521085E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.246 | TFLOPs: 11.88 | 7: iteration 73480/ 173500 | consumed samples: 18810880 | consumed tokens: 38524682240 | elapsed time per iteration (s): 0.09 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 4.534149E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.240 | TFLOPs: 11.02 | 7: iteration 73490/ 173500 | consumed samples: 18813440 | consumed tokens: 38529925120 | elapsed time per iteration (s): 0.09 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 4.541259E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2885.903 | TFLOPs: 10.73 | 7: iteration 73500/ 173500 | consumed samples: 18816000 | consumed tokens: 38535168000 | elapsed time per iteration (s): 0.11 | learning rate: 1.330E-04 | global batch 
size: 256 | lm loss: 4.528479E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.207 | TFLOPs: 8.59 | 7: iteration 73510/ 173500 | consumed samples: 18818560 | consumed tokens: 38540410880 | elapsed time per iteration (s): 0.08 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 4.546364E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.151 | TFLOPs: 11.88 | 7: iteration 73520/ 173500 | consumed samples: 18821120 | consumed tokens: 38545653760 | elapsed time per iteration (s): 0.08 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 4.530825E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.979 | TFLOPs: 11.84 | 7: iteration 73530/ 173500 | consumed samples: 18823680 | consumed tokens: 38550896640 | elapsed time per iteration (s): 0.08 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 4.529757E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.287 | TFLOPs: 11.88 | 7: iteration 73540/ 173500 | consumed samples: 18826240 | consumed tokens: 38556139520 | elapsed time per iteration (s): 0.08 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 4.539708E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.600 | TFLOPs: 11.83 | 7: iteration 73550/ 173500 | consumed samples: 18828800 | consumed tokens: 38561382400 | elapsed time per iteration (s): 0.10 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 4.533335E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2538.334 | TFLOPs: 9.44 | 7: iteration 73560/ 173500 | consumed samples: 18831360 | 
consumed tokens: 38566625280 | elapsed time per iteration (s): 0.08 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 4.529918E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.992 | TFLOPs: 11.98 | 7: iteration 73570/ 173500 | consumed samples: 18833920 | consumed tokens: 38571868160 | elapsed time per iteration (s): 0.08 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 4.537424E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.870 | TFLOPs: 12.03 | 7: iteration 73580/ 173500 | consumed samples: 18836480 | consumed tokens: 38577111040 | elapsed time per iteration (s): 0.08 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 4.537060E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.069 | TFLOPs: 11.94 | 7: iteration 73590/ 173500 | consumed samples: 18839040 | consumed tokens: 38582353920 | elapsed time per iteration (s): 0.08 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 4.531763E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.924 | TFLOPs: 11.85 | 7: iteration 73600/ 173500 | consumed samples: 18841600 | consumed tokens: 38587596800 | elapsed time per iteration (s): 0.08 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 4.527314E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.905 | TFLOPs: 11.84 | 7: iteration 73610/ 173500 | consumed samples: 18844160 | consumed tokens: 38592839680 | elapsed time per iteration (s): 0.09 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 4.546153E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 2929.511 | TFLOPs: 10.90 | 7: iteration 73620/ 173500 | consumed samples: 18846720 | consumed tokens: 38598082560 | elapsed time per iteration (s): 0.09 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 4.526517E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.543 | TFLOPs: 11.06 | 7: iteration 73630/ 173500 | consumed samples: 18849280 | consumed tokens: 38603325440 | elapsed time per iteration (s): 0.08 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 4.537858E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.446 | TFLOPs: 11.82 | 7: iteration 73640/ 173500 | consumed samples: 18851840 | consumed tokens: 38608568320 | elapsed time per iteration (s): 0.08 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 4.515713E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.011 | TFLOPs: 11.80 | 7: iteration 73650/ 173500 | consumed samples: 18854400 | consumed tokens: 38613811200 | elapsed time per iteration (s): 0.08 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 4.525791E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.999 | TFLOPs: 11.82 | 7: iteration 73660/ 173500 | consumed samples: 18856960 | consumed tokens: 38619054080 | elapsed time per iteration (s): 0.08 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 4.537251E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.705 | TFLOPs: 11.85 | 7: iteration 73670/ 173500 | consumed samples: 18859520 | consumed tokens: 38624296960 | elapsed time per iteration (s): 0.08 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 
4.532815E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.836 | TFLOPs: 11.80 | 7: iteration 73680/ 173500 | consumed samples: 18862080 | consumed tokens: 38629539840 | elapsed time per iteration (s): 0.08 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 4.542739E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.890 | TFLOPs: 11.79 | 7: iteration 73690/ 173500 | consumed samples: 18864640 | consumed tokens: 38634782720 | elapsed time per iteration (s): 0.08 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 4.543451E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.105 | TFLOPs: 11.81 | 7: iteration 73700/ 173500 | consumed samples: 18867200 | consumed tokens: 38640025600 | elapsed time per iteration (s): 0.08 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 4.537806E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.682 | TFLOPs: 11.86 | 7: iteration 73710/ 173500 | consumed samples: 18869760 | consumed tokens: 38645268480 | elapsed time per iteration (s): 0.08 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 4.523993E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.304 | TFLOPs: 11.86 | 7: iteration 73720/ 173500 | consumed samples: 18872320 | consumed tokens: 38650511360 | elapsed time per iteration (s): 0.08 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 4.541116E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.066 | TFLOPs: 11.88 | 7: iteration 73730/ 173500 | consumed samples: 18874880 | consumed tokens: 
38655754240 | elapsed time per iteration (s): 0.09 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 4.547016E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2953.315 | TFLOPs: 10.99 | 7: iteration 73740/ 173500 | consumed samples: 18877440 | consumed tokens: 38660997120 | elapsed time per iteration (s): 0.08 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 4.527329E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.175 | TFLOPs: 11.56 | 7: iteration 73750/ 173500 | consumed samples: 18880000 | consumed tokens: 38666240000 | elapsed time per iteration (s): 0.08 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 4.546748E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.070 | TFLOPs: 11.81 | 7: iteration 73760/ 173500 | consumed samples: 18882560 | consumed tokens: 38671482880 | elapsed time per iteration (s): 0.08 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 4.524836E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.644 | TFLOPs: 11.76 | 7: iteration 73770/ 173500 | consumed samples: 18885120 | consumed tokens: 38676725760 | elapsed time per iteration (s): 0.10 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 4.531314E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2447.300 | TFLOPs: 9.10 | 7: iteration 73780/ 173500 | consumed samples: 18887680 | consumed tokens: 38681968640 | elapsed time per iteration (s): 0.08 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 4.528328E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3186.062 | TFLOPs: 11.85 | 7: iteration 73790/ 173500 | consumed samples: 18890240 | consumed tokens: 38687211520 | elapsed time per iteration (s): 0.08 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 4.521788E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.677 | TFLOPs: 11.82 | 7: iteration 73800/ 173500 | consumed samples: 18892800 | consumed tokens: 38692454400 | elapsed time per iteration (s): 0.08 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 4.535483E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.088 | TFLOPs: 11.77 | 7: iteration 73810/ 173500 | consumed samples: 18895360 | consumed tokens: 38697697280 | elapsed time per iteration (s): 0.08 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 4.535934E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.315 | TFLOPs: 11.74 | 7: iteration 73820/ 173500 | consumed samples: 18897920 | consumed tokens: 38702940160 | elapsed time per iteration (s): 0.08 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 4.542852E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.196 | TFLOPs: 11.80 | 7: iteration 73830/ 173500 | consumed samples: 18900480 | consumed tokens: 38708183040 | elapsed time per iteration (s): 0.08 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 4.543934E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.373 | TFLOPs: 11.77 | 7: iteration 73840/ 173500 | consumed samples: 18903040 | consumed tokens: 38713425920 | elapsed time per iteration (s): 0.08 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 4.528362E+00 | grad 
norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.377 | TFLOPs: 11.83 | 7: iteration 73850/ 173500 | consumed samples: 18905600 | consumed tokens: 38718668800 | elapsed time per iteration (s): 0.08 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 4.540472E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.221 | TFLOPs: 11.86 | 7: iteration 73860/ 173500 | consumed samples: 18908160 | consumed tokens: 38723911680 | elapsed time per iteration (s): 0.08 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 4.546027E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.023 | TFLOPs: 11.79 | 7: iteration 73870/ 173500 | consumed samples: 18910720 | consumed tokens: 38729154560 | elapsed time per iteration (s): 0.09 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 4.522784E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2967.525 | TFLOPs: 11.04 | 7: iteration 73880/ 173500 | consumed samples: 18913280 | consumed tokens: 38734397440 | elapsed time per iteration (s): 0.08 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 4.533895E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.417 | TFLOPs: 11.81 | 7: iteration 73890/ 173500 | consumed samples: 18915840 | consumed tokens: 38739640320 | elapsed time per iteration (s): 0.08 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 4.542685E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.560 | TFLOPs: 11.56 | 7: iteration 73900/ 173500 | consumed samples: 18918400 | consumed tokens: 38744883200 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 4.529067E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.036 | TFLOPs: 11.67 | 7: iteration 73910/ 173500 | consumed samples: 18920960 | consumed tokens: 38750126080 | elapsed time per iteration (s): 0.08 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 4.532915E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.117 | TFLOPs: 11.43 | 7: iteration 73920/ 173500 | consumed samples: 18923520 | consumed tokens: 38755368960 | elapsed time per iteration (s): 0.08 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 4.528089E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.978 | TFLOPs: 11.56 | 7: iteration 73930/ 173500 | consumed samples: 18926080 | consumed tokens: 38760611840 | elapsed time per iteration (s): 0.09 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 4.532188E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2758.680 | TFLOPs: 10.26 | 7: iteration 73940/ 173500 | consumed samples: 18928640 | consumed tokens: 38765854720 | elapsed time per iteration (s): 0.11 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 4.528867E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2319.213 | TFLOPs: 8.63 | 7: iteration 73950/ 173500 | consumed samples: 18931200 | consumed tokens: 38771097600 | elapsed time per iteration (s): 0.10 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 4.542195E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2445.419 | TFLOPs: 9.10 
| 7: iteration 73960/ 173500 | consumed samples: 18933760 | consumed tokens: 38776340480 | elapsed time per iteration (s): 0.08 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 4.542570E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.118 | TFLOPs: 11.75 | 7: iteration 73970/ 173500 | consumed samples: 18936320 | consumed tokens: 38781583360 | elapsed time per iteration (s): 0.08 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 4.530437E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.270 | TFLOPs: 11.78 | 7: iteration 73980/ 173500 | consumed samples: 18938880 | consumed tokens: 38786826240 | elapsed time per iteration (s): 0.08 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 4.543579E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.239 | TFLOPs: 11.80 | 7: iteration 73990/ 173500 | consumed samples: 18941440 | consumed tokens: 38792069120 | elapsed time per iteration (s): 0.09 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 4.537447E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2961.828 | TFLOPs: 11.02 | 0: [2023-03-17 02:03:00,523] [INFO] [logging.py:68:log_dist] [Rank 0] step=74000, skipped=0, lr=[0.0001321851851828754, 0.0001321851851828754, 0.0001321851851828754], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 74000/ 173500 | consumed samples: 18944000 | consumed tokens: 38797312000 | elapsed time per iteration (s): 0.10 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 4.532070E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2572.063 | TFLOPs: 9.57 | 0: steps: 74000 loss: 4.5390 
iter time (s): 0.082 samples/sec: 3115.409 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 74000 | lm loss value: 4.418300E+00 | lm loss PPL: 8.295511E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 74000 to checkpoints_14m91b100m 0: [2023-03-17 02:03:00,594] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step74000 is begin to save! 0: [2023-03-17 02:03:00,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:03:00,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:03:00,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:03:00,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:03:00,627] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:03:00,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:03:00,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:03:00,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 02:03:00,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:03:00,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:03:00,637] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:03:00,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:03:00,638] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step74000/mp_rank_00_model_states.pt 0: [2023-03-17 02:03:00,638] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:03:00,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:03:00,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:03:00,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-03-17 02:03:00,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
5: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:03:00,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2023-03-17 02:03:00,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:03:00,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:03:00,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:03:00,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
6: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:03:00,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2023-03-17 02:03:00,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
4: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:03:00,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2023-03-17 02:03:00,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:03:00,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2023-03-17 02:03:00,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:03:00,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
0: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:03:00,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2023-03-17 02:03:00,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2023-03-17 02:03:00,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2023-03-17 02:03:00,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:03:00,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2023-03-17 02:03:00,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2023-03-17 02:03:00,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2023-03-17 02:03:00,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:03:00,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:03:00,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:03:00,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2023-03-17 02:03:00,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:03:00,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:03:00,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2023-03-17 02:03:00,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:03:00,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:03:00,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2023-03-17 02:03:00,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:03:00,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2023-03-17 02:03:00,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:03:00,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:03:00,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:03:00,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:03:00,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:03:00,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:03:00,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
3: [2023-03-17 02:03:00,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2023-03-17 02:03:00,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:03:00,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2023-03-17 02:03:00,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:03:00,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:03:00,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:03:00,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2023-03-17 02:03:00,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:03:00,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:03:00,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:03:00,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2023-03-17 02:03:00,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
0: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:03:00,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:03:00,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2023-03-17 02:03:00,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
3: [2023-03-17 02:03:00,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 6: [2023-03-17 02:03:00,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2023-03-17 02:03:00,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:03:00,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2023-03-17 02:03:00,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:03:00,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step74000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:03:00,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
0: successfully saved checkpoint at iteration 74000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.01 7: iteration 74010/ 173500 | consumed samples: 18946560 | consumed tokens: 38802554880 | elapsed time per iteration (s): 0.09 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 4.521526E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.950 | TFLOPs: 10.17 | 7: iteration 74020/ 173500 | consumed samples: 18949120 | consumed tokens: 38807797760 | elapsed time per iteration (s): 0.08 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 4.533868E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.311 | TFLOPs: 11.86 | 7: iteration 74030/ 173500 | consumed samples: 18951680 | consumed tokens: 38813040640 | elapsed time per iteration (s): 0.08 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 4.528470E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.009 | TFLOPs: 11.60 | 7: iteration 74040/ 173500 | consumed samples: 18954240 | consumed tokens: 38818283520 | elapsed time per iteration (s): 0.08 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 4.555252E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.807 | TFLOPs: 11.83 | 7: iteration 74050/ 173500 | consumed samples: 18956800 | consumed tokens: 38823526400 | elapsed time per iteration (s): 0.08 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 4.539990E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.246 | TFLOPs: 11.62 | 7: iteration 74060/ 173500 | consumed samples: 18959360 | consumed tokens: 38828769280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.321E-04 | global batch size: 256 | lm loss: 4.545257E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.309 | TFLOPs: 11.81 | 7: iteration 74070/ 173500 | consumed samples: 18961920 | consumed tokens: 38834012160 | elapsed time per iteration (s): 0.08 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 4.535868E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.690 | TFLOPs: 11.86 | 7: iteration 74080/ 173500 | consumed samples: 18964480 | consumed tokens: 38839255040 | elapsed time per iteration (s): 0.09 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 4.531719E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.109 | TFLOPs: 11.17 | 7: iteration 74090/ 173500 | consumed samples: 18967040 | consumed tokens: 38844497920 | elapsed time per iteration (s): 0.09 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 4.533617E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.624 | TFLOPs: 11.16 | 7: iteration 74100/ 173500 | consumed samples: 18969600 | consumed tokens: 38849740800 | elapsed time per iteration (s): 0.08 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 4.538632E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.721 | TFLOPs: 11.72 | 7: iteration 74110/ 173500 | consumed samples: 18972160 | consumed tokens: 38854983680 | elapsed time per iteration (s): 0.08 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 4.540605E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.684 | TFLOPs: 11.36 | 7: iteration 74120/ 
173500 | consumed samples: 18974720 | consumed tokens: 38860226560 | elapsed time per iteration (s): 0.08 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 4.542669E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.610 | TFLOPs: 11.89 | 7: iteration 74130/ 173500 | consumed samples: 18977280 | consumed tokens: 38865469440 | elapsed time per iteration (s): 0.08 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 4.532330E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.312 | TFLOPs: 11.91 | 7: iteration 74140/ 173500 | consumed samples: 18979840 | consumed tokens: 38870712320 | elapsed time per iteration (s): 0.08 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 4.537365E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.147 | TFLOPs: 11.90 | 7: iteration 74150/ 173500 | consumed samples: 18982400 | consumed tokens: 38875955200 | elapsed time per iteration (s): 0.08 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 4.519791E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.122 | TFLOPs: 11.87 | 7: iteration 74160/ 173500 | consumed samples: 18984960 | consumed tokens: 38881198080 | elapsed time per iteration (s): 0.08 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 4.556502E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.934 | TFLOPs: 11.92 | 7: iteration 74170/ 173500 | consumed samples: 18987520 | consumed tokens: 38886440960 | elapsed time per iteration (s): 0.08 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 4.544089E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3121.857 | TFLOPs: 11.61 | 7: iteration 74180/ 173500 | consumed samples: 18990080 | consumed tokens: 38891683840 | elapsed time per iteration (s): 0.08 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 4.524853E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.008 | TFLOPs: 11.78 | 7: iteration 74190/ 173500 | consumed samples: 18992640 | consumed tokens: 38896926720 | elapsed time per iteration (s): 0.08 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 4.523871E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.954 | TFLOPs: 11.90 | 7: iteration 74200/ 173500 | consumed samples: 18995200 | consumed tokens: 38902169600 | elapsed time per iteration (s): 0.08 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 4.512846E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.389 | TFLOPs: 11.90 | 7: iteration 74210/ 173500 | consumed samples: 18997760 | consumed tokens: 38907412480 | elapsed time per iteration (s): 0.08 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 4.527611E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.006 | TFLOPs: 11.87 | 7: iteration 74220/ 173500 | consumed samples: 19000320 | consumed tokens: 38912655360 | elapsed time per iteration (s): 0.08 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 4.536116E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.080 | TFLOPs: 11.88 | 7: iteration 74230/ 173500 | consumed samples: 19002880 | consumed tokens: 38917898240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.318E-04 | global batch size: 256 | lm loss: 4.535862E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.519 | TFLOPs: 11.92 | 7: iteration 74240/ 173500 | consumed samples: 19005440 | consumed tokens: 38923141120 | elapsed time per iteration (s): 0.08 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 4.535191E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.662 | TFLOPs: 11.91 | 7: iteration 74250/ 173500 | consumed samples: 19008000 | consumed tokens: 38928384000 | elapsed time per iteration (s): 0.08 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 4.538219E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.241 | TFLOPs: 11.88 | 7: iteration 74260/ 173500 | consumed samples: 19010560 | consumed tokens: 38933626880 | elapsed time per iteration (s): 0.08 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 4.534112E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.858 | TFLOPs: 11.65 | 7: iteration 74270/ 173500 | consumed samples: 19013120 | consumed tokens: 38938869760 | elapsed time per iteration (s): 0.08 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 4.536873E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.608 | TFLOPs: 11.87 | 7: iteration 74280/ 173500 | consumed samples: 19015680 | consumed tokens: 38944112640 | elapsed time per iteration (s): 0.08 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 4.530147E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.553 | TFLOPs: 11.91 | 7: iteration 74290/ 173500 | 
consumed samples: 19018240 | consumed tokens: 38949355520 | elapsed time per iteration (s): 0.08 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 4.523921E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.740 | TFLOPs: 11.86 | 7: iteration 74300/ 173500 | consumed samples: 19020800 | consumed tokens: 38954598400 | elapsed time per iteration (s): 0.08 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 4.533306E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.373 | TFLOPs: 11.90 | 7: iteration 74310/ 173500 | consumed samples: 19023360 | consumed tokens: 38959841280 | elapsed time per iteration (s): 0.08 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 4.535217E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.930 | TFLOPs: 11.84 | 7: iteration 74320/ 173500 | consumed samples: 19025920 | consumed tokens: 38965084160 | elapsed time per iteration (s): 0.08 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 4.521584E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.534 | TFLOPs: 11.87 | 7: iteration 74330/ 173500 | consumed samples: 19028480 | consumed tokens: 38970327040 | elapsed time per iteration (s): 0.08 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 4.543494E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.219 | TFLOPs: 11.86 | 7: iteration 74340/ 173500 | consumed samples: 19031040 | consumed tokens: 38975569920 | elapsed time per iteration (s): 0.08 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 4.536665E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3191.712 | TFLOPs: 11.87 | 7: iteration 74350/ 173500 | consumed samples: 19033600 | consumed tokens: 38980812800 | elapsed time per iteration (s): 0.08 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 4.532883E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.958 | TFLOPs: 11.85 | 7: iteration 74360/ 173500 | consumed samples: 19036160 | consumed tokens: 38986055680 | elapsed time per iteration (s): 0.08 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 4.529774E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.188 | TFLOPs: 11.91 | 7: iteration 74370/ 173500 | consumed samples: 19038720 | consumed tokens: 38991298560 | elapsed time per iteration (s): 0.08 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 4.533296E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.753 | TFLOPs: 11.90 | 7: iteration 74380/ 173500 | consumed samples: 19041280 | consumed tokens: 38996541440 | elapsed time per iteration (s): 0.08 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 4.536272E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.334 | TFLOPs: 11.84 | 7: iteration 74390/ 173500 | consumed samples: 19043840 | consumed tokens: 39001784320 | elapsed time per iteration (s): 0.08 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 4.527713E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.078 | TFLOPs: 11.82 | 7: iteration 74400/ 173500 | consumed samples: 19046400 | consumed tokens: 39007027200 | elapsed time per iteration (s): 0.08 | learning rate: 1.315E-04 | global batch 
size: 256 | lm loss: 4.534238E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.234 | TFLOPs: 11.61 | 7: iteration 74410/ 173500 | consumed samples: 19048960 | consumed tokens: 39012270080 | elapsed time per iteration (s): 0.08 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 4.535141E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.735 | TFLOPs: 11.90 | 7: iteration 74420/ 173500 | consumed samples: 19051520 | consumed tokens: 39017512960 | elapsed time per iteration (s): 0.08 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 4.531464E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.663 | TFLOPs: 11.87 | 7: iteration 74430/ 173500 | consumed samples: 19054080 | consumed tokens: 39022755840 | elapsed time per iteration (s): 0.08 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 4.531962E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.572 | TFLOPs: 11.89 | 7: iteration 74440/ 173500 | consumed samples: 19056640 | consumed tokens: 39027998720 | elapsed time per iteration (s): 0.08 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 4.537076E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.288 | TFLOPs: 11.90 | 7: iteration 74450/ 173500 | consumed samples: 19059200 | consumed tokens: 39033241600 | elapsed time per iteration (s): 0.08 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 4.532884E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.067 | TFLOPs: 11.87 | 7: iteration 74460/ 173500 | consumed samples: 19061760 | 
consumed tokens: 39038484480 | elapsed time per iteration (s): 0.08 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 4.527308E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.991 | TFLOPs: 11.60 | 7: iteration 74470/ 173500 | consumed samples: 19064320 | consumed tokens: 39043727360 | elapsed time per iteration (s): 0.08 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 4.543670E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.506 | TFLOPs: 11.63 | 7: iteration 74480/ 173500 | consumed samples: 19066880 | consumed tokens: 39048970240 | elapsed time per iteration (s): 0.08 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 4.523608E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.390 | TFLOPs: 11.90 | 7: iteration 74490/ 173500 | consumed samples: 19069440 | consumed tokens: 39054213120 | elapsed time per iteration (s): 0.08 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 4.544799E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.630 | TFLOPs: 11.90 | 7: iteration 74500/ 173500 | consumed samples: 19072000 | consumed tokens: 39059456000 | elapsed time per iteration (s): 0.08 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 4.532076E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.001 | TFLOPs: 11.87 | 7: iteration 74510/ 173500 | consumed samples: 19074560 | consumed tokens: 39064698880 | elapsed time per iteration (s): 0.08 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 4.534094E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3197.166 | TFLOPs: 11.89 | 7: iteration 74520/ 173500 | consumed samples: 19077120 | consumed tokens: 39069941760 | elapsed time per iteration (s): 0.09 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 4.522995E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.483 | TFLOPs: 10.84 | 7: iteration 74530/ 173500 | consumed samples: 19079680 | consumed tokens: 39075184640 | elapsed time per iteration (s): 0.08 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 4.532935E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.636 | TFLOPs: 11.51 | 7: iteration 74540/ 173500 | consumed samples: 19082240 | consumed tokens: 39080427520 | elapsed time per iteration (s): 0.08 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 4.542268E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.471 | TFLOPs: 11.87 | 7: iteration 74550/ 173500 | consumed samples: 19084800 | consumed tokens: 39085670400 | elapsed time per iteration (s): 0.08 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 4.535923E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.828 | TFLOPs: 11.76 | 7: iteration 74560/ 173500 | consumed samples: 19087360 | consumed tokens: 39090913280 | elapsed time per iteration (s): 0.08 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 4.544439E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.104 | TFLOPs: 11.84 | 7: iteration 74570/ 173500 | consumed samples: 19089920 | consumed tokens: 39096156160 | elapsed time per iteration (s): 0.08 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 
4.516099E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.416 | TFLOPs: 11.87 | 7: iteration 74580/ 173500 | consumed samples: 19092480 | consumed tokens: 39101399040 | elapsed time per iteration (s): 0.08 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 4.529753E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.502 | TFLOPs: 11.89 | 7: iteration 74590/ 173500 | consumed samples: 19095040 | consumed tokens: 39106641920 | elapsed time per iteration (s): 0.08 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 4.531907E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.046 | TFLOPs: 11.85 | 7: iteration 74600/ 173500 | consumed samples: 19097600 | consumed tokens: 39111884800 | elapsed time per iteration (s): 0.08 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 4.532826E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.435 | TFLOPs: 11.88 | 7: iteration 74610/ 173500 | consumed samples: 19100160 | consumed tokens: 39117127680 | elapsed time per iteration (s): 0.08 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 4.530604E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.537 | TFLOPs: 11.80 | 7: iteration 74620/ 173500 | consumed samples: 19102720 | consumed tokens: 39122370560 | elapsed time per iteration (s): 0.08 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 4.533808E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.491 | TFLOPs: 11.66 | 7: iteration 74630/ 173500 | consumed samples: 19105280 | consumed tokens: 
39127613440 | elapsed time per iteration (s): 0.08 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 4.536546E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.915 | TFLOPs: 11.97 | 7: iteration 74640/ 173500 | consumed samples: 19107840 | consumed tokens: 39132856320 | elapsed time per iteration (s): 0.08 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 4.537902E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.805 | TFLOPs: 11.92 | 7: iteration 74650/ 173500 | consumed samples: 19110400 | consumed tokens: 39138099200 | elapsed time per iteration (s): 0.08 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 4.544220E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.232 | TFLOPs: 11.98 | 7: iteration 74660/ 173500 | consumed samples: 19112960 | consumed tokens: 39143342080 | elapsed time per iteration (s): 0.08 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 4.536692E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.367 | TFLOPs: 11.93 | 7: iteration 74670/ 173500 | consumed samples: 19115520 | consumed tokens: 39148584960 | elapsed time per iteration (s): 0.08 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 4.519648E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.958 | TFLOPs: 11.93 | 7: iteration 74680/ 173500 | consumed samples: 19118080 | consumed tokens: 39153827840 | elapsed time per iteration (s): 0.08 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 4.548232E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3216.348 | TFLOPs: 11.96 | 7: iteration 74690/ 173500 | consumed samples: 19120640 | consumed tokens: 39159070720 | elapsed time per iteration (s): 0.08 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 4.540475E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.385 | TFLOPs: 11.95 | 7: iteration 74700/ 173500 | consumed samples: 19123200 | consumed tokens: 39164313600 | elapsed time per iteration (s): 0.08 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 4.544574E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.650 | TFLOPs: 11.96 | 7: iteration 74710/ 173500 | consumed samples: 19125760 | consumed tokens: 39169556480 | elapsed time per iteration (s): 0.08 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 4.545784E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.883 | TFLOPs: 11.97 | 7: iteration 74720/ 173500 | consumed samples: 19128320 | consumed tokens: 39174799360 | elapsed time per iteration (s): 0.08 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 4.531739E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.185 | TFLOPs: 11.96 | 7: iteration 74730/ 173500 | consumed samples: 19130880 | consumed tokens: 39180042240 | elapsed time per iteration (s): 0.08 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 4.533713E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.376 | TFLOPs: 11.71 | 7: iteration 74740/ 173500 | consumed samples: 19133440 | consumed tokens: 39185285120 | elapsed time per iteration (s): 0.08 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 4.544510E+00 | grad 
norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.639 | TFLOPs: 11.70 | 7: iteration 74750/ 173500 | consumed samples: 19136000 | consumed tokens: 39190528000 | elapsed time per iteration (s): 0.08 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 4.529321E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.943 | TFLOPs: 11.99 | 7: iteration 74760/ 173500 | consumed samples: 19138560 | consumed tokens: 39195770880 | elapsed time per iteration (s): 0.08 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 4.527238E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.487 | TFLOPs: 11.87 | 7: iteration 74770/ 173500 | consumed samples: 19141120 | consumed tokens: 39201013760 | elapsed time per iteration (s): 0.08 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 4.532927E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.997 | TFLOPs: 11.92 | 7: iteration 74780/ 173500 | consumed samples: 19143680 | consumed tokens: 39206256640 | elapsed time per iteration (s): 0.08 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 4.525556E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.125 | TFLOPs: 11.90 | 7: iteration 74790/ 173500 | consumed samples: 19146240 | consumed tokens: 39211499520 | elapsed time per iteration (s): 0.08 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 4.536513E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.044 | TFLOPs: 11.89 | 7: iteration 74800/ 173500 | consumed samples: 19148800 | consumed tokens: 39216742400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 4.531398E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.722 | TFLOPs: 11.92 | 7: iteration 74810/ 173500 | consumed samples: 19151360 | consumed tokens: 39221985280 | elapsed time per iteration (s): 0.08 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 4.543426E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.206 | TFLOPs: 11.94 | 7: iteration 74820/ 173500 | consumed samples: 19153920 | consumed tokens: 39227228160 | elapsed time per iteration (s): 0.08 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 4.517351E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.779 | TFLOPs: 11.94 | 7: iteration 74830/ 173500 | consumed samples: 19156480 | consumed tokens: 39232471040 | elapsed time per iteration (s): 0.08 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 4.548394E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.762 | TFLOPs: 11.93 | 7: iteration 74840/ 173500 | consumed samples: 19159040 | consumed tokens: 39237713920 | elapsed time per iteration (s): 0.08 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 4.541926E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.282 | TFLOPs: 11.94 | 7: iteration 74850/ 173500 | consumed samples: 19161600 | consumed tokens: 39242956800 | elapsed time per iteration (s): 0.08 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 4.547210E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.822 | TFLOPs: 
11.83 | 7: iteration 74860/ 173500 | consumed samples: 19164160 | consumed tokens: 39248199680 | elapsed time per iteration (s): 0.08 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 4.542211E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.595 | TFLOPs: 11.96 | 7: iteration 74870/ 173500 | consumed samples: 19166720 | consumed tokens: 39253442560 | elapsed time per iteration (s): 0.08 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 4.518544E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.946 | TFLOPs: 11.86 | 7: iteration 74880/ 173500 | consumed samples: 19169280 | consumed tokens: 39258685440 | elapsed time per iteration (s): 0.08 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 4.539940E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.768 | TFLOPs: 11.71 | 7: iteration 74890/ 173500 | consumed samples: 19171840 | consumed tokens: 39263928320 | elapsed time per iteration (s): 0.08 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 4.537412E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.195 | TFLOPs: 12.00 | 7: iteration 74900/ 173500 | consumed samples: 19174400 | consumed tokens: 39269171200 | elapsed time per iteration (s): 0.08 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 4.528061E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.022 | TFLOPs: 12.00 | 7: iteration 74910/ 173500 | consumed samples: 19176960 | consumed tokens: 39274414080 | elapsed time per iteration (s): 0.08 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 4.516500E+00 | grad norm: 0.312 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.461 | TFLOPs: 11.87 | 7: iteration 74920/ 173500 | consumed samples: 19179520 | consumed tokens: 39279656960 | elapsed time per iteration (s): 0.08 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 4.542713E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.990 | TFLOPs: 11.78 | 7: iteration 74930/ 173500 | consumed samples: 19182080 | consumed tokens: 39284899840 | elapsed time per iteration (s): 0.08 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 4.542542E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.433 | TFLOPs: 12.00 | 7: iteration 74940/ 173500 | consumed samples: 19184640 | consumed tokens: 39290142720 | elapsed time per iteration (s): 0.08 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 4.531364E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.939 | TFLOPs: 11.97 | 7: iteration 74950/ 173500 | consumed samples: 19187200 | consumed tokens: 39295385600 | elapsed time per iteration (s): 0.08 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 4.526186E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.347 | TFLOPs: 11.86 | 7: iteration 74960/ 173500 | consumed samples: 19189760 | consumed tokens: 39300628480 | elapsed time per iteration (s): 0.08 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 4.530532E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.079 | TFLOPs: 11.90 | 7: iteration 74970/ 173500 | consumed samples: 19192320 | consumed tokens: 39305871360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.306E-04 | global batch size: 256 | lm loss: 4.522577E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.404 | TFLOPs: 11.94 | 7: iteration 74980/ 173500 | consumed samples: 19194880 | consumed tokens: 39311114240 | elapsed time per iteration (s): 0.08 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 4.535870E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.857 | TFLOPs: 11.90 | 7: iteration 74990/ 173500 | consumed samples: 19197440 | consumed tokens: 39316357120 | elapsed time per iteration (s): 0.08 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 4.539657E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.490 | TFLOPs: 11.90 | 7: iteration 75000/ 173500 | consumed samples: 19200000 | consumed tokens: 39321600000 | elapsed time per iteration (s): 0.08 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 4.537594E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.069 | TFLOPs: 11.98 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 75000 | lm loss value: 4.404099E+00 | lm loss PPL: 8.178542E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 75000 to checkpoints_14m91b100m 0: [2023-03-17 02:04:21,205] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step75000 is begin to save! 0: [2023-03-17 02:04:21,209] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 02:04:21,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:04:21,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:04:21,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:04:21,240] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:04:21,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:04:21,243] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:04:21,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:04:21,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:04:21,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:04:21,250] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:04:21,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:04:21,251] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step75000/mp_rank_00_model_states.pt 0: [2023-03-17 02:04:21,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:04:21,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:04:21,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:04:21,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:04:21,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:04:21,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:04:21,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2023-03-17 02:04:21,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:04:21,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2023-03-17 02:04:21,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:04:21,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
4: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:04:21,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2023-03-17 02:04:21,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:04:21,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 3: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:04:21,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:04:21,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:04:21,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:04:21,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2023-03-17 02:04:21,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2023-03-17 02:04:21,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:04:21,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:04:21,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
5: [2023-03-17 02:04:21,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2023-03-17 02:04:21,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:04:21,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:04:21,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2023-03-17 02:04:21,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:04:21,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:04:21,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:04:21,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:04:21,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:04:21,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:04:21,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2023-03-17 02:04:21,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2023-03-17 02:04:21,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2023-03-17 02:04:21,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:04:21,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:04:21,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2023-03-17 02:04:21,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:04:21,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:04:21,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2023-03-17 02:04:21,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:04:21,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2023-03-17 02:04:21,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:04:21,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:04:21,279] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:04:21,279] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:04:21,279] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,279] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:04:21,279] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:04:21,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:04:21,279] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,279] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:04:21,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:04:21,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:04:21,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:04:21,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:04:21,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:04:21,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 3: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:04:21,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2023-03-17 02:04:21,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:04:21,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:04:21,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2023-03-17 02:04:21,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2023-03-17 02:04:21,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:04:21,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2023-03-17 02:04:21,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:04:21,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:04:21,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:04:21,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:04:21,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2023-03-17 02:04:21,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:04:21,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 02:04:21,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 0: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 02:04:21,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
2: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2023-03-17 02:04:21,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:04:21,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
6: [2023-03-17 02:04:21,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:04:21,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step75000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:04:21,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: successfully saved checkpoint at iteration 75000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.80 7: iteration 75010/ 173500 | consumed samples: 19202560 | consumed tokens: 39326842880 | elapsed time per iteration (s): 0.09 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 4.535479E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2813.476 | TFLOPs: 10.46 | 7: iteration 75020/ 173500 | consumed samples: 19205120 | consumed tokens: 39332085760 | elapsed time per iteration (s): 0.08 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 4.539231E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.258 | TFLOPs: 11.98 | 7: iteration 75030/ 173500 | consumed samples: 19207680 | consumed tokens: 39337328640 | elapsed time per iteration (s): 0.08 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 4.528133E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.987 | TFLOPs: 11.96 | 7: iteration 75040/ 173500 | consumed samples: 19210240 | consumed tokens: 39342571520 | elapsed time per iteration (s): 0.08 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 4.525381E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.811 | 
TFLOPs: 11.99 | 7: iteration 75050/ 173500 | consumed samples: 19212800 | consumed tokens: 39347814400 | elapsed time per iteration (s): 0.08 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 4.538456E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.935 | TFLOPs: 11.94 | 7: iteration 75060/ 173500 | consumed samples: 19215360 | consumed tokens: 39353057280 | elapsed time per iteration (s): 0.08 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 4.538975E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.174 | TFLOPs: 11.93 | 7: iteration 75070/ 173500 | consumed samples: 19217920 | consumed tokens: 39358300160 | elapsed time per iteration (s): 0.08 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 4.530299E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.857 | TFLOPs: 11.43 | 7: iteration 75080/ 173500 | consumed samples: 19220480 | consumed tokens: 39363543040 | elapsed time per iteration (s): 0.08 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 4.523046E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.819 | TFLOPs: 11.97 | 7: iteration 75090/ 173500 | consumed samples: 19223040 | consumed tokens: 39368785920 | elapsed time per iteration (s): 0.08 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 4.539911E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.017 | TFLOPs: 11.67 | 7: iteration 75100/ 173500 | consumed samples: 19225600 | consumed tokens: 39374028800 | elapsed time per iteration (s): 0.08 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 4.534864E+00 | grad norm: 0.332 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.116 | TFLOPs: 11.84 | 7: iteration 75110/ 173500 | consumed samples: 19228160 | consumed tokens: 39379271680 | elapsed time per iteration (s): 0.08 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 4.521956E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.873 | TFLOPs: 11.81 | 7: iteration 75120/ 173500 | consumed samples: 19230720 | consumed tokens: 39384514560 | elapsed time per iteration (s): 0.08 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 4.526704E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.979 | TFLOPs: 11.97 | 7: iteration 75130/ 173500 | consumed samples: 19233280 | consumed tokens: 39389757440 | elapsed time per iteration (s): 0.08 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 4.539594E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.308 | TFLOPs: 12.05 | 7: iteration 75140/ 173500 | consumed samples: 19235840 | consumed tokens: 39395000320 | elapsed time per iteration (s): 0.08 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 4.532842E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.524 | TFLOPs: 11.99 | 7: iteration 75150/ 173500 | consumed samples: 19238400 | consumed tokens: 39400243200 | elapsed time per iteration (s): 0.08 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 4.520284E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.414 | TFLOPs: 11.96 | 7: iteration 75160/ 173500 | consumed samples: 19240960 | consumed tokens: 39405486080 | elapsed time per iteration (s): 
0.08 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 4.539971E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.512 | TFLOPs: 12.02 | 7: iteration 75170/ 173500 | consumed samples: 19243520 | consumed tokens: 39410728960 | elapsed time per iteration (s): 0.08 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 4.536748E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.368 | TFLOPs: 11.94 | 7: iteration 75180/ 173500 | consumed samples: 19246080 | consumed tokens: 39415971840 | elapsed time per iteration (s): 0.08 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 4.523470E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.483 | TFLOPs: 12.03 | 7: iteration 75190/ 173500 | consumed samples: 19248640 | consumed tokens: 39421214720 | elapsed time per iteration (s): 0.08 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 4.526337E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.342 | TFLOPs: 12.02 | 7: iteration 75200/ 173500 | consumed samples: 19251200 | consumed tokens: 39426457600 | elapsed time per iteration (s): 0.08 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 4.526110E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.108 | TFLOPs: 12.01 | 7: iteration 75210/ 173500 | consumed samples: 19253760 | consumed tokens: 39431700480 | elapsed time per iteration (s): 0.08 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 4.538977E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.270 | TFLOPs: 12.03 | 7: iteration 
75220/ 173500 | consumed samples: 19256320 | consumed tokens: 39436943360 | elapsed time per iteration (s): 0.08 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 4.547964E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.482 | TFLOPs: 12.00 | 7: iteration 75230/ 173500 | consumed samples: 19258880 | consumed tokens: 39442186240 | elapsed time per iteration (s): 0.08 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 4.533011E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.458 | TFLOPs: 11.61 | 7: iteration 75240/ 173500 | consumed samples: 19261440 | consumed tokens: 39447429120 | elapsed time per iteration (s): 0.08 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 4.538644E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.757 | TFLOPs: 11.96 | 7: iteration 75250/ 173500 | consumed samples: 19264000 | consumed tokens: 39452672000 | elapsed time per iteration (s): 0.08 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 4.546906E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.613 | TFLOPs: 11.93 | 7: iteration 75260/ 173500 | consumed samples: 19266560 | consumed tokens: 39457914880 | elapsed time per iteration (s): 0.08 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 4.524280E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.867 | TFLOPs: 11.97 | 7: iteration 75270/ 173500 | consumed samples: 19269120 | consumed tokens: 39463157760 | elapsed time per iteration (s): 0.08 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 4.524614E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3216.552 | TFLOPs: 11.96 | 7: iteration 75280/ 173500 | consumed samples: 19271680 | consumed tokens: 39468400640 | elapsed time per iteration (s): 0.08 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 4.535204E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.512 | TFLOPs: 11.95 | 7: iteration 75290/ 173500 | consumed samples: 19274240 | consumed tokens: 39473643520 | elapsed time per iteration (s): 0.08 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 4.524324E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.828 | TFLOPs: 11.91 | 7: iteration 75300/ 173500 | consumed samples: 19276800 | consumed tokens: 39478886400 | elapsed time per iteration (s): 0.08 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 4.520348E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.046 | TFLOPs: 11.87 | 7: iteration 75310/ 173500 | consumed samples: 19279360 | consumed tokens: 39484129280 | elapsed time per iteration (s): 0.08 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 4.538707E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.751 | TFLOPs: 11.83 | 7: iteration 75320/ 173500 | consumed samples: 19281920 | consumed tokens: 39489372160 | elapsed time per iteration (s): 0.08 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 4.532080E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.493 | TFLOPs: 11.91 | 7: iteration 75330/ 173500 | consumed samples: 19284480 | consumed tokens: 39494615040 | elapsed time per iteration (s): 0.08 | learning rate: 
1.301E-04 | global batch size: 256 | lm loss: 4.531772E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.688 | TFLOPs: 11.65 | 7: iteration 75340/ 173500 | consumed samples: 19287040 | consumed tokens: 39499857920 | elapsed time per iteration (s): 0.08 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 4.539759E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.175 | TFLOPs: 11.92 | 7: iteration 75350/ 173500 | consumed samples: 19289600 | consumed tokens: 39505100800 | elapsed time per iteration (s): 0.08 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 4.533636E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.279 | TFLOPs: 12.06 | 7: iteration 75360/ 173500 | consumed samples: 19292160 | consumed tokens: 39510343680 | elapsed time per iteration (s): 0.08 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 4.527262E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.311 | TFLOPs: 12.03 | 7: iteration 75370/ 173500 | consumed samples: 19294720 | consumed tokens: 39515586560 | elapsed time per iteration (s): 0.08 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 4.535497E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.653 | TFLOPs: 12.05 | 7: iteration 75380/ 173500 | consumed samples: 19297280 | consumed tokens: 39520829440 | elapsed time per iteration (s): 0.08 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 4.525946E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.734 | TFLOPs: 11.85 | 7: iteration 75390/ 173500 | 
consumed samples: 19299840 | consumed tokens: 39526072320 | elapsed time per iteration (s): 0.08 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 4.531740E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.703 | TFLOPs: 11.85 | 7: iteration 75400/ 173500 | consumed samples: 19302400 | consumed tokens: 39531315200 | elapsed time per iteration (s): 0.08 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 4.518699E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.861 | TFLOPs: 11.88 | 7: iteration 75410/ 173500 | consumed samples: 19304960 | consumed tokens: 39536558080 | elapsed time per iteration (s): 0.08 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 4.528481E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.680 | TFLOPs: 11.89 | 7: iteration 75420/ 173500 | consumed samples: 19307520 | consumed tokens: 39541800960 | elapsed time per iteration (s): 0.08 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 4.532224E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.641 | TFLOPs: 11.85 | 7: iteration 75430/ 173500 | consumed samples: 19310080 | consumed tokens: 39547043840 | elapsed time per iteration (s): 0.08 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 4.537584E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.288 | TFLOPs: 11.79 | 7: iteration 75440/ 173500 | consumed samples: 19312640 | consumed tokens: 39552286720 | elapsed time per iteration (s): 0.08 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 4.532706E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3179.988 | TFLOPs: 11.83 | 7: iteration 75450/ 173500 | consumed samples: 19315200 | consumed tokens: 39557529600 | elapsed time per iteration (s): 0.08 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 4.540851E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.538 | TFLOPs: 11.85 | 7: iteration 75460/ 173500 | consumed samples: 19317760 | consumed tokens: 39562772480 | elapsed time per iteration (s): 0.08 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 4.537014E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.726 | TFLOPs: 11.88 | 7: iteration 75470/ 173500 | consumed samples: 19320320 | consumed tokens: 39568015360 | elapsed time per iteration (s): 0.08 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 4.520186E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.538 | TFLOPs: 11.81 | 7: iteration 75480/ 173500 | consumed samples: 19322880 | consumed tokens: 39573258240 | elapsed time per iteration (s): 0.08 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 4.531330E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.205 | TFLOPs: 11.99 | 7: iteration 75490/ 173500 | consumed samples: 19325440 | consumed tokens: 39578501120 | elapsed time per iteration (s): 0.08 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 4.533126E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.307 | TFLOPs: 12.03 | 7: iteration 75500/ 173500 | consumed samples: 19328000 | consumed tokens: 39583744000 | elapsed time per iteration (s): 0.08 | learning rate: 1.298E-04 | global batch 
size: 256 | lm loss: 4.537578E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.806 | TFLOPs: 12.02 | 7: iteration 75510/ 173500 | consumed samples: 19330560 | consumed tokens: 39588986880 | elapsed time per iteration (s): 0.08 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 4.529400E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.786 | TFLOPs: 12.01 | 7: iteration 75520/ 173500 | consumed samples: 19333120 | consumed tokens: 39594229760 | elapsed time per iteration (s): 0.08 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 4.533143E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.797 | TFLOPs: 11.75 | 7: iteration 75530/ 173500 | consumed samples: 19335680 | consumed tokens: 39599472640 | elapsed time per iteration (s): 0.08 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 4.534788E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.896 | TFLOPs: 12.01 | 7: iteration 75540/ 173500 | consumed samples: 19338240 | consumed tokens: 39604715520 | elapsed time per iteration (s): 0.08 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 4.531788E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.527 | TFLOPs: 11.94 | 7: iteration 75550/ 173500 | consumed samples: 19340800 | consumed tokens: 39609958400 | elapsed time per iteration (s): 0.08 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 4.531230E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.974 | TFLOPs: 11.70 | 7: iteration 75560/ 173500 | consumed samples: 19343360 | 
consumed tokens: 39615201280 | elapsed time per iteration (s): 0.08 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 4.534296E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.503 | TFLOPs: 11.88 | 7: iteration 75570/ 173500 | consumed samples: 19345920 | consumed tokens: 39620444160 | elapsed time per iteration (s): 0.08 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 4.526873E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.151 | TFLOPs: 11.93 | 7: iteration 75580/ 173500 | consumed samples: 19348480 | consumed tokens: 39625687040 | elapsed time per iteration (s): 0.08 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 4.523174E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.774 | TFLOPs: 11.92 | 7: iteration 75590/ 173500 | consumed samples: 19351040 | consumed tokens: 39630929920 | elapsed time per iteration (s): 0.08 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 4.525335E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.297 | TFLOPs: 11.97 | 7: iteration 75600/ 173500 | consumed samples: 19353600 | consumed tokens: 39636172800 | elapsed time per iteration (s): 0.08 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 4.542136E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.577 | TFLOPs: 11.90 | 7: iteration 75610/ 173500 | consumed samples: 19356160 | consumed tokens: 39641415680 | elapsed time per iteration (s): 0.08 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 4.531520E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3212.076 | TFLOPs: 11.95 | 7: iteration 75620/ 173500 | consumed samples: 19358720 | consumed tokens: 39646658560 | elapsed time per iteration (s): 0.08 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 4.526065E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.602 | TFLOPs: 11.99 | 7: iteration 75630/ 173500 | consumed samples: 19361280 | consumed tokens: 39651901440 | elapsed time per iteration (s): 0.08 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 4.523801E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.571 | TFLOPs: 12.07 | 7: iteration 75640/ 173500 | consumed samples: 19363840 | consumed tokens: 39657144320 | elapsed time per iteration (s): 0.08 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 4.534166E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.552 | TFLOPs: 12.03 | 7: iteration 75650/ 173500 | consumed samples: 19366400 | consumed tokens: 39662387200 | elapsed time per iteration (s): 0.08 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 4.536708E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.150 | TFLOPs: 12.06 | 7: iteration 75660/ 173500 | consumed samples: 19368960 | consumed tokens: 39667630080 | elapsed time per iteration (s): 0.08 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 4.525464E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.385 | TFLOPs: 11.60 | 7: iteration 75670/ 173500 | consumed samples: 19371520 | consumed tokens: 39672872960 | elapsed time per iteration (s): 0.08 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 
4.527423E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.131 | TFLOPs: 11.92 | 7: iteration 75680/ 173500 | consumed samples: 19374080 | consumed tokens: 39678115840 | elapsed time per iteration (s): 0.08 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 4.534235E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.834 | TFLOPs: 11.68 | 7: iteration 75690/ 173500 | consumed samples: 19376640 | consumed tokens: 39683358720 | elapsed time per iteration (s): 0.09 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 4.532672E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.588 | TFLOPs: 10.70 | 7: iteration 75700/ 173500 | consumed samples: 19379200 | consumed tokens: 39688601600 | elapsed time per iteration (s): 0.09 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 4.534258E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2932.899 | TFLOPs: 10.91 | 7: iteration 75710/ 173500 | consumed samples: 19381760 | consumed tokens: 39693844480 | elapsed time per iteration (s): 0.08 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 4.532125E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.089 | TFLOPs: 11.98 | 7: iteration 75720/ 173500 | consumed samples: 19384320 | consumed tokens: 39699087360 | elapsed time per iteration (s): 0.08 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 4.530001E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.444 | TFLOPs: 11.96 | 7: iteration 75730/ 173500 | consumed samples: 19386880 | consumed tokens: 
39704330240 | elapsed time per iteration (s): 0.08 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 4.539581E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.622 | TFLOPs: 12.00 | 7: iteration 75740/ 173500 | consumed samples: 19389440 | consumed tokens: 39709573120 | elapsed time per iteration (s): 0.08 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 4.532536E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.935 | TFLOPs: 12.01 | 7: iteration 75750/ 173500 | consumed samples: 19392000 | consumed tokens: 39714816000 | elapsed time per iteration (s): 0.08 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 4.539496E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.843 | TFLOPs: 12.01 | 7: iteration 75760/ 173500 | consumed samples: 19394560 | consumed tokens: 39720058880 | elapsed time per iteration (s): 0.08 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 4.531180E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.009 | TFLOPs: 11.82 | 7: iteration 75770/ 173500 | consumed samples: 19397120 | consumed tokens: 39725301760 | elapsed time per iteration (s): 0.08 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 4.532089E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.146 | TFLOPs: 11.92 | 7: iteration 75780/ 173500 | consumed samples: 19399680 | consumed tokens: 39730544640 | elapsed time per iteration (s): 0.08 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 4.535957E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3188.158 | TFLOPs: 11.86 | 7: iteration 75790/ 173500 | consumed samples: 19402240 | consumed tokens: 39735787520 | elapsed time per iteration (s): 0.08 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 4.540824E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.649 | TFLOPs: 11.90 | 7: iteration 75800/ 173500 | consumed samples: 19404800 | consumed tokens: 39741030400 | elapsed time per iteration (s): 0.08 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 4.531696E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.237 | TFLOPs: 11.86 | 7: iteration 75810/ 173500 | consumed samples: 19407360 | consumed tokens: 39746273280 | elapsed time per iteration (s): 0.08 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 4.536846E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.812 | TFLOPs: 11.90 | 7: iteration 75820/ 173500 | consumed samples: 19409920 | consumed tokens: 39751516160 | elapsed time per iteration (s): 0.08 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 4.525426E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.061 | TFLOPs: 11.86 | 7: iteration 75830/ 173500 | consumed samples: 19412480 | consumed tokens: 39756759040 | elapsed time per iteration (s): 0.08 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 4.530724E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.736 | TFLOPs: 11.87 | 7: iteration 75840/ 173500 | consumed samples: 19415040 | consumed tokens: 39762001920 | elapsed time per iteration (s): 0.09 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 4.545127E+00 | grad 
norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.303 | TFLOPs: 10.54 | 7: iteration 75850/ 173500 | consumed samples: 19417600 | consumed tokens: 39767244800 | elapsed time per iteration (s): 0.08 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 4.547945E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.895 | TFLOPs: 11.83 | 7: iteration 75860/ 173500 | consumed samples: 19420160 | consumed tokens: 39772487680 | elapsed time per iteration (s): 0.08 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 4.539679E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.276 | TFLOPs: 11.88 | 7: iteration 75870/ 173500 | consumed samples: 19422720 | consumed tokens: 39777730560 | elapsed time per iteration (s): 0.08 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 4.527525E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.168 | TFLOPs: 11.92 | 7: iteration 75880/ 173500 | consumed samples: 19425280 | consumed tokens: 39782973440 | elapsed time per iteration (s): 0.08 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 4.538678E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.736 | TFLOPs: 11.92 | 7: iteration 75890/ 173500 | consumed samples: 19427840 | consumed tokens: 39788216320 | elapsed time per iteration (s): 0.08 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 4.530898E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.420 | TFLOPs: 11.77 | 7: iteration 75900/ 173500 | consumed samples: 19430400 | consumed tokens: 39793459200 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 4.539976E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.678 | TFLOPs: 11.96 | 7: iteration 75910/ 173500 | consumed samples: 19432960 | consumed tokens: 39798702080 | elapsed time per iteration (s): 0.08 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 4.540735E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.041 | TFLOPs: 11.91 | 7: iteration 75920/ 173500 | consumed samples: 19435520 | consumed tokens: 39803944960 | elapsed time per iteration (s): 0.08 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 4.524974E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.432 | TFLOPs: 11.86 | 7: iteration 75930/ 173500 | consumed samples: 19438080 | consumed tokens: 39809187840 | elapsed time per iteration (s): 0.08 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 4.522353E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.857 | TFLOPs: 12.07 | 7: iteration 75940/ 173500 | consumed samples: 19440640 | consumed tokens: 39814430720 | elapsed time per iteration (s): 0.08 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 4.539772E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.347 | TFLOPs: 12.05 | 7: iteration 75950/ 173500 | consumed samples: 19443200 | consumed tokens: 39819673600 | elapsed time per iteration (s): 0.08 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 4.532001E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.000 | TFLOPs: 
12.05 | 7: iteration 75960/ 173500 | consumed samples: 19445760 | consumed tokens: 39824916480 | elapsed time per iteration (s): 0.08 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 4.531927E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.320 | TFLOPs: 12.03 | 7: iteration 75970/ 173500 | consumed samples: 19448320 | consumed tokens: 39830159360 | elapsed time per iteration (s): 0.08 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 4.533109E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.712 | TFLOPs: 12.00 | 7: iteration 75980/ 173500 | consumed samples: 19450880 | consumed tokens: 39835402240 | elapsed time per iteration (s): 0.08 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 4.529090E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.408 | TFLOPs: 12.03 | 7: iteration 75990/ 173500 | consumed samples: 19453440 | consumed tokens: 39840645120 | elapsed time per iteration (s): 0.08 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 4.534044E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.479 | TFLOPs: 12.00 | 0: [2023-03-17 02:05:41,443] [INFO] [logging.py:68:log_dist] [Rank 0] step=76000, skipped=0, lr=[0.0001289804445403464, 0.0001289804445403464, 0.0001289804445403464], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 76000/ 173500 | consumed samples: 19456000 | consumed tokens: 39845888000 | elapsed time per iteration (s): 0.08 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 4.523631E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.782 | TFLOPs: 12.04 | 0: steps: 76000 loss: 
4.5327 iter time (s): 0.080 samples/sec: 3211.636 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 76000 | lm loss value: 4.403834E+00 | lm loss PPL: 8.176378E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 76000 to checkpoints_14m91b100m 0: [2023-03-17 02:05:41,500] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step76000 is begin to save! 0: [2023-03-17 02:05:41,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:05:41,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:05:41,530] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:05:41,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:05:41,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:05:41,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:05:41,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:05:41,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 02:05:41,540] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:05:41,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:05:41,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:05:41,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:05:41,543] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step76000/mp_rank_00_model_states.pt 0: [2023-03-17 02:05:41,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:05:41,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:05:41,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:05:41,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:05:41,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:05:41,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:05:41,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:05:41,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:05:41,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 4: [2023-03-17 02:05:41,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:05:41,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:05:41,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:05:41,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:05:41,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 02:05:41,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2023-03-17 02:05:41,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:05:41,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
3: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:05:41,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:05:41,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:05:41,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:05:41,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:05:41,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2023-03-17 02:05:41,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:05:41,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:05:41,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:05:41,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:05:41,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:05:41,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
5: [2023-03-17 02:05:41,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 02:05:41,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2023-03-17 02:05:41,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:05:41,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:05:41,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 02:05:41,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:05:41,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
5: [2023-03-17 02:05:41,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2023-03-17 02:05:41,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:05:41,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:05:41,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 02:05:41,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2023-03-17 02:05:41,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:05:41,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 0: [2023-03-17 02:05:41,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:05:41,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:05:41,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:05:41,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:05:41,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:05:41,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 02:05:41,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:05:41,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:05:41,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:05:41,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2023-03-17 02:05:41,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
7: [2023-03-17 02:05:41,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step76000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:05:41,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: successfully saved checkpoint at iteration 76000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.91 7: iteration 76010/ 173500 | consumed samples: 19458560 | consumed tokens: 39851130880 | elapsed time per iteration (s): 0.10 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 4.531426E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.079 | TFLOPs: 9.70 | 7: iteration 76020/ 173500 | consumed samples: 19461120 | consumed tokens: 39856373760 | elapsed time per iteration (s): 0.08 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 4.522259E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.980 | TFLOPs: 12.03 | 7: iteration 76030/ 173500 | consumed samples: 19463680 | consumed tokens: 39861616640 | elapsed time per iteration (s): 0.08 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 4.523519E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.602 | TFLOPs: 11.90 | 7: iteration 76040/ 173500 | consumed samples: 19466240 | consumed tokens: 39866859520 | elapsed time per iteration (s): 0.08 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 4.539749E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.517 | TFLOPs: 12.00 | 7: iteration 76050/ 173500 | consumed samples: 19468800 | consumed tokens: 39872102400 | elapsed time per iteration (s): 0.08 | learning rate: 1.289E-04 | global 
batch size: 256 | lm loss: 4.533145E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.345 | TFLOPs: 11.99 | 7: iteration 76060/ 173500 | consumed samples: 19471360 | consumed tokens: 39877345280 | elapsed time per iteration (s): 0.08 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 4.540717E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.481 | TFLOPs: 12.00 | 7: iteration 76070/ 173500 | consumed samples: 19473920 | consumed tokens: 39882588160 | elapsed time per iteration (s): 0.08 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 4.531855E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.902 | TFLOPs: 12.02 | 7: iteration 76080/ 173500 | consumed samples: 19476480 | consumed tokens: 39887831040 | elapsed time per iteration (s): 0.08 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 4.533510E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.959 | TFLOPs: 12.04 | 7: iteration 76090/ 173500 | consumed samples: 19479040 | consumed tokens: 39893073920 | elapsed time per iteration (s): 0.08 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 4.535744E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.304 | TFLOPs: 11.99 | 7: iteration 76100/ 173500 | consumed samples: 19481600 | consumed tokens: 39898316800 | elapsed time per iteration (s): 0.08 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 4.532872E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.201 | TFLOPs: 11.93 | 7: iteration 76110/ 173500 | consumed samples: 19484160 
| consumed tokens: 39903559680 | elapsed time per iteration (s): 0.08 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 4.532441E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.828 | TFLOPs: 11.93 | 7: iteration 76120/ 173500 | consumed samples: 19486720 | consumed tokens: 39908802560 | elapsed time per iteration (s): 0.08 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 4.541542E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.408 | TFLOPs: 11.90 | 7: iteration 76130/ 173500 | consumed samples: 19489280 | consumed tokens: 39914045440 | elapsed time per iteration (s): 0.08 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 4.529501E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.011 | TFLOPs: 11.88 | 7: iteration 76140/ 173500 | consumed samples: 19491840 | consumed tokens: 39919288320 | elapsed time per iteration (s): 0.08 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 4.520393E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.568 | TFLOPs: 11.66 | 7: iteration 76150/ 173500 | consumed samples: 19494400 | consumed tokens: 39924531200 | elapsed time per iteration (s): 0.08 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 4.530854E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.620 | TFLOPs: 11.89 | 7: iteration 76160/ 173500 | consumed samples: 19496960 | consumed tokens: 39929774080 | elapsed time per iteration (s): 0.10 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 4.526412E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 2557.892 | TFLOPs: 9.51 | 7: iteration 76170/ 173500 | consumed samples: 19499520 | consumed tokens: 39935016960 | elapsed time per iteration (s): 0.10 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 4.516324E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.654 | TFLOPs: 9.16 | 7: iteration 76180/ 173500 | consumed samples: 19502080 | consumed tokens: 39940259840 | elapsed time per iteration (s): 0.10 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 4.525570E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.312 | TFLOPs: 9.26 | 7: iteration 76190/ 173500 | consumed samples: 19504640 | consumed tokens: 39945502720 | elapsed time per iteration (s): 0.10 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 4.537136E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.584 | TFLOPs: 9.41 | 7: iteration 76200/ 173500 | consumed samples: 19507200 | consumed tokens: 39950745600 | elapsed time per iteration (s): 0.22 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 4.536966E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1141.894 | TFLOPs: 4.25 | 7: iteration 76210/ 173500 | consumed samples: 19509760 | consumed tokens: 39955988480 | elapsed time per iteration (s): 0.10 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 4.550819E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2630.649 | TFLOPs: 9.78 | 7: iteration 76220/ 173500 | consumed samples: 19512320 | consumed tokens: 39961231360 | elapsed time per iteration (s): 0.10 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 
4.534296E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2497.993 | TFLOPs: 9.29 | 7: iteration 76230/ 173500 | consumed samples: 19514880 | consumed tokens: 39966474240 | elapsed time per iteration (s): 0.10 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 4.526517E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2470.979 | TFLOPs: 9.19 | 7: iteration 76240/ 173500 | consumed samples: 19517440 | consumed tokens: 39971717120 | elapsed time per iteration (s): 0.10 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 4.520820E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2444.955 | TFLOPs: 9.09 | 7: iteration 76250/ 173500 | consumed samples: 19520000 | consumed tokens: 39976960000 | elapsed time per iteration (s): 0.11 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 4.527448E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2431.358 | TFLOPs: 9.04 | 7: iteration 76260/ 173500 | consumed samples: 19522560 | consumed tokens: 39982202880 | elapsed time per iteration (s): 0.10 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 4.524833E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2460.740 | TFLOPs: 9.15 | 7: iteration 76270/ 173500 | consumed samples: 19525120 | consumed tokens: 39987445760 | elapsed time per iteration (s): 0.11 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 4.536895E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.731 | TFLOPs: 9.05 | 7: iteration 76280/ 173500 | consumed samples: 19527680 | consumed tokens: 39992688640 | 
elapsed time per iteration (s): 0.11 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 4.504236E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.060 | TFLOPs: 8.98 | 7: iteration 76290/ 173500 | consumed samples: 19530240 | consumed tokens: 39997931520 | elapsed time per iteration (s): 0.11 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 4.531030E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2414.931 | TFLOPs: 8.98 | 7: iteration 76300/ 173500 | consumed samples: 19532800 | consumed tokens: 40003174400 | elapsed time per iteration (s): 0.10 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 4.527110E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.093 | TFLOPs: 9.34 | 7: iteration 76310/ 173500 | consumed samples: 19535360 | consumed tokens: 40008417280 | elapsed time per iteration (s): 0.10 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 4.523937E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2476.203 | TFLOPs: 9.21 | 7: iteration 76320/ 173500 | consumed samples: 19537920 | consumed tokens: 40013660160 | elapsed time per iteration (s): 0.11 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 4.538600E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2401.982 | TFLOPs: 8.93 | 7: iteration 76330/ 173500 | consumed samples: 19540480 | consumed tokens: 40018903040 | elapsed time per iteration (s): 0.11 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 4.530294E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2422.110 | 
TFLOPs: 9.01 | 7: iteration 76340/ 173500 | consumed samples: 19543040 | consumed tokens: 40024145920 | elapsed time per iteration (s): 0.10 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 4.531113E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2636.338 | TFLOPs: 9.81 | 7: iteration 76350/ 173500 | consumed samples: 19545600 | consumed tokens: 40029388800 | elapsed time per iteration (s): 0.11 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 4.529363E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.370 | TFLOPs: 8.91 | 7: iteration 76360/ 173500 | consumed samples: 19548160 | consumed tokens: 40034631680 | elapsed time per iteration (s): 0.11 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 4.540520E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.047 | TFLOPs: 8.88 | 7: iteration 76370/ 173500 | consumed samples: 19550720 | consumed tokens: 40039874560 | elapsed time per iteration (s): 0.10 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 4.523371E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.018 | TFLOPs: 9.12 | 7: iteration 76380/ 173500 | consumed samples: 19553280 | consumed tokens: 40045117440 | elapsed time per iteration (s): 0.11 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 4.528345E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.045 | TFLOPs: 8.95 | 7: iteration 76390/ 173500 | consumed samples: 19555840 | consumed tokens: 40050360320 | elapsed time per iteration (s): 0.11 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 4.538924E+00 | grad norm: 0.318 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.950 | TFLOPs: 8.95 | 7: iteration 76400/ 173500 | consumed samples: 19558400 | consumed tokens: 40055603200 | elapsed time per iteration (s): 0.10 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 4.533780E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2450.888 | TFLOPs: 9.12 | 7: iteration 76410/ 173500 | consumed samples: 19560960 | consumed tokens: 40060846080 | elapsed time per iteration (s): 0.10 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 4.520839E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.527 | TFLOPs: 9.25 | 7: iteration 76420/ 173500 | consumed samples: 19563520 | consumed tokens: 40066088960 | elapsed time per iteration (s): 0.10 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 4.527250E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.496 | TFLOPs: 9.24 | 7: iteration 76430/ 173500 | consumed samples: 19566080 | consumed tokens: 40071331840 | elapsed time per iteration (s): 0.10 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 4.530259E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.329 | TFLOPs: 9.30 | 7: iteration 76440/ 173500 | consumed samples: 19568640 | consumed tokens: 40076574720 | elapsed time per iteration (s): 0.11 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 4.530591E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2346.114 | TFLOPs: 8.73 | 7: iteration 76450/ 173500 | consumed samples: 19571200 | consumed tokens: 40081817600 | elapsed time per iteration (s): 0.11 | 
learning rate: 1.283E-04 | global batch size: 256 | lm loss: 4.539265E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.106 | TFLOPs: 8.82 | 7: iteration 76460/ 173500 | consumed samples: 19573760 | consumed tokens: 40087060480 | elapsed time per iteration (s): 0.11 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 4.540336E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2397.393 | TFLOPs: 8.92 | 7: iteration 76470/ 173500 | consumed samples: 19576320 | consumed tokens: 40092303360 | elapsed time per iteration (s): 0.11 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 4.537668E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2432.389 | TFLOPs: 9.05 | 7: iteration 76480/ 173500 | consumed samples: 19578880 | consumed tokens: 40097546240 | elapsed time per iteration (s): 0.10 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 4.534457E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2451.270 | TFLOPs: 9.12 | 7: iteration 76490/ 173500 | consumed samples: 19581440 | consumed tokens: 40102789120 | elapsed time per iteration (s): 0.11 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 4.514451E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2336.736 | TFLOPs: 8.69 | 7: iteration 76500/ 173500 | consumed samples: 19584000 | consumed tokens: 40108032000 | elapsed time per iteration (s): 0.11 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 4.539140E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.925 | TFLOPs: 8.88 | 7: iteration 76510/ 173500 
| consumed samples: 19586560 | consumed tokens: 40113274880 | elapsed time per iteration (s): 0.11 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 4.528517E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.543 | TFLOPs: 8.59 | 7: iteration 76520/ 173500 | consumed samples: 19589120 | consumed tokens: 40118517760 | elapsed time per iteration (s): 0.11 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 4.524155E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.560 | TFLOPs: 8.72 | 7: iteration 76530/ 173500 | consumed samples: 19591680 | consumed tokens: 40123760640 | elapsed time per iteration (s): 0.10 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 4.536531E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2480.496 | TFLOPs: 9.23 | 7: iteration 76540/ 173500 | consumed samples: 19594240 | consumed tokens: 40129003520 | elapsed time per iteration (s): 0.11 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 4.537344E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.123 | TFLOPs: 8.95 | 7: iteration 76550/ 173500 | consumed samples: 19596800 | consumed tokens: 40134246400 | elapsed time per iteration (s): 0.11 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 4.538070E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.468 | TFLOPs: 9.02 | 7: iteration 76560/ 173500 | consumed samples: 19599360 | consumed tokens: 40139489280 | elapsed time per iteration (s): 0.12 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 4.528708E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2112.834 | TFLOPs: 7.86 | 7: iteration 76570/ 173500 | consumed samples: 19601920 | consumed tokens: 40144732160 | elapsed time per iteration (s): 0.09 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 4.530632E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2993.309 | TFLOPs: 11.13 | 7: iteration 76580/ 173500 | consumed samples: 19604480 | consumed tokens: 40149975040 | elapsed time per iteration (s): 0.10 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 4.526097E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2474.339 | TFLOPs: 9.20 | 7: iteration 76590/ 173500 | consumed samples: 19607040 | consumed tokens: 40155217920 | elapsed time per iteration (s): 0.09 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 4.540741E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2707.837 | TFLOPs: 10.07 | 7: iteration 76600/ 173500 | consumed samples: 19609600 | consumed tokens: 40160460800 | elapsed time per iteration (s): 0.08 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 4.534665E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.965 | TFLOPs: 11.82 | 7: iteration 76610/ 173500 | consumed samples: 19612160 | consumed tokens: 40165703680 | elapsed time per iteration (s): 0.08 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 4.533373E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.700 | TFLOPs: 11.65 | 7: iteration 76620/ 173500 | consumed samples: 19614720 | consumed tokens: 40170946560 | elapsed time per iteration (s): 0.08 | learning rate: 1.280E-04 | global batch 
size: 256 | lm loss: 4.523268E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.197 | TFLOPs: 11.49 | 7: iteration 76630/ 173500 | consumed samples: 19617280 | consumed tokens: 40176189440 | elapsed time per iteration (s): 0.09 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 4.528288E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2967.257 | TFLOPs: 11.04 | 7: iteration 76640/ 173500 | consumed samples: 19619840 | consumed tokens: 40181432320 | elapsed time per iteration (s): 0.08 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 4.529603E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.516 | TFLOPs: 11.61 | 7: iteration 76650/ 173500 | consumed samples: 19622400 | consumed tokens: 40186675200 | elapsed time per iteration (s): 0.08 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 4.542703E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.959 | TFLOPs: 11.76 | 7: iteration 76660/ 173500 | consumed samples: 19624960 | consumed tokens: 40191918080 | elapsed time per iteration (s): 0.09 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 4.539451E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.217 | TFLOPs: 11.19 | 7: iteration 76670/ 173500 | consumed samples: 19627520 | consumed tokens: 40197160960 | elapsed time per iteration (s): 0.08 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 4.532441E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.830 | TFLOPs: 11.88 | 7: iteration 76680/ 173500 | consumed samples: 19630080 | 
consumed tokens: 40202403840 | elapsed time per iteration (s): 0.09 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 4.529192E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.309 | TFLOPs: 10.82 | 7: iteration 76690/ 173500 | consumed samples: 19632640 | consumed tokens: 40207646720 | elapsed time per iteration (s): 0.09 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 4.527248E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2980.805 | TFLOPs: 11.09 | 7: iteration 76700/ 173500 | consumed samples: 19635200 | consumed tokens: 40212889600 | elapsed time per iteration (s): 0.09 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 4.525317E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.656 | TFLOPs: 11.07 | 7: iteration 76710/ 173500 | consumed samples: 19637760 | consumed tokens: 40218132480 | elapsed time per iteration (s): 0.09 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 4.527697E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2783.971 | TFLOPs: 10.36 | 7: iteration 76720/ 173500 | consumed samples: 19640320 | consumed tokens: 40223375360 | elapsed time per iteration (s): 0.09 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 4.528938E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3009.506 | TFLOPs: 11.19 | 7: iteration 76730/ 173500 | consumed samples: 19642880 | consumed tokens: 40228618240 | elapsed time per iteration (s): 0.08 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 4.532102E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3184.468 | TFLOPs: 11.84 | 7: iteration 76740/ 173500 | consumed samples: 19645440 | consumed tokens: 40233861120 | elapsed time per iteration (s): 0.08 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 4.538756E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.229 | TFLOPs: 11.32 | 7: iteration 76750/ 173500 | consumed samples: 19648000 | consumed tokens: 40239104000 | elapsed time per iteration (s): 0.08 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 4.530424E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.404 | TFLOPs: 11.92 | 7: iteration 76760/ 173500 | consumed samples: 19650560 | consumed tokens: 40244346880 | elapsed time per iteration (s): 0.08 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 4.540476E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.031 | TFLOPs: 11.37 | 7: iteration 76770/ 173500 | consumed samples: 19653120 | consumed tokens: 40249589760 | elapsed time per iteration (s): 0.08 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 4.530885E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.237 | TFLOPs: 11.92 | 7: iteration 76780/ 173500 | consumed samples: 19655680 | consumed tokens: 40254832640 | elapsed time per iteration (s): 0.08 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 4.521983E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.556 | TFLOPs: 11.85 | 7: iteration 76790/ 173500 | consumed samples: 19658240 | consumed tokens: 40260075520 | elapsed time per iteration (s): 0.08 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 
4.540135E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.497 | TFLOPs: 11.67 | 7: iteration 76800/ 173500 | consumed samples: 19660800 | consumed tokens: 40265318400 | elapsed time per iteration (s): 0.08 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 4.530209E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.440 | TFLOPs: 11.94 | 7: iteration 76810/ 173500 | consumed samples: 19663360 | consumed tokens: 40270561280 | elapsed time per iteration (s): 0.10 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 4.538223E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.300 | TFLOPs: 9.41 | 7: iteration 76820/ 173500 | consumed samples: 19665920 | consumed tokens: 40275804160 | elapsed time per iteration (s): 0.09 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 4.531582E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2803.205 | TFLOPs: 10.43 | 7: iteration 76830/ 173500 | consumed samples: 19668480 | consumed tokens: 40281047040 | elapsed time per iteration (s): 0.08 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 4.538898E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.229 | TFLOPs: 11.97 | 7: iteration 76840/ 173500 | consumed samples: 19671040 | consumed tokens: 40286289920 | elapsed time per iteration (s): 0.08 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 4.529878E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.398 | TFLOPs: 11.67 | 7: iteration 76850/ 173500 | consumed samples: 19673600 | consumed tokens: 
40291532800 | elapsed time per iteration (s): 0.08 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 4.537831E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.633 | TFLOPs: 11.69 | 7: iteration 76860/ 173500 | consumed samples: 19676160 | consumed tokens: 40296775680 | elapsed time per iteration (s): 0.08 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 4.534703E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.069 | TFLOPs: 11.88 | 7: iteration 76870/ 173500 | consumed samples: 19678720 | consumed tokens: 40302018560 | elapsed time per iteration (s): 0.08 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 4.539127E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.461 | TFLOPs: 11.93 | 7: iteration 76880/ 173500 | consumed samples: 19681280 | consumed tokens: 40307261440 | elapsed time per iteration (s): 0.08 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 4.543221E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.161 | TFLOPs: 11.90 | 7: iteration 76890/ 173500 | consumed samples: 19683840 | consumed tokens: 40312504320 | elapsed time per iteration (s): 0.08 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 4.536172E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.667 | TFLOPs: 11.92 | 7: iteration 76900/ 173500 | consumed samples: 19686400 | consumed tokens: 40317747200 | elapsed time per iteration (s): 0.09 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 4.526888E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3004.054 | TFLOPs: 11.17 | 7: iteration 76910/ 173500 | consumed samples: 19688960 | consumed tokens: 40322990080 | elapsed time per iteration (s): 0.09 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 4.531204E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2896.715 | TFLOPs: 10.77 | 7: iteration 76920/ 173500 | consumed samples: 19691520 | consumed tokens: 40328232960 | elapsed time per iteration (s): 0.08 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 4.529505E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.988 | TFLOPs: 11.95 | 7: iteration 76930/ 173500 | consumed samples: 19694080 | consumed tokens: 40333475840 | elapsed time per iteration (s): 0.08 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 4.521835E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.188 | TFLOPs: 11.68 | 7: iteration 76940/ 173500 | consumed samples: 19696640 | consumed tokens: 40338718720 | elapsed time per iteration (s): 0.08 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 4.525848E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.599 | TFLOPs: 11.98 | 7: iteration 76950/ 173500 | consumed samples: 19699200 | consumed tokens: 40343961600 | elapsed time per iteration (s): 0.08 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 4.531195E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.980 | TFLOPs: 11.57 | 7: iteration 76960/ 173500 | consumed samples: 19701760 | consumed tokens: 40349204480 | elapsed time per iteration (s): 0.08 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 4.534119E+00 | grad 
norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.107 | TFLOPs: 11.89 | 7: iteration 76970/ 173500 | consumed samples: 19704320 | consumed tokens: 40354447360 | elapsed time per iteration (s): 0.08 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 4.538412E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.933 | TFLOPs: 11.88 | 7: iteration 76980/ 173500 | consumed samples: 19706880 | consumed tokens: 40359690240 | elapsed time per iteration (s): 0.08 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 4.529593E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.141 | TFLOPs: 11.90 | 7: iteration 76990/ 173500 | consumed samples: 19709440 | consumed tokens: 40364933120 | elapsed time per iteration (s): 0.08 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 4.522380E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.833 | TFLOPs: 11.82 | 7: iteration 77000/ 173500 | consumed samples: 19712000 | consumed tokens: 40370176000 | elapsed time per iteration (s): 0.08 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 4.529877E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.938 | TFLOPs: 11.93 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 77000 | lm loss value: 4.357389E+00 | lm loss PPL: 7.805311E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 77000 to checkpoints_14m91b100m 0: [2023-03-17 02:07:14,776] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint 
global_step77000 is begin to save! 0: [2023-03-17 02:07:14,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:07:14,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:07:14,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:07:14,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:07:14,809] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:07:14,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:07:14,812] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:07:14,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:07:14,815] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:07:14,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:07:14,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:07:14,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:07:14,819] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step77000/mp_rank_00_model_states.pt 0: [2023-03-17 02:07:14,819] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:07:14,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:07:14,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:07:14,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:07:14,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:07:14,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:07:14,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:07:14,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:07:14,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2023-03-17 02:07:14,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:07:14,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:07:14,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
2: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:07:14,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:07:14,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
7: [2023-03-17 02:07:14,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 4: [2023-03-17 02:07:14,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 1: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2023-03-17 02:07:14,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:07:14,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:07:14,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
5: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:07:14,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:07:14,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:07:14,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
1: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:07:14,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:07:14,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 02:07:14,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
5: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:07:14,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:07:14,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:07:14,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:07:14,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2023-03-17 02:07:14,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:07:14,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:07:14,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:07:14,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2023-03-17 02:07:14,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:07:14,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:07:14,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2023-03-17 02:07:14,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:07:14,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2023-03-17 02:07:14,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:07:14,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2023-03-17 02:07:14,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:07:14,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2023-03-17 02:07:14,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:07:14,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:07:14,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:07:14,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 6: [2023-03-17 02:07:14,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:07:14,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2023-03-17 02:07:14,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:07:14,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2023-03-17 02:07:14,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:07:14,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:07:14,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:07:14,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2023-03-17 02:07:14,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:07:14,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
7: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
4: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2023-03-17 02:07:14,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:07:14,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
5: [2023-03-17 02:07:14,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:07:14,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2023-03-17 02:07:14,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:07:14,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:07:14,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2023-03-17 02:07:14,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:07:14,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step77000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:07:14,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
0: successfully saved checkpoint at iteration 77000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.97 7: iteration 77010/ 173500 | consumed samples: 19714560 | consumed tokens: 40375418880 | elapsed time per iteration (s): 0.11 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 4.527272E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2240.841 | TFLOPs: 8.33 | 7: iteration 77020/ 173500 | consumed samples: 19717120 | consumed tokens: 40380661760 | elapsed time per iteration (s): 0.11 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 4.544299E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.376 | TFLOPs: 9.05 | 7: iteration 77030/ 173500 | consumed samples: 19719680 | consumed tokens: 40385904640 | elapsed time per iteration (s): 0.11 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 4.544099E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.346 | TFLOPs: 8.98 | 7: iteration 77040/ 173500 | consumed samples: 19722240 | consumed tokens: 40391147520 | elapsed time per iteration (s): 0.08 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 4.533504E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3046.213 | TFLOPs: 11.33 | 7: iteration 77050/ 173500 | consumed samples: 19724800 | consumed tokens: 40396390400 | elapsed time per iteration (s): 0.08 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 4.524646E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.981 | TFLOPs: 11.82 | 7: iteration 77060/ 173500 | consumed samples: 19727360 | consumed tokens: 40401633280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.273E-04 | global batch size: 256 | lm loss: 4.543964E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.003 | TFLOPs: 11.91 | 7: iteration 77070/ 173500 | consumed samples: 19729920 | consumed tokens: 40406876160 | elapsed time per iteration (s): 0.08 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 4.526402E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.443 | TFLOPs: 11.94 | 7: iteration 77080/ 173500 | consumed samples: 19732480 | consumed tokens: 40412119040 | elapsed time per iteration (s): 0.08 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 4.541539E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.255 | TFLOPs: 11.62 | 7: iteration 77090/ 173500 | consumed samples: 19735040 | consumed tokens: 40417361920 | elapsed time per iteration (s): 0.08 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 4.538482E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.152 | TFLOPs: 11.32 | 7: iteration 77100/ 173500 | consumed samples: 19737600 | consumed tokens: 40422604800 | elapsed time per iteration (s): 0.08 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 4.537402E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.722 | TFLOPs: 11.56 | 7: iteration 77110/ 173500 | consumed samples: 19740160 | consumed tokens: 40427847680 | elapsed time per iteration (s): 0.08 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 4.533885E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.566 | TFLOPs: 11.36 | 7: iteration 77120/ 
173500 | consumed samples: 19742720 | consumed tokens: 40433090560 | elapsed time per iteration (s): 0.08 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 4.523711E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.442 | TFLOPs: 11.55 | 7: iteration 77130/ 173500 | consumed samples: 19745280 | consumed tokens: 40438333440 | elapsed time per iteration (s): 0.08 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 4.531811E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.221 | TFLOPs: 11.63 | 7: iteration 77140/ 173500 | consumed samples: 19747840 | consumed tokens: 40443576320 | elapsed time per iteration (s): 0.08 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 4.531196E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.353 | TFLOPs: 11.62 | 7: iteration 77150/ 173500 | consumed samples: 19750400 | consumed tokens: 40448819200 | elapsed time per iteration (s): 0.08 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 4.528917E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.572 | TFLOPs: 11.61 | 7: iteration 77160/ 173500 | consumed samples: 19752960 | consumed tokens: 40454062080 | elapsed time per iteration (s): 0.08 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 4.544138E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.500 | TFLOPs: 11.87 | 7: iteration 77170/ 173500 | consumed samples: 19755520 | consumed tokens: 40459304960 | elapsed time per iteration (s): 0.09 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 4.526658E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2907.517 | TFLOPs: 10.81 | 7: iteration 77180/ 173500 | consumed samples: 19758080 | consumed tokens: 40464547840 | elapsed time per iteration (s): 0.08 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 4.527470E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.201 | TFLOPs: 11.89 | 7: iteration 77190/ 173500 | consumed samples: 19760640 | consumed tokens: 40469790720 | elapsed time per iteration (s): 0.08 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 4.526971E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.018 | TFLOPs: 11.66 | 7: iteration 77200/ 173500 | consumed samples: 19763200 | consumed tokens: 40475033600 | elapsed time per iteration (s): 0.08 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 4.516889E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.477 | TFLOPs: 11.63 | 7: iteration 77210/ 173500 | consumed samples: 19765760 | consumed tokens: 40480276480 | elapsed time per iteration (s): 0.08 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 4.543968E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.166 | TFLOPs: 11.67 | 7: iteration 77220/ 173500 | consumed samples: 19768320 | consumed tokens: 40485519360 | elapsed time per iteration (s): 0.08 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 4.529685E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.669 | TFLOPs: 11.92 | 7: iteration 77230/ 173500 | consumed samples: 19770880 | consumed tokens: 40490762240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.270E-04 | global batch size: 256 | lm loss: 4.537523E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.438 | TFLOPs: 11.67 | 7: iteration 77240/ 173500 | consumed samples: 19773440 | consumed tokens: 40496005120 | elapsed time per iteration (s): 0.08 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 4.534996E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.758 | TFLOPs: 11.94 | 7: iteration 77250/ 173500 | consumed samples: 19776000 | consumed tokens: 40501248000 | elapsed time per iteration (s): 0.08 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 4.538290E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.438 | TFLOPs: 11.81 | 7: iteration 77260/ 173500 | consumed samples: 19778560 | consumed tokens: 40506490880 | elapsed time per iteration (s): 0.08 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 4.533362E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.707 | TFLOPs: 11.95 | 7: iteration 77270/ 173500 | consumed samples: 19781120 | consumed tokens: 40511733760 | elapsed time per iteration (s): 0.08 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 4.530748E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.794 | TFLOPs: 11.90 | 7: iteration 77280/ 173500 | consumed samples: 19783680 | consumed tokens: 40516976640 | elapsed time per iteration (s): 0.08 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 4.525920E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.098 | TFLOPs: 11.96 | 7: iteration 77290/ 173500 | 
consumed samples: 19786240 | consumed tokens: 40522219520 | elapsed time per iteration (s): 0.08 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 4.538834E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.531 | TFLOPs: 11.95 | 7: iteration 77300/ 173500 | consumed samples: 19788800 | consumed tokens: 40527462400 | elapsed time per iteration (s): 0.08 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 4.521375E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.392 | TFLOPs: 11.95 | 7: iteration 77310/ 173500 | consumed samples: 19791360 | consumed tokens: 40532705280 | elapsed time per iteration (s): 0.08 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 4.542856E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.101 | TFLOPs: 11.92 | 7: iteration 77320/ 173500 | consumed samples: 19793920 | consumed tokens: 40537948160 | elapsed time per iteration (s): 0.08 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 4.539597E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.669 | TFLOPs: 11.91 | 7: iteration 77330/ 173500 | consumed samples: 19796480 | consumed tokens: 40543191040 | elapsed time per iteration (s): 0.08 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 4.523552E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.937 | TFLOPs: 11.56 | 7: iteration 77340/ 173500 | consumed samples: 19799040 | consumed tokens: 40548433920 | elapsed time per iteration (s): 0.08 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 4.542085E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3202.582 | TFLOPs: 11.91 | 7: iteration 77350/ 173500 | consumed samples: 19801600 | consumed tokens: 40553676800 | elapsed time per iteration (s): 0.08 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 4.531792E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.759 | TFLOPs: 11.84 | 7: iteration 77360/ 173500 | consumed samples: 19804160 | consumed tokens: 40558919680 | elapsed time per iteration (s): 0.08 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 4.527274E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.570 | TFLOPs: 11.91 | 7: iteration 77370/ 173500 | consumed samples: 19806720 | consumed tokens: 40564162560 | elapsed time per iteration (s): 0.08 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 4.540752E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.728 | TFLOPs: 11.62 | 7: iteration 77380/ 173500 | consumed samples: 19809280 | consumed tokens: 40569405440 | elapsed time per iteration (s): 0.08 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 4.533955E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.940 | TFLOPs: 11.95 | 7: iteration 77390/ 173500 | consumed samples: 19811840 | consumed tokens: 40574648320 | elapsed time per iteration (s): 0.08 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 4.530408E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.639 | TFLOPs: 11.86 | 7: iteration 77400/ 173500 | consumed samples: 19814400 | consumed tokens: 40579891200 | elapsed time per iteration (s): 0.08 | learning rate: 1.267E-04 | global batch 
size: 256 | lm loss: 4.523877E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.512 | TFLOPs: 11.69 | 7: iteration 77410/ 173500 | consumed samples: 19816960 | consumed tokens: 40585134080 | elapsed time per iteration (s): 0.08 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 4.538315E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.517 | TFLOPs: 11.93 | 7: iteration 77420/ 173500 | consumed samples: 19819520 | consumed tokens: 40590376960 | elapsed time per iteration (s): 0.08 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 4.522824E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.916 | TFLOPs: 11.69 | 7: iteration 77430/ 173500 | consumed samples: 19822080 | consumed tokens: 40595619840 | elapsed time per iteration (s): 0.08 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 4.537520E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.551 | TFLOPs: 11.69 | 7: iteration 77440/ 173500 | consumed samples: 19824640 | consumed tokens: 40600862720 | elapsed time per iteration (s): 0.08 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 4.532621E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.471 | TFLOPs: 11.85 | 7: iteration 77450/ 173500 | consumed samples: 19827200 | consumed tokens: 40606105600 | elapsed time per iteration (s): 0.08 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 4.527169E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.078 | TFLOPs: 11.91 | 7: iteration 77460/ 173500 | consumed samples: 19829760 | 
consumed tokens: 40611348480 | elapsed time per iteration (s): 0.08 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 4.513857E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.838 | TFLOPs: 11.90 | 7: iteration 77470/ 173500 | consumed samples: 19832320 | consumed tokens: 40616591360 | elapsed time per iteration (s): 0.08 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 4.544638E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.161 | TFLOPs: 11.55 | 7: iteration 77480/ 173500 | consumed samples: 19834880 | consumed tokens: 40621834240 | elapsed time per iteration (s): 0.08 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 4.523377E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.488 | TFLOPs: 11.92 | 7: iteration 77490/ 173500 | consumed samples: 19837440 | consumed tokens: 40627077120 | elapsed time per iteration (s): 0.08 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 4.530159E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.450 | TFLOPs: 11.92 | 7: iteration 77500/ 173500 | consumed samples: 19840000 | consumed tokens: 40632320000 | elapsed time per iteration (s): 0.08 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 4.534620E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.062 | TFLOPs: 11.66 | 7: iteration 77510/ 173500 | consumed samples: 19842560 | consumed tokens: 40637562880 | elapsed time per iteration (s): 0.08 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 4.536589E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3083.237 | TFLOPs: 11.47 | 7: iteration 77520/ 173500 | consumed samples: 19845120 | consumed tokens: 40642805760 | elapsed time per iteration (s): 0.08 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 4.535589E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.326 | TFLOPs: 11.93 | 7: iteration 77530/ 173500 | consumed samples: 19847680 | consumed tokens: 40648048640 | elapsed time per iteration (s): 0.08 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 4.522889E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.239 | TFLOPs: 11.95 | 7: iteration 77540/ 173500 | consumed samples: 19850240 | consumed tokens: 40653291520 | elapsed time per iteration (s): 0.08 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 4.535227E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.417 | TFLOPs: 11.90 | 7: iteration 77550/ 173500 | consumed samples: 19852800 | consumed tokens: 40658534400 | elapsed time per iteration (s): 0.08 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 4.537089E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.868 | TFLOPs: 11.94 | 7: iteration 77560/ 173500 | consumed samples: 19855360 | consumed tokens: 40663777280 | elapsed time per iteration (s): 0.08 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 4.537058E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.492 | TFLOPs: 11.87 | 7: iteration 77570/ 173500 | consumed samples: 19857920 | consumed tokens: 40669020160 | elapsed time per iteration (s): 0.08 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 
4.525840E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.003 | TFLOPs: 11.92 | 7: iteration 77580/ 173500 | consumed samples: 19860480 | consumed tokens: 40674263040 | elapsed time per iteration (s): 0.08 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 4.531924E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.036 | TFLOPs: 11.92 | 7: iteration 77590/ 173500 | consumed samples: 19863040 | consumed tokens: 40679505920 | elapsed time per iteration (s): 0.08 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 4.518180E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.633 | TFLOPs: 11.91 | 7: iteration 77600/ 173500 | consumed samples: 19865600 | consumed tokens: 40684748800 | elapsed time per iteration (s): 0.08 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 4.521308E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.005 | TFLOPs: 11.92 | 7: iteration 77610/ 173500 | consumed samples: 19868160 | consumed tokens: 40689991680 | elapsed time per iteration (s): 0.08 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 4.506709E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.761 | TFLOPs: 11.40 | 7: iteration 77620/ 173500 | consumed samples: 19870720 | consumed tokens: 40695234560 | elapsed time per iteration (s): 0.08 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 4.538010E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.302 | TFLOPs: 11.86 | 7: iteration 77630/ 173500 | consumed samples: 19873280 | consumed tokens: 
40700477440 | elapsed time per iteration (s): 0.08 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 4.539997E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.775 | TFLOPs: 11.94 | 7: iteration 77640/ 173500 | consumed samples: 19875840 | consumed tokens: 40705720320 | elapsed time per iteration (s): 0.08 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 4.534156E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.465 | TFLOPs: 11.90 | 7: iteration 77650/ 173500 | consumed samples: 19878400 | consumed tokens: 40710963200 | elapsed time per iteration (s): 0.08 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 4.535486E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.497 | TFLOPs: 11.96 | 7: iteration 77660/ 173500 | consumed samples: 19880960 | consumed tokens: 40716206080 | elapsed time per iteration (s): 0.08 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 4.536992E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.208 | TFLOPs: 11.67 | 7: iteration 77670/ 173500 | consumed samples: 19883520 | consumed tokens: 40721448960 | elapsed time per iteration (s): 0.09 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 4.539393E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2842.439 | TFLOPs: 10.57 | 7: iteration 77680/ 173500 | consumed samples: 19886080 | consumed tokens: 40726691840 | elapsed time per iteration (s): 0.08 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 4.514478E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3221.631 | TFLOPs: 11.98 | 7: iteration 77690/ 173500 | consumed samples: 19888640 | consumed tokens: 40731934720 | elapsed time per iteration (s): 0.08 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 4.529915E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.028 | TFLOPs: 11.94 | 7: iteration 77700/ 173500 | consumed samples: 19891200 | consumed tokens: 40737177600 | elapsed time per iteration (s): 0.08 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 4.523644E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.169 | TFLOPs: 11.98 | 7: iteration 77710/ 173500 | consumed samples: 19893760 | consumed tokens: 40742420480 | elapsed time per iteration (s): 0.08 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 4.526851E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.566 | TFLOPs: 11.45 | 7: iteration 77720/ 173500 | consumed samples: 19896320 | consumed tokens: 40747663360 | elapsed time per iteration (s): 0.08 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 4.528236E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.461 | TFLOPs: 11.96 | 7: iteration 77730/ 173500 | consumed samples: 19898880 | consumed tokens: 40752906240 | elapsed time per iteration (s): 0.08 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 4.535058E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.271 | TFLOPs: 11.75 | 7: iteration 77740/ 173500 | consumed samples: 19901440 | consumed tokens: 40758149120 | elapsed time per iteration (s): 0.08 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 4.542271E+00 | grad 
norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.321 | TFLOPs: 11.98 | 7: iteration 77750/ 173500 | consumed samples: 19904000 | consumed tokens: 40763392000 | elapsed time per iteration (s): 0.09 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 4.527051E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2879.773 | TFLOPs: 10.71 | 7: iteration 77760/ 173500 | consumed samples: 19906560 | consumed tokens: 40768634880 | elapsed time per iteration (s): 0.09 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 4.532325E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2957.903 | TFLOPs: 11.00 | 7: iteration 77770/ 173500 | consumed samples: 19909120 | consumed tokens: 40773877760 | elapsed time per iteration (s): 0.08 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 4.541077E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.230 | TFLOPs: 11.22 | 7: iteration 77780/ 173500 | consumed samples: 19911680 | consumed tokens: 40779120640 | elapsed time per iteration (s): 0.08 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 4.539250E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.263 | TFLOPs: 12.00 | 7: iteration 77790/ 173500 | consumed samples: 19914240 | consumed tokens: 40784363520 | elapsed time per iteration (s): 0.08 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 4.532227E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.248 | TFLOPs: 12.01 | 7: iteration 77800/ 173500 | consumed samples: 19916800 | consumed tokens: 40789606400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 4.534882E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.094 | TFLOPs: 11.96 | 7: iteration 77810/ 173500 | consumed samples: 19919360 | consumed tokens: 40794849280 | elapsed time per iteration (s): 0.08 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 4.531522E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.224 | TFLOPs: 11.86 | 7: iteration 77820/ 173500 | consumed samples: 19921920 | consumed tokens: 40800092160 | elapsed time per iteration (s): 0.08 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 4.532486E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.171 | TFLOPs: 11.78 | 7: iteration 77830/ 173500 | consumed samples: 19924480 | consumed tokens: 40805335040 | elapsed time per iteration (s): 0.08 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 4.537714E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.368 | TFLOPs: 11.54 | 7: iteration 77840/ 173500 | consumed samples: 19927040 | consumed tokens: 40810577920 | elapsed time per iteration (s): 0.08 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 4.519579E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.256 | TFLOPs: 11.55 | 7: iteration 77850/ 173500 | consumed samples: 19929600 | consumed tokens: 40815820800 | elapsed time per iteration (s): 0.09 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 4.529522E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.425 | TFLOPs: 
11.08 | 7: iteration 77860/ 173500 | consumed samples: 19932160 | consumed tokens: 40821063680 | elapsed time per iteration (s): 0.09 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 4.543557E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.725 | TFLOPs: 10.97 | 7: iteration 77870/ 173500 | consumed samples: 19934720 | consumed tokens: 40826306560 | elapsed time per iteration (s): 0.09 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 4.526772E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.779 | TFLOPs: 10.88 | 7: iteration 77880/ 173500 | consumed samples: 19937280 | consumed tokens: 40831549440 | elapsed time per iteration (s): 0.12 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 4.525421E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2176.219 | TFLOPs: 8.09 | 7: iteration 77890/ 173500 | consumed samples: 19939840 | consumed tokens: 40836792320 | elapsed time per iteration (s): 0.12 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 4.530060E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2124.858 | TFLOPs: 7.90 | 7: iteration 77900/ 173500 | consumed samples: 19942400 | consumed tokens: 40842035200 | elapsed time per iteration (s): 0.08 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 4.516074E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.746 | TFLOPs: 11.62 | 7: iteration 77910/ 173500 | consumed samples: 19944960 | consumed tokens: 40847278080 | elapsed time per iteration (s): 0.09 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 4.521063E+00 | grad norm: 0.328 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.109 | TFLOPs: 11.10 | 7: iteration 77920/ 173500 | consumed samples: 19947520 | consumed tokens: 40852520960 | elapsed time per iteration (s): 0.09 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 4.532873E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.820 | TFLOPs: 11.02 | 7: iteration 77930/ 173500 | consumed samples: 19950080 | consumed tokens: 40857763840 | elapsed time per iteration (s): 0.08 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 4.536706E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.403 | TFLOPs: 11.28 | 7: iteration 77940/ 173500 | consumed samples: 19952640 | consumed tokens: 40863006720 | elapsed time per iteration (s): 0.09 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 4.520389E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.630 | TFLOPs: 11.11 | 7: iteration 77950/ 173500 | consumed samples: 19955200 | consumed tokens: 40868249600 | elapsed time per iteration (s): 0.09 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 4.517744E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.034 | TFLOPs: 10.84 | 7: iteration 77960/ 173500 | consumed samples: 19957760 | consumed tokens: 40873492480 | elapsed time per iteration (s): 0.08 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 4.523154E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.391 | TFLOPs: 11.61 | 7: iteration 77970/ 173500 | consumed samples: 19960320 | consumed tokens: 40878735360 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.258E-04 | global batch size: 256 | lm loss: 4.536895E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.368 | TFLOPs: 10.86 | 7: iteration 77980/ 173500 | consumed samples: 19962880 | consumed tokens: 40883978240 | elapsed time per iteration (s): 0.09 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 4.542001E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.628 | TFLOPs: 11.09 | 7: iteration 77990/ 173500 | consumed samples: 19965440 | consumed tokens: 40889221120 | elapsed time per iteration (s): 0.09 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 4.530640E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.292 | TFLOPs: 10.99 | 0: [2023-03-17 02:08:38,156] [INFO] [logging.py:68:log_dist] [Rank 0] step=78000, skipped=0, lr=[0.00012575030905458257, 0.00012575030905458257, 0.00012575030905458257], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 78000/ 173500 | consumed samples: 19968000 | consumed tokens: 40894464000 | elapsed time per iteration (s): 0.09 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 4.529899E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2891.077 | TFLOPs: 10.75 | 0: steps: 78000 loss: 4.5166 iter time (s): 0.088 samples/sec: 2922.019 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 78000 | lm loss value: 4.364898E+00 | lm loss PPL: 7.864139E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 78000 to checkpoints_14m91b100m 0: [2023-03-17 02:08:38,215] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step78000 is begin to save! 0: [2023-03-17 02:08:38,218] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:08:38,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:08:38,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:08:38,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:08:38,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:08:38,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:08:38,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:08:38,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:08:38,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:08:38,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:08:38,257] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:08:38,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:08:38,258] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step78000/mp_rank_00_model_states.pt 0: [2023-03-17 02:08:38,258] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:08:38,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:08:38,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:08:38,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:08:38,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:08:38,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:08:38,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 0: [2023-03-17 02:08:38,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:08:38,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 02:08:38,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2023-03-17 02:08:38,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:08:38,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:08:38,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2023-03-17 02:08:38,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:08:38,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:08:38,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:08:38,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:08:38,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:08:38,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:08:38,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
3: [2023-03-17 02:08:38,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:08:38,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:08:38,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:08:38,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:08:38,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2023-03-17 02:08:38,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:08:38,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
7: [2023-03-17 02:08:38,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:08:38,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2023-03-17 02:08:38,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
5: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:08:38,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:08:38,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:08:38,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:08:38,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 02:08:38,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
3: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:08:38,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:08:38,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:08:38,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
5: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:08:38,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:08:38,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2023-03-17 02:08:38,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:08:38,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2023-03-17 02:08:38,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:08:38,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:08:38,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:08:38,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2023-03-17 02:08:38,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:08:38,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:08:38,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:08:38,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2023-03-17 02:08:38,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
1: [2023-03-17 02:08:38,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:08:38,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 3: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
4: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 02:08:38,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step78000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
1: [2023-03-17 02:08:38,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: successfully saved checkpoint at iteration 78000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.65 7: iteration 78010/ 173500 | consumed samples: 19970560 | consumed tokens: 40899706880 | elapsed time per iteration (s): 0.10 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 4.531610E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2629.246 | TFLOPs: 9.78 | 7: iteration 78020/ 173500 | consumed samples: 19973120 | consumed tokens: 40904949760 | elapsed time per iteration (s): 0.09 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 4.535461E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.662 | TFLOPs: 11.09 | 7: iteration 78030/ 173500 | consumed samples: 19975680 | consumed tokens: 40910192640 | elapsed time per iteration (s): 0.09 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 4.539735E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2905.401 | TFLOPs: 10.81 | 7: iteration 78040/ 173500 | consumed samples: 19978240 | consumed tokens: 40915435520 | elapsed time per iteration (s): 0.08 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 4.524739E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.665 | TFLOPs: 11.52 | 7: iteration 78050/ 173500 | consumed samples: 19980800 | consumed tokens: 40920678400 | elapsed time per iteration (s): 0.08 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 4.543604E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.335 | TFLOPs: 11.33 | 7: 
iteration 78060/ 173500 | consumed samples: 19983360 | consumed tokens: 40925921280 | elapsed time per iteration (s): 0.09 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 4.527053E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2968.681 | TFLOPs: 11.04 | 7: iteration 78070/ 173500 | consumed samples: 19985920 | consumed tokens: 40931164160 | elapsed time per iteration (s): 0.08 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 4.532908E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.464 | TFLOPs: 11.22 | 7: iteration 78080/ 173500 | consumed samples: 19988480 | consumed tokens: 40936407040 | elapsed time per iteration (s): 0.09 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 4.529734E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2839.682 | TFLOPs: 10.56 | 7: iteration 78090/ 173500 | consumed samples: 19991040 | consumed tokens: 40941649920 | elapsed time per iteration (s): 0.09 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 4.532028E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2905.658 | TFLOPs: 10.81 | 7: iteration 78100/ 173500 | consumed samples: 19993600 | consumed tokens: 40946892800 | elapsed time per iteration (s): 0.08 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 4.524860E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.894 | TFLOPs: 11.84 | 7: iteration 78110/ 173500 | consumed samples: 19996160 | consumed tokens: 40952135680 | elapsed time per iteration (s): 0.08 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 4.528587E+00 | grad norm: 0.325 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.144 | TFLOPs: 11.55 | 7: iteration 78120/ 173500 | consumed samples: 19998720 | consumed tokens: 40957378560 | elapsed time per iteration (s): 0.11 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 4.526850E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2259.902 | TFLOPs: 8.41 | 7: iteration 78130/ 173500 | consumed samples: 20001280 | consumed tokens: 40962621440 | elapsed time per iteration (s): 0.11 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 4.529839E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2260.720 | TFLOPs: 8.41 | 7: iteration 78140/ 173500 | consumed samples: 20003840 | consumed tokens: 40967864320 | elapsed time per iteration (s): 0.10 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 4.541418E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2635.610 | TFLOPs: 9.80 | 7: iteration 78150/ 173500 | consumed samples: 20006400 | consumed tokens: 40973107200 | elapsed time per iteration (s): 0.10 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 4.530136E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2550.071 | TFLOPs: 9.49 | 7: iteration 78160/ 173500 | consumed samples: 20008960 | consumed tokens: 40978350080 | elapsed time per iteration (s): 0.10 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 4.530283E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2492.764 | TFLOPs: 9.27 | 7: iteration 78170/ 173500 | consumed samples: 20011520 | consumed tokens: 40983592960 | elapsed time per iteration (s): 0.11 | learning rate: 
1.255E-04 | global batch size: 256 | lm loss: 4.523228E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2403.722 | TFLOPs: 8.94 | 7: iteration 78180/ 173500 | consumed samples: 20014080 | consumed tokens: 40988835840 | elapsed time per iteration (s): 0.10 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 4.532037E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.338 | TFLOPs: 9.52 | 7: iteration 78190/ 173500 | consumed samples: 20016640 | consumed tokens: 40994078720 | elapsed time per iteration (s): 0.10 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 4.544139E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.399 | TFLOPs: 9.37 | 7: iteration 78200/ 173500 | consumed samples: 20019200 | consumed tokens: 40999321600 | elapsed time per iteration (s): 0.10 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 4.536321E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.685 | TFLOPs: 9.26 | 7: iteration 78210/ 173500 | consumed samples: 20021760 | consumed tokens: 41004564480 | elapsed time per iteration (s): 0.10 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 4.540282E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.892 | TFLOPs: 9.30 | 7: iteration 78220/ 173500 | consumed samples: 20024320 | consumed tokens: 41009807360 | elapsed time per iteration (s): 0.11 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 4.542157E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.720 | TFLOPs: 8.88 | 7: iteration 78230/ 173500 | consumed 
samples: 20026880 | consumed tokens: 41015050240 | elapsed time per iteration (s): 0.11 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 4.528533E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.232 | TFLOPs: 8.95 | 7: iteration 78240/ 173500 | consumed samples: 20029440 | consumed tokens: 41020293120 | elapsed time per iteration (s): 0.10 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 4.532220E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2480.230 | TFLOPs: 9.23 | 7: iteration 78250/ 173500 | consumed samples: 20032000 | consumed tokens: 41025536000 | elapsed time per iteration (s): 0.10 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 4.542746E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2623.493 | TFLOPs: 9.76 | 7: iteration 78260/ 173500 | consumed samples: 20034560 | consumed tokens: 41030778880 | elapsed time per iteration (s): 0.10 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 4.526121E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2470.893 | TFLOPs: 9.19 | 7: iteration 78270/ 173500 | consumed samples: 20037120 | consumed tokens: 41036021760 | elapsed time per iteration (s): 0.10 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 4.522230E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.095 | TFLOPs: 9.30 | 7: iteration 78280/ 173500 | consumed samples: 20039680 | consumed tokens: 41041264640 | elapsed time per iteration (s): 0.09 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 4.530878E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2806.823 | TFLOPs: 10.44 | 7: iteration 78290/ 173500 | consumed samples: 20042240 | consumed tokens: 41046507520 | elapsed time per iteration (s): 0.10 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 4.525590E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2623.444 | TFLOPs: 9.76 | 7: iteration 78300/ 173500 | consumed samples: 20044800 | consumed tokens: 41051750400 | elapsed time per iteration (s): 0.10 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 4.526937E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.509 | TFLOPs: 9.09 | 7: iteration 78310/ 173500 | consumed samples: 20047360 | consumed tokens: 41056993280 | elapsed time per iteration (s): 0.10 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 4.528721E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.582 | TFLOPs: 9.41 | 7: iteration 78320/ 173500 | consumed samples: 20049920 | consumed tokens: 41062236160 | elapsed time per iteration (s): 0.11 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 4.529288E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2369.394 | TFLOPs: 8.81 | 7: iteration 78330/ 173500 | consumed samples: 20052480 | consumed tokens: 41067479040 | elapsed time per iteration (s): 0.10 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 4.535517E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2550.059 | TFLOPs: 9.49 | 7: iteration 78340/ 173500 | consumed samples: 20055040 | consumed tokens: 41072721920 | elapsed time per iteration (s): 0.10 | learning rate: 1.252E-04 | global batch size: 256 | lm 
loss: 4.525315E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.951 | TFLOPs: 9.48 | 7: iteration 78350/ 173500 | consumed samples: 20057600 | consumed tokens: 41077964800 | elapsed time per iteration (s): 0.10 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 4.531957E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2688.540 | TFLOPs: 10.00 | 7: iteration 78360/ 173500 | consumed samples: 20060160 | consumed tokens: 41083207680 | elapsed time per iteration (s): 0.10 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 4.514719E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2678.581 | TFLOPs: 9.96 | 7: iteration 78370/ 173500 | consumed samples: 20062720 | consumed tokens: 41088450560 | elapsed time per iteration (s): 0.11 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 4.535737E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2282.994 | TFLOPs: 8.49 | 7: iteration 78380/ 173500 | consumed samples: 20065280 | consumed tokens: 41093693440 | elapsed time per iteration (s): 0.08 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 4.532782E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.319 | TFLOPs: 11.76 | 7: iteration 78390/ 173500 | consumed samples: 20067840 | consumed tokens: 41098936320 | elapsed time per iteration (s): 0.09 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 4.524633E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2815.683 | TFLOPs: 10.47 | 7: iteration 78400/ 173500 | consumed samples: 20070400 | consumed tokens: 
41104179200 | elapsed time per iteration (s): 0.08 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 4.527221E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.278 | TFLOPs: 11.25 | 7: iteration 78410/ 173500 | consumed samples: 20072960 | consumed tokens: 41109422080 | elapsed time per iteration (s): 0.08 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 4.536140E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.704 | TFLOPs: 11.85 | 7: iteration 78420/ 173500 | consumed samples: 20075520 | consumed tokens: 41114664960 | elapsed time per iteration (s): 0.08 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 4.521609E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.987 | TFLOPs: 11.84 | 7: iteration 78430/ 173500 | consumed samples: 20078080 | consumed tokens: 41119907840 | elapsed time per iteration (s): 0.08 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 4.527882E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.227 | TFLOPs: 11.86 | 7: iteration 78440/ 173500 | consumed samples: 20080640 | consumed tokens: 41125150720 | elapsed time per iteration (s): 0.09 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 4.522028E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2916.683 | TFLOPs: 10.85 | 7: iteration 78450/ 173500 | consumed samples: 20083200 | consumed tokens: 41130393600 | elapsed time per iteration (s): 0.08 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 4.538242E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3058.750 | TFLOPs: 11.38 | 7: iteration 78460/ 173500 | consumed samples: 20085760 | consumed tokens: 41135636480 | elapsed time per iteration (s): 0.08 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 4.526303E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.863 | TFLOPs: 11.84 | 7: iteration 78470/ 173500 | consumed samples: 20088320 | consumed tokens: 41140879360 | elapsed time per iteration (s): 0.09 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 4.524645E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2940.456 | TFLOPs: 10.94 | 7: iteration 78480/ 173500 | consumed samples: 20090880 | consumed tokens: 41146122240 | elapsed time per iteration (s): 0.09 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 4.529659E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2890.314 | TFLOPs: 10.75 | 7: iteration 78490/ 173500 | consumed samples: 20093440 | consumed tokens: 41151365120 | elapsed time per iteration (s): 0.08 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 4.545038E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.703 | TFLOPs: 11.83 | 7: iteration 78500/ 173500 | consumed samples: 20096000 | consumed tokens: 41156608000 | elapsed time per iteration (s): 0.08 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 4.546032E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.950 | TFLOPs: 11.62 | 7: iteration 78510/ 173500 | consumed samples: 20098560 | consumed tokens: 41161850880 | elapsed time per iteration (s): 0.08 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 4.531700E+00 | grad 
norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.381 | TFLOPs: 11.84 | 7: iteration 78520/ 173500 | consumed samples: 20101120 | consumed tokens: 41167093760 | elapsed time per iteration (s): 0.08 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 4.535917E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.952 | TFLOPs: 11.79 | 7: iteration 78530/ 173500 | consumed samples: 20103680 | consumed tokens: 41172336640 | elapsed time per iteration (s): 0.12 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 4.526024E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2169.870 | TFLOPs: 8.07 | 7: iteration 78540/ 173500 | consumed samples: 20106240 | consumed tokens: 41177579520 | elapsed time per iteration (s): 0.12 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 4.520764E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2188.841 | TFLOPs: 8.14 | 7: iteration 78550/ 173500 | consumed samples: 20108800 | consumed tokens: 41182822400 | elapsed time per iteration (s): 0.09 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 4.541150E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2905.679 | TFLOPs: 10.81 | 7: iteration 78560/ 173500 | consumed samples: 20111360 | consumed tokens: 41188065280 | elapsed time per iteration (s): 0.08 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 4.527221E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.160 | TFLOPs: 11.59 | 7: iteration 78570/ 173500 | consumed samples: 20113920 | consumed tokens: 41193308160 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 4.516851E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.458 | TFLOPs: 11.77 | 7: iteration 78580/ 173500 | consumed samples: 20116480 | consumed tokens: 41198551040 | elapsed time per iteration (s): 0.08 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 4.532174E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.474 | TFLOPs: 11.50 | 7: iteration 78590/ 173500 | consumed samples: 20119040 | consumed tokens: 41203793920 | elapsed time per iteration (s): 0.08 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 4.523883E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.748 | TFLOPs: 11.77 | 7: iteration 78600/ 173500 | consumed samples: 20121600 | consumed tokens: 41209036800 | elapsed time per iteration (s): 0.08 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 4.539061E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.351 | TFLOPs: 11.83 | 7: iteration 78610/ 173500 | consumed samples: 20124160 | consumed tokens: 41214279680 | elapsed time per iteration (s): 0.08 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 4.531212E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.000 | TFLOPs: 11.57 | 7: iteration 78620/ 173500 | consumed samples: 20126720 | consumed tokens: 41219522560 | elapsed time per iteration (s): 0.08 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 4.521891E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.166 | TFLOPs: 
11.84 | 7: iteration 78630/ 173500 | consumed samples: 20129280 | consumed tokens: 41224765440 | elapsed time per iteration (s): 0.08 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 4.541010E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.549 | TFLOPs: 11.57 | 7: iteration 78640/ 173500 | consumed samples: 20131840 | consumed tokens: 41230008320 | elapsed time per iteration (s): 0.08 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 4.537222E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.756 | TFLOPs: 11.83 | 7: iteration 78650/ 173500 | consumed samples: 20134400 | consumed tokens: 41235251200 | elapsed time per iteration (s): 0.08 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 4.528266E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.615 | TFLOPs: 11.29 | 7: iteration 78660/ 173500 | consumed samples: 20136960 | consumed tokens: 41240494080 | elapsed time per iteration (s): 0.10 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 4.537397E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2658.260 | TFLOPs: 9.89 | 7: iteration 78670/ 173500 | consumed samples: 20139520 | consumed tokens: 41245736960 | elapsed time per iteration (s): 0.08 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 4.530185E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.166 | TFLOPs: 11.21 | 7: iteration 78680/ 173500 | consumed samples: 20142080 | consumed tokens: 41250979840 | elapsed time per iteration (s): 0.09 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 4.529259E+00 | grad norm: 0.342 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2942.873 | TFLOPs: 10.95 | 7: iteration 78690/ 173500 | consumed samples: 20144640 | consumed tokens: 41256222720 | elapsed time per iteration (s): 0.09 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 4.514113E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2934.484 | TFLOPs: 10.91 | 7: iteration 78700/ 173500 | consumed samples: 20147200 | consumed tokens: 41261465600 | elapsed time per iteration (s): 0.10 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 4.533974E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2473.139 | TFLOPs: 9.20 | 7: iteration 78710/ 173500 | consumed samples: 20149760 | consumed tokens: 41266708480 | elapsed time per iteration (s): 0.09 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 4.526216E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.018 | TFLOPs: 10.97 | 7: iteration 78720/ 173500 | consumed samples: 20152320 | consumed tokens: 41271951360 | elapsed time per iteration (s): 0.08 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 4.523690E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.678 | TFLOPs: 11.58 | 7: iteration 78730/ 173500 | consumed samples: 20154880 | consumed tokens: 41277194240 | elapsed time per iteration (s): 0.09 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 4.534369E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.187 | TFLOPs: 10.97 | 7: iteration 78740/ 173500 | consumed samples: 20157440 | consumed tokens: 41282437120 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.245E-04 | global batch size: 256 | lm loss: 4.513914E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.531 | TFLOPs: 11.86 | 7: iteration 78750/ 173500 | consumed samples: 20160000 | consumed tokens: 41287680000 | elapsed time per iteration (s): 0.11 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 4.521873E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2393.579 | TFLOPs: 8.90 | 7: iteration 78760/ 173500 | consumed samples: 20162560 | consumed tokens: 41292922880 | elapsed time per iteration (s): 0.08 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 4.540204E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.894 | TFLOPs: 11.60 | 7: iteration 78770/ 173500 | consumed samples: 20165120 | consumed tokens: 41298165760 | elapsed time per iteration (s): 0.09 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 4.526876E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.394 | TFLOPs: 11.15 | 7: iteration 78780/ 173500 | consumed samples: 20167680 | consumed tokens: 41303408640 | elapsed time per iteration (s): 0.08 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 4.533612E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.697 | TFLOPs: 11.92 | 7: iteration 78790/ 173500 | consumed samples: 20170240 | consumed tokens: 41308651520 | elapsed time per iteration (s): 0.11 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 4.545641E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2401.360 | TFLOPs: 8.93 | 7: iteration 78800/ 
173500 | consumed samples: 20172800 | consumed tokens: 41313894400 | elapsed time per iteration (s): 0.13 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 4.528116E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.580 | TFLOPs: 7.21 | 7: iteration 78810/ 173500 | consumed samples: 20175360 | consumed tokens: 41319137280 | elapsed time per iteration (s): 0.11 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 4.514761E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2426.981 | TFLOPs: 9.03 | 7: iteration 78820/ 173500 | consumed samples: 20177920 | consumed tokens: 41324380160 | elapsed time per iteration (s): 0.12 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 4.532687E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2175.956 | TFLOPs: 8.09 | 7: iteration 78830/ 173500 | consumed samples: 20180480 | consumed tokens: 41329623040 | elapsed time per iteration (s): 0.08 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 4.520154E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.703 | TFLOPs: 11.88 | 7: iteration 78840/ 173500 | consumed samples: 20183040 | consumed tokens: 41334865920 | elapsed time per iteration (s): 0.08 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 4.523647E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.884 | TFLOPs: 11.64 | 7: iteration 78850/ 173500 | consumed samples: 20185600 | consumed tokens: 41340108800 | elapsed time per iteration (s): 0.09 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 4.522464E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2863.573 | TFLOPs: 10.65 | 7: iteration 78860/ 173500 | consumed samples: 20188160 | consumed tokens: 41345351680 | elapsed time per iteration (s): 0.09 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 4.527171E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2798.538 | TFLOPs: 10.41 | 7: iteration 78870/ 173500 | consumed samples: 20190720 | consumed tokens: 41350594560 | elapsed time per iteration (s): 0.09 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 4.525487E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2718.668 | TFLOPs: 10.11 | 7: iteration 78880/ 173500 | consumed samples: 20193280 | consumed tokens: 41355837440 | elapsed time per iteration (s): 0.08 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 4.528335E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.696 | TFLOPs: 12.01 | 7: iteration 78890/ 173500 | consumed samples: 20195840 | consumed tokens: 41361080320 | elapsed time per iteration (s): 0.08 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 4.528370E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.356 | TFLOPs: 11.99 | 7: iteration 78900/ 173500 | consumed samples: 20198400 | consumed tokens: 41366323200 | elapsed time per iteration (s): 0.08 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 4.532135E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.143 | TFLOPs: 11.95 | 7: iteration 78910/ 173500 | consumed samples: 20200960 | consumed tokens: 41371566080 | elapsed time per iteration (s): 0.08 | learning rate: 
1.243E-04 | global batch size: 256 | lm loss: 4.540118E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.663 | TFLOPs: 11.95 | 7: iteration 78920/ 173500 | consumed samples: 20203520 | consumed tokens: 41376808960 | elapsed time per iteration (s): 0.08 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 4.527399E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.635 | TFLOPs: 11.92 | 7: iteration 78930/ 173500 | consumed samples: 20206080 | consumed tokens: 41382051840 | elapsed time per iteration (s): 0.08 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 4.525126E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.710 | TFLOPs: 11.30 | 7: iteration 78940/ 173500 | consumed samples: 20208640 | consumed tokens: 41387294720 | elapsed time per iteration (s): 0.09 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 4.526540E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2707.894 | TFLOPs: 10.07 | 7: iteration 78950/ 173500 | consumed samples: 20211200 | consumed tokens: 41392537600 | elapsed time per iteration (s): 0.08 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 4.535038E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.028 | TFLOPs: 11.82 | 7: iteration 78960/ 173500 | consumed samples: 20213760 | consumed tokens: 41397780480 | elapsed time per iteration (s): 0.11 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 4.531188E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.560 | TFLOPs: 8.88 | 7: iteration 78970/ 173500 | consumed 
samples: 20216320 | consumed tokens: 41403023360 | elapsed time per iteration (s): 0.11 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 4.540030E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2280.144 | TFLOPs: 8.48 | 7: iteration 78980/ 173500 | consumed samples: 20218880 | consumed tokens: 41408266240 | elapsed time per iteration (s): 0.10 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 4.531261E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.125 | TFLOPs: 9.24 | 7: iteration 78990/ 173500 | consumed samples: 20221440 | consumed tokens: 41413509120 | elapsed time per iteration (s): 0.10 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 4.536468E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2562.120 | TFLOPs: 9.53 | 7: iteration 79000/ 173500 | consumed samples: 20224000 | consumed tokens: 41418752000 | elapsed time per iteration (s): 0.09 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 4.529671E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2958.422 | TFLOPs: 11.00 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 79000 | lm loss value: 4.397038E+00 | lm loss PPL: 8.120997E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 79000 to checkpoints_14m91b100m 0: [2023-03-17 02:10:10,403] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step79000 is begin to save! 
0: [2023-03-17 02:10:10,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:10:10,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:10:10,432] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:10:10,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:10:10,435] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:10:10,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:10:10,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:10:10,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:10:10,442] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:10:10,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:10:10,445] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:10:10,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:10:10,446] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step79000/mp_rank_00_model_states.pt 0: [2023-03-17 02:10:10,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:10:10,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:10:10,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:10:10,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:10:10,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:10:10,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:10:10,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:10:10,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 6: [2023-03-17 02:10:10,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 02:10:10,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 02:10:10,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2023-03-17 02:10:10,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2023-03-17 02:10:10,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
3: [2023-03-17 02:10:10,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:10:10,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2023-03-17 02:10:10,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:10:10,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:10:10,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2023-03-17 02:10:10,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:10:10,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2023-03-17 02:10:10,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:10:10,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:10:10,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 02:10:10,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:10:10,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:10:10,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
1: [2023-03-17 02:10:10,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:10:10,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 02:10:10,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:10:10,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:10:10,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2023-03-17 02:10:10,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:10:10,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:10:10,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
0: [2023-03-17 02:10:10,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:10:10,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:10:10,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
2: [2023-03-17 02:10:10,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2023-03-17 02:10:10,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:10:10,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:10:10,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:10:10,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
2: [2023-03-17 02:10:10,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:10:10,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:10:10,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:10:10,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2023-03-17 02:10:10,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
0: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:10:10,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2023-03-17 02:10:10,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:10:10,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:10:10,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:10:10,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2023-03-17 02:10:10,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
6: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 2: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
2: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2023-03-17 02:10:10,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2023-03-17 02:10:10,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step79000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:10:10,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
0: successfully saved checkpoint at iteration 79000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 78.26
7: iteration 79010/ 173500 | consumed samples: 20226560 | consumed tokens: 41423994880 | elapsed time per iteration (s): 0.09 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 4.528555E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.898 | TFLOPs: 10.50 |
7: iteration 79020/ 173500 | consumed samples: 20229120 | consumed tokens: 41429237760 | elapsed time per iteration (s): 0.08 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 4.549530E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.458 | TFLOPs: 11.95 |
7: iteration 79030/ 173500 | consumed samples: 20231680 | consumed tokens: 41434480640 | elapsed time per iteration (s): 0.09 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 4.527744E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2727.232 | TFLOPs: 10.14 |
7: iteration 79040/ 173500 | consumed samples: 20234240 | consumed tokens: 41439723520 | elapsed time per iteration (s): 0.09 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 4.520461E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2959.423 | TFLOPs: 11.01 |
7: iteration 79050/ 173500 | consumed samples: 20236800 | consumed tokens: 41444966400 | elapsed time per iteration (s): 0.08 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 4.538549E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.767 | TFLOPs: 11.86 |
7: iteration 79060/ 173500 | consumed samples: 20239360 | consumed tokens: 41450209280 | elapsed time per iteration (s): 0.08 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 4.531651E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.503 | TFLOPs: 11.86 |
7: iteration 79070/ 173500 | consumed samples: 20241920 | consumed tokens: 41455452160 | elapsed time per iteration (s): 0.09 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 4.539222E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2890.024 | TFLOPs: 10.75 |
7: iteration 79080/ 173500 | consumed samples: 20244480 | consumed tokens: 41460695040 | elapsed time per iteration (s): 0.09 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 4.521718E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.092 | TFLOPs: 10.97 |
7: iteration 79090/ 173500 | consumed samples: 20247040 | consumed tokens: 41465937920 | elapsed time per iteration (s): 0.08 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 4.531126E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.733 | TFLOPs: 11.95 |
7: iteration 79100/ 173500 | consumed samples: 20249600 | consumed tokens: 41471180800 | elapsed time per iteration (s): 0.08 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 4.522537E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.633 | TFLOPs: 11.90 |
7: iteration 79110/ 173500 | consumed samples: 20252160 | consumed tokens: 41476423680 | elapsed time per iteration (s): 0.08 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 4.539433E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.614 | TFLOPs: 11.80 |
7: iteration 79120/ 173500 | consumed samples: 20254720 | consumed tokens: 41481666560 | elapsed time per iteration (s): 0.08 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 4.531010E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.256 | TFLOPs: 11.49 |
7: iteration 79130/ 173500 | consumed samples: 20257280 | consumed tokens: 41486909440 | elapsed time per iteration (s): 0.09 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 4.540472E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2826.407 | TFLOPs: 10.51 |
7: iteration 79140/ 173500 | consumed samples: 20259840 | consumed tokens: 41492152320 | elapsed time per iteration (s): 0.08 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 4.539265E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.046 | TFLOPs: 11.83 |
7: iteration 79150/ 173500 | consumed samples: 20262400 | consumed tokens: 41497395200 | elapsed time per iteration (s): 0.08 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 4.522343E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.301 | TFLOPs: 11.84 |
7: iteration 79160/ 173500 | consumed samples: 20264960 | consumed tokens: 41502638080 | elapsed time per iteration (s): 0.08 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 4.525850E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.500 | TFLOPs: 11.90 |
7: iteration 79170/ 173500 | consumed samples: 20267520 | consumed tokens: 41507880960 | elapsed time per iteration (s): 0.09 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 4.529832E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2870.888 | TFLOPs: 10.68 |
7: iteration 79180/ 173500 | consumed samples: 20270080 | consumed tokens: 41513123840 | elapsed time per iteration (s): 0.08 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 4.532557E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.870 | TFLOPs: 11.90 |
7: iteration 79190/ 173500 | consumed samples: 20272640 | consumed tokens: 41518366720 | elapsed time per iteration (s): 0.08 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 4.543109E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.068 | TFLOPs: 11.93 |
7: iteration 79200/ 173500 | consumed samples: 20275200 | consumed tokens: 41523609600 | elapsed time per iteration (s): 0.08 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 4.535279E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.945 | TFLOPs: 11.98 |
7: iteration 79210/ 173500 | consumed samples: 20277760 | consumed tokens: 41528852480 | elapsed time per iteration (s): 0.08 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 4.534503E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.405 | TFLOPs: 11.41 |
7: iteration 79220/ 173500 | consumed samples: 20280320 | consumed tokens: 41534095360 | elapsed time per iteration (s): 0.08 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 4.532504E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.242 | TFLOPs: 11.68 |
7: iteration 79230/ 173500 | consumed samples: 20282880 | consumed tokens: 41539338240 | elapsed time per iteration (s): 0.08 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 4.538293E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.206 | TFLOPs: 11.45 |
7: iteration 79240/ 173500 | consumed samples: 20285440 | consumed tokens: 41544581120 | elapsed time per iteration (s): 0.08 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 4.512316E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.728 | TFLOPs: 11.70 |
7: iteration 79250/ 173500 | consumed samples: 20288000 | consumed tokens: 41549824000 | elapsed time per iteration (s): 0.08 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 4.535737E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.172 | TFLOPs: 11.88 |
7: iteration 79260/ 173500 | consumed samples: 20290560 | consumed tokens: 41555066880 | elapsed time per iteration (s): 0.08 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 4.529818E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.689 | TFLOPs: 11.98 |
7: iteration 79270/ 173500 | consumed samples: 20293120 | consumed tokens: 41560309760 | elapsed time per iteration (s): 0.08 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 4.528011E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.693 | TFLOPs: 11.99 |
7: iteration 79280/ 173500 | consumed samples: 20295680 | consumed tokens: 41565552640 | elapsed time per iteration (s): 0.08 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 4.528104E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.176 | TFLOPs: 11.51 |
7: iteration 79290/ 173500 | consumed samples: 20298240 | consumed tokens: 41570795520 | elapsed time per iteration (s): 0.08 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 4.533839E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.107 | TFLOPs: 11.88 |
7: iteration 79300/ 173500 | consumed samples: 20300800 | consumed tokens: 41576038400 | elapsed time per iteration (s): 0.08 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 4.535399E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.956 | TFLOPs: 11.99 |
7: iteration 79310/ 173500 | consumed samples: 20303360 | consumed tokens: 41581281280 | elapsed time per iteration (s): 0.08 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 4.539794E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.212 | TFLOPs: 11.96 |
7: iteration 79320/ 173500 | consumed samples: 20305920 | consumed tokens: 41586524160 | elapsed time per iteration (s): 0.08 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 4.529442E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.565 | TFLOPs: 11.93 |
7: iteration 79330/ 173500 | consumed samples: 20308480 | consumed tokens: 41591767040 | elapsed time per iteration (s): 0.08 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 4.529068E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.731 | TFLOPs: 11.93 |
7: iteration 79340/ 173500 | consumed samples: 20311040 | consumed tokens: 41597009920 | elapsed time per iteration (s): 0.08 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 4.537464E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 |
number of nan iterations: 0 | samples per second: 3210.572 | TFLOPs: 11.94 | 7: iteration 79350/ 173500 | consumed samples: 20313600 | consumed tokens: 41602252800 | elapsed time per iteration (s): 0.08 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 4.523925E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.433 | TFLOPs: 11.93 | 7: iteration 79360/ 173500 | consumed samples: 20316160 | consumed tokens: 41607495680 | elapsed time per iteration (s): 0.08 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 4.536151E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.130 | TFLOPs: 11.62 | 7: iteration 79370/ 173500 | consumed samples: 20318720 | consumed tokens: 41612738560 | elapsed time per iteration (s): 0.09 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 4.533149E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2884.191 | TFLOPs: 10.73 | 7: iteration 79380/ 173500 | consumed samples: 20321280 | consumed tokens: 41617981440 | elapsed time per iteration (s): 0.08 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 4.531150E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.299 | TFLOPs: 11.65 | 7: iteration 79390/ 173500 | consumed samples: 20323840 | consumed tokens: 41623224320 | elapsed time per iteration (s): 0.10 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 4.531436E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2445.080 | TFLOPs: 9.09 | 7: iteration 79400/ 173500 | consumed samples: 20326400 | consumed tokens: 41628467200 | elapsed time per iteration (s): 0.12 | learning rate: 1.235E-04 | global batch 
size: 256 | lm loss: 4.528075E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.772 | TFLOPs: 7.75 | 7: iteration 79410/ 173500 | consumed samples: 20328960 | consumed tokens: 41633710080 | elapsed time per iteration (s): 0.08 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 4.529618E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.345 | TFLOPs: 11.61 | 7: iteration 79420/ 173500 | consumed samples: 20331520 | consumed tokens: 41638952960 | elapsed time per iteration (s): 0.08 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 4.531988E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.519 | TFLOPs: 11.83 | 7: iteration 79430/ 173500 | consumed samples: 20334080 | consumed tokens: 41644195840 | elapsed time per iteration (s): 0.08 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 4.527856E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.550 | TFLOPs: 11.80 | 7: iteration 79440/ 173500 | consumed samples: 20336640 | consumed tokens: 41649438720 | elapsed time per iteration (s): 0.09 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 4.542009E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.263 | TFLOPs: 11.16 | 7: iteration 79450/ 173500 | consumed samples: 20339200 | consumed tokens: 41654681600 | elapsed time per iteration (s): 0.08 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 4.528391E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.151 | TFLOPs: 11.96 | 7: iteration 79460/ 173500 | consumed samples: 20341760 | 
consumed tokens: 41659924480 | elapsed time per iteration (s): 0.09 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 4.526674E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.114 | TFLOPs: 11.08 | 7: iteration 79470/ 173500 | consumed samples: 20344320 | consumed tokens: 41665167360 | elapsed time per iteration (s): 0.08 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 4.532742E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.553 | TFLOPs: 11.97 | 7: iteration 79480/ 173500 | consumed samples: 20346880 | consumed tokens: 41670410240 | elapsed time per iteration (s): 0.08 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 4.540704E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.988 | TFLOPs: 11.43 | 7: iteration 79490/ 173500 | consumed samples: 20349440 | consumed tokens: 41675653120 | elapsed time per iteration (s): 0.11 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 4.528566E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2366.147 | TFLOPs: 8.80 | 7: iteration 79500/ 173500 | consumed samples: 20352000 | consumed tokens: 41680896000 | elapsed time per iteration (s): 0.08 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 4.532421E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.507 | TFLOPs: 11.87 | 7: iteration 79510/ 173500 | consumed samples: 20354560 | consumed tokens: 41686138880 | elapsed time per iteration (s): 0.08 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 4.520560E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3146.735 | TFLOPs: 11.70 | 7: iteration 79520/ 173500 | consumed samples: 20357120 | consumed tokens: 41691381760 | elapsed time per iteration (s): 0.08 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 4.534709E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.136 | TFLOPs: 11.88 | 7: iteration 79530/ 173500 | consumed samples: 20359680 | consumed tokens: 41696624640 | elapsed time per iteration (s): 0.08 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 4.527828E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.934 | TFLOPs: 11.87 | 7: iteration 79540/ 173500 | consumed samples: 20362240 | consumed tokens: 41701867520 | elapsed time per iteration (s): 0.08 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 4.524258E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.118 | TFLOPs: 11.91 | 7: iteration 79550/ 173500 | consumed samples: 20364800 | consumed tokens: 41707110400 | elapsed time per iteration (s): 0.08 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 4.539385E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.717 | TFLOPs: 11.85 | 7: iteration 79560/ 173500 | consumed samples: 20367360 | consumed tokens: 41712353280 | elapsed time per iteration (s): 0.08 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 4.518594E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.513 | TFLOPs: 11.83 | 7: iteration 79570/ 173500 | consumed samples: 20369920 | consumed tokens: 41717596160 | elapsed time per iteration (s): 0.08 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 
4.541183E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.456 | TFLOPs: 11.93 | 7: iteration 79580/ 173500 | consumed samples: 20372480 | consumed tokens: 41722839040 | elapsed time per iteration (s): 0.08 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 4.538153E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.693 | TFLOPs: 11.91 | 7: iteration 79590/ 173500 | consumed samples: 20375040 | consumed tokens: 41728081920 | elapsed time per iteration (s): 0.08 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 4.535529E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.944 | TFLOPs: 11.94 | 7: iteration 79600/ 173500 | consumed samples: 20377600 | consumed tokens: 41733324800 | elapsed time per iteration (s): 0.08 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 4.544704E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.532 | TFLOPs: 11.86 | 7: iteration 79610/ 173500 | consumed samples: 20380160 | consumed tokens: 41738567680 | elapsed time per iteration (s): 0.08 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 4.526517E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.317 | TFLOPs: 11.95 | 7: iteration 79620/ 173500 | consumed samples: 20382720 | consumed tokens: 41743810560 | elapsed time per iteration (s): 0.08 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 4.530369E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.770 | TFLOPs: 11.93 | 7: iteration 79630/ 173500 | consumed samples: 20385280 | consumed tokens: 
41749053440 | elapsed time per iteration (s): 0.08 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 4.534329E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.427 | TFLOPs: 11.93 | 7: iteration 79640/ 173500 | consumed samples: 20387840 | consumed tokens: 41754296320 | elapsed time per iteration (s): 0.08 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 4.532496E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.822 | TFLOPs: 11.91 | 7: iteration 79650/ 173500 | consumed samples: 20390400 | consumed tokens: 41759539200 | elapsed time per iteration (s): 0.08 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 4.519972E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.948 | TFLOPs: 11.92 | 7: iteration 79660/ 173500 | consumed samples: 20392960 | consumed tokens: 41764782080 | elapsed time per iteration (s): 0.08 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 4.529232E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.789 | TFLOPs: 11.84 | 7: iteration 79670/ 173500 | consumed samples: 20395520 | consumed tokens: 41770024960 | elapsed time per iteration (s): 0.08 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 4.515712E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.140 | TFLOPs: 11.93 | 7: iteration 79680/ 173500 | consumed samples: 20398080 | consumed tokens: 41775267840 | elapsed time per iteration (s): 0.08 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 4.521657E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3205.154 | TFLOPs: 11.92 | 7: iteration 79690/ 173500 | consumed samples: 20400640 | consumed tokens: 41780510720 | elapsed time per iteration (s): 0.08 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 4.547387E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.592 | TFLOPs: 11.98 | 7: iteration 79700/ 173500 | consumed samples: 20403200 | consumed tokens: 41785753600 | elapsed time per iteration (s): 0.09 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 4.517077E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2797.689 | TFLOPs: 10.41 | 7: iteration 79710/ 173500 | consumed samples: 20405760 | consumed tokens: 41790996480 | elapsed time per iteration (s): 0.08 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 4.538711E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.463 | TFLOPs: 11.92 | 7: iteration 79720/ 173500 | consumed samples: 20408320 | consumed tokens: 41796239360 | elapsed time per iteration (s): 0.08 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 4.532669E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.059 | TFLOPs: 11.93 | 7: iteration 79730/ 173500 | consumed samples: 20410880 | consumed tokens: 41801482240 | elapsed time per iteration (s): 0.08 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 4.544324E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.126 | TFLOPs: 11.78 | 7: iteration 79740/ 173500 | consumed samples: 20413440 | consumed tokens: 41806725120 | elapsed time per iteration (s): 0.08 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 4.538325E+00 | grad 
norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.193 | TFLOPs: 11.82 | 7: iteration 79750/ 173500 | consumed samples: 20416000 | consumed tokens: 41811968000 | elapsed time per iteration (s): 0.08 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 4.539489E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.476 | TFLOPs: 11.89 | 7: iteration 79760/ 173500 | consumed samples: 20418560 | consumed tokens: 41817210880 | elapsed time per iteration (s): 0.08 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 4.525527E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.014 | TFLOPs: 11.73 | 7: iteration 79770/ 173500 | consumed samples: 20421120 | consumed tokens: 41822453760 | elapsed time per iteration (s): 0.08 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 4.540998E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.393 | TFLOPs: 11.86 | 7: iteration 79780/ 173500 | consumed samples: 20423680 | consumed tokens: 41827696640 | elapsed time per iteration (s): 0.08 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 4.521404E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.111 | TFLOPs: 11.89 | 7: iteration 79790/ 173500 | consumed samples: 20426240 | consumed tokens: 41832939520 | elapsed time per iteration (s): 0.08 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 4.516779E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.517 | TFLOPs: 11.93 | 7: iteration 79800/ 173500 | consumed samples: 20428800 | consumed tokens: 41838182400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 4.528388E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.781 | TFLOPs: 11.62 | 7: iteration 79810/ 173500 | consumed samples: 20431360 | consumed tokens: 41843425280 | elapsed time per iteration (s): 0.08 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 4.543204E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.395 | TFLOPs: 11.91 | 7: iteration 79820/ 173500 | consumed samples: 20433920 | consumed tokens: 41848668160 | elapsed time per iteration (s): 0.08 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 4.528741E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.270 | TFLOPs: 11.90 | 7: iteration 79830/ 173500 | consumed samples: 20436480 | consumed tokens: 41853911040 | elapsed time per iteration (s): 0.08 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 4.530945E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.339 | TFLOPs: 11.97 | 7: iteration 79840/ 173500 | consumed samples: 20439040 | consumed tokens: 41859153920 | elapsed time per iteration (s): 0.08 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 4.537264E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.980 | TFLOPs: 11.87 | 7: iteration 79850/ 173500 | consumed samples: 20441600 | consumed tokens: 41864396800 | elapsed time per iteration (s): 0.08 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 4.523204E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.575 | TFLOPs: 
11.90 | 7: iteration 79860/ 173500 | consumed samples: 20444160 | consumed tokens: 41869639680 | elapsed time per iteration (s): 0.08 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 4.517868E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.801 | TFLOPs: 11.89 | 7: iteration 79870/ 173500 | consumed samples: 20446720 | consumed tokens: 41874882560 | elapsed time per iteration (s): 0.08 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 4.524792E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.576 | TFLOPs: 11.96 | 7: iteration 79880/ 173500 | consumed samples: 20449280 | consumed tokens: 41880125440 | elapsed time per iteration (s): 0.08 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 4.526174E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.762 | TFLOPs: 11.96 | 7: iteration 79890/ 173500 | consumed samples: 20451840 | consumed tokens: 41885368320 | elapsed time per iteration (s): 0.08 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 4.525182E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.661 | TFLOPs: 11.75 | 7: iteration 79900/ 173500 | consumed samples: 20454400 | consumed tokens: 41890611200 | elapsed time per iteration (s): 0.08 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 4.528551E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.111 | TFLOPs: 11.94 | 7: iteration 79910/ 173500 | consumed samples: 20456960 | consumed tokens: 41895854080 | elapsed time per iteration (s): 0.08 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 4.533515E+00 | grad norm: 0.328 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.822 | TFLOPs: 11.95 | 7: iteration 79920/ 173500 | consumed samples: 20459520 | consumed tokens: 41901096960 | elapsed time per iteration (s): 0.08 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 4.534227E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.784 | TFLOPs: 11.94 | 7: iteration 79930/ 173500 | consumed samples: 20462080 | consumed tokens: 41906339840 | elapsed time per iteration (s): 0.08 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 4.526043E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.739 | TFLOPs: 11.94 | 7: iteration 79940/ 173500 | consumed samples: 20464640 | consumed tokens: 41911582720 | elapsed time per iteration (s): 0.08 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 4.532651E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.387 | TFLOPs: 11.97 | 7: iteration 79950/ 173500 | consumed samples: 20467200 | consumed tokens: 41916825600 | elapsed time per iteration (s): 0.08 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 4.530342E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.687 | TFLOPs: 11.92 | 7: iteration 79960/ 173500 | consumed samples: 20469760 | consumed tokens: 41922068480 | elapsed time per iteration (s): 0.08 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 4.519157E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.226 | TFLOPs: 11.52 | 7: iteration 79970/ 173500 | consumed samples: 20472320 | consumed tokens: 41927311360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.225E-04 | global batch size: 256 | lm loss: 4.537778E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.728 | TFLOPs: 11.86 |
7: iteration 79980/ 173500 | consumed samples: 20474880 | consumed tokens: 41932554240 | elapsed time per iteration (s): 0.08 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 4.533863E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.102 | TFLOPs: 11.93 |
7: iteration 79990/ 173500 | consumed samples: 20477440 | consumed tokens: 41937797120 | elapsed time per iteration (s): 0.08 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 4.531923E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.174 | TFLOPs: 11.87 |
0: [2023-03-17 02:11:32,599] [INFO] [logging.py:68:log_dist] [Rank 0] step=80000, skipped=0, lr=[0.00012249910047811783, 0.00012249910047811783, 0.00012249910047811783], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 80000/ 173500 | consumed samples: 20480000 | consumed tokens: 41943040000 | elapsed time per iteration (s): 0.08 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 4.524290E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.957 | TFLOPs: 11.90 |
0: steps: 80000 loss: 4.5103 iter time (s): 0.086 samples/sec: 2960.148
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 80000 | lm loss value: 4.398220E+00 | lm loss PPL: 8.130602E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 80000 to checkpoints_14m91b100m
0: [2023-03-17 02:11:32,656] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step80000 is begin to save!
0: [2023-03-17 02:11:32,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/layer_01-model_00-model_states.pt...
0: [2023-03-17 02:11:32,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/layer_01-model_00-model_states.pt.
0: [2023-03-17 02:11:32,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/layer_03-model_00-model_states.pt...
0: [2023-03-17 02:11:32,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/layer_03-model_00-model_states.pt.
0: [2023-03-17 02:11:32,690] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/layer_04-model_00-model_states.pt...
0: [2023-03-17 02:11:32,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/layer_04-model_00-model_states.pt.
0: [2023-03-17 02:11:32,694] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/layer_05-model_00-model_states.pt...
0: [2023-03-17 02:11:32,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/layer_05-model_00-model_states.pt.
0: [2023-03-17 02:11:32,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/layer_06-model_00-model_states.pt...
0: [2023-03-17 02:11:32,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/layer_06-model_00-model_states.pt.
0: [2023-03-17 02:11:32,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/layer_08-model_00-model_states.pt...
0: [2023-03-17 02:11:32,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:11:32,701] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step80000/mp_rank_00_model_states.pt 0: [2023-03-17 02:11:32,701] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:11:32,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:11:32,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:11:32,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:11:32,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 02:11:32,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:11:32,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 02:11:32,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:11:32,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 02:11:32,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 02:11:32,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:11:32,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:11:32,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 02:11:32,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:11:32,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 02:11:32,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:11:32,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
7: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:11:32,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:11:32,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 02:11:32,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
0: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:11:32,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:11:32,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:11:32,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
5: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:11:32,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:11:32,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:11:32,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:11:32,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
2: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 02:11:32,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
3: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 02:11:32,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:11:32,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:11:32,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
3: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:11:32,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:11:32,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
6: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:11:32,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:11:32,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:11:32,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:11:32,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
4: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:11:32,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2023-03-17 02:11:32,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 02:11:32,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 02:11:32,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:11:32,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:11:32,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:11:32,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 02:11:32,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 02:11:32,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 02:11:32,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:11:32,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:11:32,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:11:32,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:11:32,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:11:32,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:11:32,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2023-03-17 02:11:32,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 0: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:11:32,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2023-03-17 02:11:32,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 02:11:32,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
1: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 02:11:32,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2023-03-17 02:11:32,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
2: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:11:32,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:11:32,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:11:32,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: successfully saved checkpoint at iteration 80000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.76 7: iteration 80010/ 173500 | consumed samples: 20482560 | consumed tokens: 41948282880 | elapsed time per iteration (s): 0.09 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 4.542196E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.884 | TFLOPs: 10.21 | 7: iteration 80020/ 173500 | consumed samples: 20485120 | consumed tokens: 41953525760 | elapsed time per iteration (s): 0.08 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 4.535872E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.384 | TFLOPs: 11.92 | 7: iteration 80030/ 173500 | consumed samples: 20487680 | consumed tokens: 41958768640 | elapsed time per iteration (s): 0.08 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 
4.525768E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.964 | TFLOPs: 11.97 | 7: iteration 80040/ 173500 | consumed samples: 20490240 | consumed tokens: 41964011520 | elapsed time per iteration (s): 0.08 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 4.539140E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.493 | TFLOPs: 11.88 | 7: iteration 80050/ 173500 | consumed samples: 20492800 | consumed tokens: 41969254400 | elapsed time per iteration (s): 0.08 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 4.537098E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.663 | TFLOPs: 11.87 | 7: iteration 80060/ 173500 | consumed samples: 20495360 | consumed tokens: 41974497280 | elapsed time per iteration (s): 0.08 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 4.539638E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.139 | TFLOPs: 11.87 | 7: iteration 80070/ 173500 | consumed samples: 20497920 | consumed tokens: 41979740160 | elapsed time per iteration (s): 0.08 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 4.529121E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.137 | TFLOPs: 11.93 | 7: iteration 80080/ 173500 | consumed samples: 20500480 | consumed tokens: 41984983040 | elapsed time per iteration (s): 0.08 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 4.535305E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.980 | TFLOPs: 11.92 | 7: iteration 80090/ 173500 | consumed samples: 20503040 | consumed tokens: 
41990225920 | elapsed time per iteration (s): 0.08 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 4.530047E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.712 | TFLOPs: 11.92 | 7: iteration 80100/ 173500 | consumed samples: 20505600 | consumed tokens: 41995468800 | elapsed time per iteration (s): 0.08 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 4.533360E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.378 | TFLOPs: 11.92 | 7: iteration 80110/ 173500 | consumed samples: 20508160 | consumed tokens: 42000711680 | elapsed time per iteration (s): 0.08 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 4.533121E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.263 | TFLOPs: 11.90 | 7: iteration 80120/ 173500 | consumed samples: 20510720 | consumed tokens: 42005954560 | elapsed time per iteration (s): 0.08 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 4.531125E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.909 | TFLOPs: 11.94 | 7: iteration 80130/ 173500 | consumed samples: 20513280 | consumed tokens: 42011197440 | elapsed time per iteration (s): 0.08 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 4.530529E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.244 | TFLOPs: 11.88 | 7: iteration 80140/ 173500 | consumed samples: 20515840 | consumed tokens: 42016440320 | elapsed time per iteration (s): 0.08 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 4.512888E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3199.070 | TFLOPs: 11.90 | 7: iteration 80150/ 173500 | consumed samples: 20518400 | consumed tokens: 42021683200 | elapsed time per iteration (s): 0.08 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 4.529141E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.984 | TFLOPs: 11.95 | 7: iteration 80160/ 173500 | consumed samples: 20520960 | consumed tokens: 42026926080 | elapsed time per iteration (s): 0.08 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 4.529881E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.968 | TFLOPs: 11.91 | 7: iteration 80170/ 173500 | consumed samples: 20523520 | consumed tokens: 42032168960 | elapsed time per iteration (s): 0.08 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 4.529999E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.174 | TFLOPs: 11.87 | 7: iteration 80180/ 173500 | consumed samples: 20526080 | consumed tokens: 42037411840 | elapsed time per iteration (s): 0.08 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 4.524822E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.366 | TFLOPs: 11.95 | 7: iteration 80190/ 173500 | consumed samples: 20528640 | consumed tokens: 42042654720 | elapsed time per iteration (s): 0.08 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 4.525645E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.433 | TFLOPs: 11.95 | 7: iteration 80200/ 173500 | consumed samples: 20531200 | consumed tokens: 42047897600 | elapsed time per iteration (s): 0.08 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 4.532510E+00 | grad 
norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.365 | TFLOPs: 11.96 | 7: iteration 80210/ 173500 | consumed samples: 20533760 | consumed tokens: 42053140480 | elapsed time per iteration (s): 0.08 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 4.535693E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.917 | TFLOPs: 11.65 | 7: iteration 80220/ 173500 | consumed samples: 20536320 | consumed tokens: 42058383360 | elapsed time per iteration (s): 0.08 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 4.532247E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.562 | TFLOPs: 11.96 | 7: iteration 80230/ 173500 | consumed samples: 20538880 | consumed tokens: 42063626240 | elapsed time per iteration (s): 0.08 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 4.539819E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.623 | TFLOPs: 11.94 | 7: iteration 80240/ 173500 | consumed samples: 20541440 | consumed tokens: 42068869120 | elapsed time per iteration (s): 0.08 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 4.527756E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.755 | TFLOPs: 11.91 | 7: iteration 80250/ 173500 | consumed samples: 20544000 | consumed tokens: 42074112000 | elapsed time per iteration (s): 0.08 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 4.527229E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.262 | TFLOPs: 11.91 | 7: iteration 80260/ 173500 | consumed samples: 20546560 | consumed tokens: 42079354880 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 4.533552E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.937 | TFLOPs: 11.95 | 7: iteration 80270/ 173500 | consumed samples: 20549120 | consumed tokens: 42084597760 | elapsed time per iteration (s): 0.08 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 4.508117E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.946 | TFLOPs: 11.87 | 7: iteration 80280/ 173500 | consumed samples: 20551680 | consumed tokens: 42089840640 | elapsed time per iteration (s): 0.08 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 4.529910E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.942 | TFLOPs: 11.88 | 7: iteration 80290/ 173500 | consumed samples: 20554240 | consumed tokens: 42095083520 | elapsed time per iteration (s): 0.08 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 4.538644E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.553 | TFLOPs: 11.88 | 7: iteration 80300/ 173500 | consumed samples: 20556800 | consumed tokens: 42100326400 | elapsed time per iteration (s): 0.08 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 4.519344E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.811 | TFLOPs: 11.97 | 7: iteration 80310/ 173500 | consumed samples: 20559360 | consumed tokens: 42105569280 | elapsed time per iteration (s): 0.08 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 4.534525E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.588 | TFLOPs: 
11.95 | 7: iteration 80320/ 173500 | consumed samples: 20561920 | consumed tokens: 42110812160 | elapsed time per iteration (s): 0.08 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 4.536340E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.906 | TFLOPs: 11.92 | 7: iteration 80330/ 173500 | consumed samples: 20564480 | consumed tokens: 42116055040 | elapsed time per iteration (s): 0.08 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 4.527761E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.455 | TFLOPs: 11.94 | 7: iteration 80340/ 173500 | consumed samples: 20567040 | consumed tokens: 42121297920 | elapsed time per iteration (s): 0.08 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 4.530301E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.971 | TFLOPs: 11.97 | 7: iteration 80350/ 173500 | consumed samples: 20569600 | consumed tokens: 42126540800 | elapsed time per iteration (s): 0.08 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 4.534891E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.907 | TFLOPs: 11.99 | 7: iteration 80360/ 173500 | consumed samples: 20572160 | consumed tokens: 42131783680 | elapsed time per iteration (s): 0.08 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 4.536209E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.097 | TFLOPs: 11.98 | 7: iteration 80370/ 173500 | consumed samples: 20574720 | consumed tokens: 42137026560 | elapsed time per iteration (s): 0.08 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 4.525416E+00 | grad norm: 0.327 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.806 | TFLOPs: 11.96 | 7: iteration 80380/ 173500 | consumed samples: 20577280 | consumed tokens: 42142269440 | elapsed time per iteration (s): 0.08 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 4.533926E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.850 | TFLOPs: 11.93 | 7: iteration 80390/ 173500 | consumed samples: 20579840 | consumed tokens: 42147512320 | elapsed time per iteration (s): 0.08 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 4.526020E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.469 | TFLOPs: 11.92 | 7: iteration 80400/ 173500 | consumed samples: 20582400 | consumed tokens: 42152755200 | elapsed time per iteration (s): 0.08 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 4.530484E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.084 | TFLOPs: 11.99 | 7: iteration 80410/ 173500 | consumed samples: 20584960 | consumed tokens: 42157998080 | elapsed time per iteration (s): 0.08 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 4.526828E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.774 | TFLOPs: 11.98 | 7: iteration 80420/ 173500 | consumed samples: 20587520 | consumed tokens: 42163240960 | elapsed time per iteration (s): 0.08 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 4.530639E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.416 | TFLOPs: 11.97 | 7: iteration 80430/ 173500 | consumed samples: 20590080 | consumed tokens: 42168483840 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.218E-04 | global batch size: 256 | lm loss: 4.522323E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.832 | TFLOPs: 11.88 | 7: iteration 80440/ 173500 | consumed samples: 20592640 | consumed tokens: 42173726720 | elapsed time per iteration (s): 0.08 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 4.531112E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.170 | TFLOPs: 11.97 | 7: iteration 80450/ 173500 | consumed samples: 20595200 | consumed tokens: 42178969600 | elapsed time per iteration (s): 0.08 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 4.523814E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.640 | TFLOPs: 11.92 | 7: iteration 80460/ 173500 | consumed samples: 20597760 | consumed tokens: 42184212480 | elapsed time per iteration (s): 0.08 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 4.532496E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.023 | TFLOPs: 11.94 | 7: iteration 80470/ 173500 | consumed samples: 20600320 | consumed tokens: 42189455360 | elapsed time per iteration (s): 0.08 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 4.531116E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.752 | TFLOPs: 11.88 | 7: iteration 80480/ 173500 | consumed samples: 20602880 | consumed tokens: 42194698240 | elapsed time per iteration (s): 0.08 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 4.531710E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.124 | TFLOPs: 11.92 | 7: iteration 80490/ 
173500 | consumed samples: 20605440 | consumed tokens: 42199941120 | elapsed time per iteration (s): 0.08 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 4.525592E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.517 | TFLOPs: 11.90 | 7: iteration 80500/ 173500 | consumed samples: 20608000 | consumed tokens: 42205184000 | elapsed time per iteration (s): 0.08 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 4.541047E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.101 | TFLOPs: 11.97 | 7: iteration 80510/ 173500 | consumed samples: 20610560 | consumed tokens: 42210426880 | elapsed time per iteration (s): 0.08 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 4.530274E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.843 | TFLOPs: 11.94 | 7: iteration 80520/ 173500 | consumed samples: 20613120 | consumed tokens: 42215669760 | elapsed time per iteration (s): 0.08 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 4.533809E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.496 | TFLOPs: 11.91 | 7: iteration 80530/ 173500 | consumed samples: 20615680 | consumed tokens: 42220912640 | elapsed time per iteration (s): 0.08 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 4.532152E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.552 | TFLOPs: 11.90 | 7: iteration 80540/ 173500 | consumed samples: 20618240 | consumed tokens: 42226155520 | elapsed time per iteration (s): 0.08 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 4.531664E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3207.973 | TFLOPs: 11.93 | 7: iteration 80550/ 173500 | consumed samples: 20620800 | consumed tokens: 42231398400 | elapsed time per iteration (s): 0.08 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 4.531654E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.336 | TFLOPs: 11.67 | 7: iteration 80560/ 173500 | consumed samples: 20623360 | consumed tokens: 42236641280 | elapsed time per iteration (s): 0.08 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 4.522229E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.625 | TFLOPs: 11.92 | 7: iteration 80570/ 173500 | consumed samples: 20625920 | consumed tokens: 42241884160 | elapsed time per iteration (s): 0.08 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 4.538157E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.836 | TFLOPs: 11.93 | 7: iteration 80580/ 173500 | consumed samples: 20628480 | consumed tokens: 42247127040 | elapsed time per iteration (s): 0.08 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 4.532695E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.236 | TFLOPs: 11.97 | 7: iteration 80590/ 173500 | consumed samples: 20631040 | consumed tokens: 42252369920 | elapsed time per iteration (s): 0.08 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 4.529129E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.381 | TFLOPs: 11.93 | 7: iteration 80600/ 173500 | consumed samples: 20633600 | consumed tokens: 42257612800 | elapsed time per iteration (s): 0.08 | learning rate: 
1.215E-04 | global batch size: 256 | lm loss: 4.516172E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.967 | TFLOPs: 11.96 | 7: iteration 80610/ 173500 | consumed samples: 20636160 | consumed tokens: 42262855680 | elapsed time per iteration (s): 0.08 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 4.536025E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.418 | TFLOPs: 11.83 | 7: iteration 80620/ 173500 | consumed samples: 20638720 | consumed tokens: 42268098560 | elapsed time per iteration (s): 0.08 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 4.528911E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.692 | TFLOPs: 11.98 | 7: iteration 80630/ 173500 | consumed samples: 20641280 | consumed tokens: 42273341440 | elapsed time per iteration (s): 0.08 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 4.516019E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.637 | TFLOPs: 11.96 | 7: iteration 80640/ 173500 | consumed samples: 20643840 | consumed tokens: 42278584320 | elapsed time per iteration (s): 0.08 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 4.521423E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.273 | TFLOPs: 11.98 | 7: iteration 80650/ 173500 | consumed samples: 20646400 | consumed tokens: 42283827200 | elapsed time per iteration (s): 0.08 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 4.519415E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.497 | TFLOPs: 11.95 | 7: iteration 80660/ 173500 | 
consumed samples: 20648960 | consumed tokens: 42289070080 | elapsed time per iteration (s): 0.08 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 4.541216E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.668 | TFLOPs: 11.96 | 7: iteration 80670/ 173500 | consumed samples: 20651520 | consumed tokens: 42294312960 | elapsed time per iteration (s): 0.08 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 4.531935E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.230 | TFLOPs: 11.67 | 7: iteration 80680/ 173500 | consumed samples: 20654080 | consumed tokens: 42299555840 | elapsed time per iteration (s): 0.08 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 4.531140E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.473 | TFLOPs: 11.98 | 7: iteration 80690/ 173500 | consumed samples: 20656640 | consumed tokens: 42304798720 | elapsed time per iteration (s): 0.08 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 4.528615E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.917 | TFLOPs: 11.88 | 7: iteration 80700/ 173500 | consumed samples: 20659200 | consumed tokens: 42310041600 | elapsed time per iteration (s): 0.08 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 4.528118E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.846 | TFLOPs: 11.96 | 7: iteration 80710/ 173500 | consumed samples: 20661760 | consumed tokens: 42315284480 | elapsed time per iteration (s): 0.08 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 4.542902E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3209.497 | TFLOPs: 11.94 | 7: iteration 80720/ 173500 | consumed samples: 20664320 | consumed tokens: 42320527360 | elapsed time per iteration (s): 0.08 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 4.533344E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.468 | TFLOPs: 11.98 | 7: iteration 80730/ 173500 | consumed samples: 20666880 | consumed tokens: 42325770240 | elapsed time per iteration (s): 0.08 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 4.523703E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.723 | TFLOPs: 11.95 | 7: iteration 80740/ 173500 | consumed samples: 20669440 | consumed tokens: 42331013120 | elapsed time per iteration (s): 0.08 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 4.536765E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.974 | TFLOPs: 11.98 | 7: iteration 80750/ 173500 | consumed samples: 20672000 | consumed tokens: 42336256000 | elapsed time per iteration (s): 0.08 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 4.527660E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.068 | TFLOPs: 11.96 | 7: iteration 80760/ 173500 | consumed samples: 20674560 | consumed tokens: 42341498880 | elapsed time per iteration (s): 0.08 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 4.526829E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.979 | TFLOPs: 11.95 | 7: iteration 80770/ 173500 | consumed samples: 20677120 | consumed tokens: 42346741760 | elapsed time per iteration (s): 0.08 | learning rate: 1.212E-04 | global batch 
size: 256 | lm loss: 4.530171E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.039 | TFLOPs: 11.87 | 7: iteration 80780/ 173500 | consumed samples: 20679680 | consumed tokens: 42351984640 | elapsed time per iteration (s): 0.08 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 4.525708E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.788 | TFLOPs: 11.97 | 7: iteration 80790/ 173500 | consumed samples: 20682240 | consumed tokens: 42357227520 | elapsed time per iteration (s): 0.08 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 4.519392E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.016 | TFLOPs: 12.00 | 7: iteration 80800/ 173500 | consumed samples: 20684800 | consumed tokens: 42362470400 | elapsed time per iteration (s): 0.08 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 4.528102E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.016 | TFLOPs: 11.67 | 7: iteration 80810/ 173500 | consumed samples: 20687360 | consumed tokens: 42367713280 | elapsed time per iteration (s): 0.08 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 4.533331E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.401 | TFLOPs: 11.98 | 7: iteration 80820/ 173500 | consumed samples: 20689920 | consumed tokens: 42372956160 | elapsed time per iteration (s): 0.08 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 4.517727E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.504 | TFLOPs: 11.93 | 7: iteration 80830/ 173500 | consumed samples: 20692480 | 
consumed tokens: 42378199040 | elapsed time per iteration (s): 0.08 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 4.532106E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.365 | TFLOPs: 11.89 | 7: iteration 80840/ 173500 | consumed samples: 20695040 | consumed tokens: 42383441920 | elapsed time per iteration (s): 0.08 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 4.524859E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.339 | TFLOPs: 11.96 | 7: iteration 80850/ 173500 | consumed samples: 20697600 | consumed tokens: 42388684800 | elapsed time per iteration (s): 0.08 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 4.531356E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.137 | TFLOPs: 11.77 | 7: iteration 80860/ 173500 | consumed samples: 20700160 | consumed tokens: 42393927680 | elapsed time per iteration (s): 0.08 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 4.532875E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.941 | TFLOPs: 11.99 | 7: iteration 80870/ 173500 | consumed samples: 20702720 | consumed tokens: 42399170560 | elapsed time per iteration (s): 0.08 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 4.525420E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.249 | TFLOPs: 11.81 | 7: iteration 80880/ 173500 | consumed samples: 20705280 | consumed tokens: 42404413440 | elapsed time per iteration (s): 0.08 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 4.519695E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3208.048 | TFLOPs: 11.93 | 7: iteration 80890/ 173500 | consumed samples: 20707840 | consumed tokens: 42409656320 | elapsed time per iteration (s): 0.08 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 4.517949E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.847 | TFLOPs: 11.95 | 7: iteration 80900/ 173500 | consumed samples: 20710400 | consumed tokens: 42414899200 | elapsed time per iteration (s): 0.08 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 4.530354E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.060 | TFLOPs: 11.87 | 7: iteration 80910/ 173500 | consumed samples: 20712960 | consumed tokens: 42420142080 | elapsed time per iteration (s): 0.08 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 4.523148E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.625 | TFLOPs: 11.81 | 7: iteration 80920/ 173500 | consumed samples: 20715520 | consumed tokens: 42425384960 | elapsed time per iteration (s): 0.08 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 4.528323E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.896 | TFLOPs: 11.93 | 7: iteration 80930/ 173500 | consumed samples: 20718080 | consumed tokens: 42430627840 | elapsed time per iteration (s): 0.08 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 4.533063E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.476 | TFLOPs: 11.92 | 7: iteration 80940/ 173500 | consumed samples: 20720640 | consumed tokens: 42435870720 | elapsed time per iteration (s): 0.08 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 
4.549738E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.769 | TFLOPs: 11.96 | 7: iteration 80950/ 173500 | consumed samples: 20723200 | consumed tokens: 42441113600 | elapsed time per iteration (s): 0.08 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 4.525406E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.735 | TFLOPs: 11.98 | 7: iteration 80960/ 173500 | consumed samples: 20725760 | consumed tokens: 42446356480 | elapsed time per iteration (s): 0.08 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 4.528461E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.061 | TFLOPs: 11.95 | 7: iteration 80970/ 173500 | consumed samples: 20728320 | consumed tokens: 42451599360 | elapsed time per iteration (s): 0.08 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 4.521397E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.702 | TFLOPs: 11.93 | 7: iteration 80980/ 173500 | consumed samples: 20730880 | consumed tokens: 42456842240 | elapsed time per iteration (s): 0.08 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 4.530064E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.726 | TFLOPs: 11.93 | 7: iteration 80990/ 173500 | consumed samples: 20733440 | consumed tokens: 42462085120 | elapsed time per iteration (s): 0.08 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 4.529768E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.608 | TFLOPs: 11.94 | 7: iteration 81000/ 173500 | consumed samples: 20736000 | consumed tokens: 
42467328000 | elapsed time per iteration (s): 0.08 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 4.535696E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.257 | TFLOPs: 11.43 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 81000 | lm loss value: 4.395378E+00 | lm loss PPL: 8.107524E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 81000 to checkpoints_14m91b100m 0: [2023-03-17 02:12:52,697] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step81000 is begin to save! 0: [2023-03-17 02:12:52,701] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:12:52,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:12:52,727] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:12:52,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:12:52,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:12:52,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:12:52,734] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/layer_05-model_00-model_states.pt... 
0: [2023-03-17 02:12:52,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:12:52,737] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:12:52,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:12:52,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:12:52,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:12:52,741] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step81000/mp_rank_00_model_states.pt 0: [2023-03-17 02:12:52,741] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:12:52,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:12:52,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:12:52,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:12:52,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2023-03-17 02:12:52,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:12:52,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2023-03-17 02:12:52,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 1: [2023-03-17 02:12:52,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:12:52,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
6: [2023-03-17 02:12:52,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:12:52,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2023-03-17 02:12:52,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:12:52,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:12:52,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:12:52,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 1: [2023-03-17 02:12:52,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:12:52,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2023-03-17 02:12:52,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:12:52,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:12:52,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 2: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 1: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:12:52,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:12:52,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2023-03-17 02:12:52,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2023-03-17 02:12:52,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
7: [2023-03-17 02:12:52,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2023-03-17 02:12:52,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:12:52,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:12:52,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:12:52,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:12:52,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2023-03-17 02:12:52,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:12:52,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:12:52,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2023-03-17 02:12:52,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 1: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:12:52,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:12:52,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:12:52,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:12:52,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:12:52,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:12:52,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2023-03-17 02:12:52,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 1: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:12:52,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:12:52,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:12:52,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 02:12:52,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:12:52,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2023-03-17 02:12:52,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:12:52,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2023-03-17 02:12:52,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:12:52,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2023-03-17 02:12:52,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:12:52,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:12:52,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:12:52,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 1: [2023-03-17 02:12:52,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:12:52,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
6: [2023-03-17 02:12:52,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:12:52,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:12:52,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 02:12:52,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:12:52,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 2: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 1: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:12:52,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
1: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 1: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 4: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 02:12:52,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step81000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2023-03-17 02:12:52,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
0: successfully saved checkpoint at iteration 81000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.46 7: iteration 81010/ 173500 | consumed samples: 20738560 | consumed tokens: 42472570880 | elapsed time per iteration (s): 0.09 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 4.535992E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.367 | TFLOPs: 10.42 | 7: iteration 81020/ 173500 | consumed samples: 20741120 | consumed tokens: 42477813760 | elapsed time per iteration (s): 0.08 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 4.522631E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.409 | TFLOPs: 11.95 | 7: iteration 81030/ 173500 | consumed samples: 20743680 | consumed tokens: 42483056640 | elapsed time per iteration (s): 0.08 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 4.521051E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.834 | TFLOPs: 11.95 | 7: iteration 81040/ 173500 | consumed samples: 20746240 | consumed tokens: 42488299520 | elapsed time per iteration (s): 0.08 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 4.545421E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.930 | TFLOPs: 11.89 | 7: iteration 81050/ 173500 | consumed samples: 20748800 | consumed tokens: 42493542400 | elapsed time per iteration (s): 0.08 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 4.521796E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.724 | TFLOPs: 11.91 | 7: iteration 81060/ 173500 | consumed samples: 20751360 | consumed tokens: 42498785280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.208E-04 | global batch size: 256 | lm loss: 4.536501E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.820 | TFLOPs: 11.98 | 7: iteration 81070/ 173500 | consumed samples: 20753920 | consumed tokens: 42504028160 | elapsed time per iteration (s): 0.08 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 4.522842E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.934 | TFLOPs: 11.97 | 7: iteration 81080/ 173500 | consumed samples: 20756480 | consumed tokens: 42509271040 | elapsed time per iteration (s): 0.08 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 4.541495E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.807 | TFLOPs: 11.96 | 7: iteration 81090/ 173500 | consumed samples: 20759040 | consumed tokens: 42514513920 | elapsed time per iteration (s): 0.09 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 4.519191E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2902.138 | TFLOPs: 10.79 | 7: iteration 81100/ 173500 | consumed samples: 20761600 | consumed tokens: 42519756800 | elapsed time per iteration (s): 0.08 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 4.514684E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.365 | TFLOPs: 11.89 | 7: iteration 81110/ 173500 | consumed samples: 20764160 | consumed tokens: 42524999680 | elapsed time per iteration (s): 0.08 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 4.526371E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.917 | TFLOPs: 11.89 | 7: iteration 81120/ 
173500 | consumed samples: 20766720 | consumed tokens: 42530242560 | elapsed time per iteration (s): 0.08 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 4.523964E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.727 | TFLOPs: 11.90 | 7: iteration 81130/ 173500 | consumed samples: 20769280 | consumed tokens: 42535485440 | elapsed time per iteration (s): 0.08 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 4.511734E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.374 | TFLOPs: 11.90 | 7: iteration 81140/ 173500 | consumed samples: 20771840 | consumed tokens: 42540728320 | elapsed time per iteration (s): 0.08 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 4.541155E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.609 | TFLOPs: 11.95 | 7: iteration 81150/ 173500 | consumed samples: 20774400 | consumed tokens: 42545971200 | elapsed time per iteration (s): 0.08 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 4.527899E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.584 | TFLOPs: 11.95 | 7: iteration 81160/ 173500 | consumed samples: 20776960 | consumed tokens: 42551214080 | elapsed time per iteration (s): 0.08 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 4.528661E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.560 | TFLOPs: 11.90 | 7: iteration 81170/ 173500 | consumed samples: 20779520 | consumed tokens: 42556456960 | elapsed time per iteration (s): 0.08 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 4.522945E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3197.470 | TFLOPs: 11.89 | 7: iteration 81180/ 173500 | consumed samples: 20782080 | consumed tokens: 42561699840 | elapsed time per iteration (s): 0.08 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 4.529593E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.337 | TFLOPs: 11.92 | 7: iteration 81190/ 173500 | consumed samples: 20784640 | consumed tokens: 42566942720 | elapsed time per iteration (s): 0.08 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 4.534236E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.112 | TFLOPs: 11.90 | 7: iteration 81200/ 173500 | consumed samples: 20787200 | consumed tokens: 42572185600 | elapsed time per iteration (s): 0.08 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 4.526418E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.813 | TFLOPs: 11.93 | 7: iteration 81210/ 173500 | consumed samples: 20789760 | consumed tokens: 42577428480 | elapsed time per iteration (s): 0.08 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 4.528022E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.104 | TFLOPs: 11.94 | 7: iteration 81220/ 173500 | consumed samples: 20792320 | consumed tokens: 42582671360 | elapsed time per iteration (s): 0.08 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 4.545844E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.582 | TFLOPs: 11.93 | 7: iteration 81230/ 173500 | consumed samples: 20794880 | consumed tokens: 42587914240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.205E-04 | global batch size: 256 | lm loss: 4.545545E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.458 | TFLOPs: 11.93 | 7: iteration 81240/ 173500 | consumed samples: 20797440 | consumed tokens: 42593157120 | elapsed time per iteration (s): 0.08 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 4.530932E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.082 | TFLOPs: 11.88 | 7: iteration 81250/ 173500 | consumed samples: 20800000 | consumed tokens: 42598400000 | elapsed time per iteration (s): 0.08 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 4.542008E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.980 | TFLOPs: 11.93 | 7: iteration 81260/ 173500 | consumed samples: 20802560 | consumed tokens: 42603642880 | elapsed time per iteration (s): 0.08 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 4.524991E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.527 | TFLOPs: 11.93 | 7: iteration 81270/ 173500 | consumed samples: 20805120 | consumed tokens: 42608885760 | elapsed time per iteration (s): 0.08 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 4.530896E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.394 | TFLOPs: 11.78 | 7: iteration 81280/ 173500 | consumed samples: 20807680 | consumed tokens: 42614128640 | elapsed time per iteration (s): 0.08 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 4.529054E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.313 | TFLOPs: 11.92 | 7: iteration 81290/ 173500 | 
consumed samples: 20810240 | consumed tokens: 42619371520 | elapsed time per iteration (s): 0.08 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 4.534046E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.100 | TFLOPs: 11.94 | 7: iteration 81300/ 173500 | consumed samples: 20812800 | consumed tokens: 42624614400 | elapsed time per iteration (s): 0.08 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 4.529542E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.142 | TFLOPs: 11.90 | 7: iteration 81310/ 173500 | consumed samples: 20815360 | consumed tokens: 42629857280 | elapsed time per iteration (s): 0.08 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 4.535753E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.455 | TFLOPs: 11.66 | 7: iteration 81320/ 173500 | consumed samples: 20817920 | consumed tokens: 42635100160 | elapsed time per iteration (s): 0.08 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 4.536310E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.460 | TFLOPs: 11.89 | 7: iteration 81330/ 173500 | consumed samples: 20820480 | consumed tokens: 42640343040 | elapsed time per iteration (s): 0.08 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 4.523169E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.728 | TFLOPs: 11.87 | 7: iteration 81340/ 173500 | consumed samples: 20823040 | consumed tokens: 42645585920 | elapsed time per iteration (s): 0.08 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 4.534066E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3195.974 | TFLOPs: 11.89 | 7: iteration 81350/ 173500 | consumed samples: 20825600 | consumed tokens: 42650828800 | elapsed time per iteration (s): 0.08 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 4.503932E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.369 | TFLOPs: 11.92 | 7: iteration 81360/ 173500 | consumed samples: 20828160 | consumed tokens: 42656071680 | elapsed time per iteration (s): 0.08 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 4.531823E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.834 | TFLOPs: 11.91 | 7: iteration 81370/ 173500 | consumed samples: 20830720 | consumed tokens: 42661314560 | elapsed time per iteration (s): 0.08 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 4.525143E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.410 | TFLOPs: 11.90 | 7: iteration 81380/ 173500 | consumed samples: 20833280 | consumed tokens: 42666557440 | elapsed time per iteration (s): 0.08 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 4.532890E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.592 | TFLOPs: 11.90 | 7: iteration 81390/ 173500 | consumed samples: 20835840 | consumed tokens: 42671800320 | elapsed time per iteration (s): 0.09 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 4.536192E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.495 | TFLOPs: 10.82 | 7: iteration 81400/ 173500 | consumed samples: 20838400 | consumed tokens: 42677043200 | elapsed time per iteration (s): 0.09 | learning rate: 1.202E-04 | global batch 
size: 256 | lm loss: 4.515102E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2887.762 | TFLOPs: 10.74 | 7: iteration 81410/ 173500 | consumed samples: 20840960 | consumed tokens: 42682286080 | elapsed time per iteration (s): 0.08 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 4.524475E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.095 | TFLOPs: 11.51 | 7: iteration 81420/ 173500 | consumed samples: 20843520 | consumed tokens: 42687528960 | elapsed time per iteration (s): 0.08 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 4.524989E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.129 | TFLOPs: 11.85 | 7: iteration 81430/ 173500 | consumed samples: 20846080 | consumed tokens: 42692771840 | elapsed time per iteration (s): 0.08 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 4.526835E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.845 | TFLOPs: 11.86 | 7: iteration 81440/ 173500 | consumed samples: 20848640 | consumed tokens: 42698014720 | elapsed time per iteration (s): 0.08 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 4.523513E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.580 | TFLOPs: 11.87 | 7: iteration 81450/ 173500 | consumed samples: 20851200 | consumed tokens: 42703257600 | elapsed time per iteration (s): 0.08 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 4.541299E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.192 | TFLOPs: 11.88 | 7: iteration 81460/ 173500 | consumed samples: 20853760 | 
consumed tokens: 42708500480 | elapsed time per iteration (s): 0.08 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 4.526810E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.194 | TFLOPs: 11.78 | 7: iteration 81470/ 173500 | consumed samples: 20856320 | consumed tokens: 42713743360 | elapsed time per iteration (s): 0.08 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 4.524716E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.218 | TFLOPs: 11.89 | 7: iteration 81480/ 173500 | consumed samples: 20858880 | consumed tokens: 42718986240 | elapsed time per iteration (s): 0.08 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 4.520025E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.255 | TFLOPs: 11.49 | 7: iteration 81490/ 173500 | consumed samples: 20861440 | consumed tokens: 42724229120 | elapsed time per iteration (s): 0.08 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 4.537088E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.229 | TFLOPs: 11.93 | 7: iteration 81500/ 173500 | consumed samples: 20864000 | consumed tokens: 42729472000 | elapsed time per iteration (s): 0.08 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 4.526630E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.651 | TFLOPs: 11.87 | 7: iteration 81510/ 173500 | consumed samples: 20866560 | consumed tokens: 42734714880 | elapsed time per iteration (s): 0.08 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 4.535078E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3191.132 | TFLOPs: 11.87 | 7: iteration 81520/ 173500 | consumed samples: 20869120 | consumed tokens: 42739957760 | elapsed time per iteration (s): 0.08 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 4.527750E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.992 | TFLOPs: 11.89 | 7: iteration 81530/ 173500 | consumed samples: 20871680 | consumed tokens: 42745200640 | elapsed time per iteration (s): 0.08 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 4.522814E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.436 | TFLOPs: 11.89 | 7: iteration 81540/ 173500 | consumed samples: 20874240 | consumed tokens: 42750443520 | elapsed time per iteration (s): 0.08 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 4.522991E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.104 | TFLOPs: 11.30 | 7: iteration 81550/ 173500 | consumed samples: 20876800 | consumed tokens: 42755686400 | elapsed time per iteration (s): 0.09 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 4.545580E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2841.460 | TFLOPs: 10.57 | 7: iteration 81560/ 173500 | consumed samples: 20879360 | consumed tokens: 42760929280 | elapsed time per iteration (s): 0.08 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 4.526652E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.033 | TFLOPs: 11.83 | 7: iteration 81570/ 173500 | consumed samples: 20881920 | consumed tokens: 42766172160 | elapsed time per iteration (s): 0.08 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 
4.523350E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.990 | TFLOPs: 11.86 | 7: iteration 81580/ 173500 | consumed samples: 20884480 | consumed tokens: 42771415040 | elapsed time per iteration (s): 0.08 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 4.513684E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.364 | TFLOPs: 11.84 | 7: iteration 81590/ 173500 | consumed samples: 20887040 | consumed tokens: 42776657920 | elapsed time per iteration (s): 0.08 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 4.534675E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.408 | TFLOPs: 11.55 | 7: iteration 81600/ 173500 | consumed samples: 20889600 | consumed tokens: 42781900800 | elapsed time per iteration (s): 0.08 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 4.524928E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.555 | TFLOPs: 11.85 | 7: iteration 81610/ 173500 | consumed samples: 20892160 | consumed tokens: 42787143680 | elapsed time per iteration (s): 0.08 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 4.529583E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.910 | TFLOPs: 11.88 | 7: iteration 81620/ 173500 | consumed samples: 20894720 | consumed tokens: 42792386560 | elapsed time per iteration (s): 0.08 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 4.524817E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.602 | TFLOPs: 11.83 | 7: iteration 81630/ 173500 | consumed samples: 20897280 | consumed tokens: 
42797629440 | elapsed time per iteration (s): 0.08 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 4.530391E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.917 | TFLOPs: 11.73 | 7: iteration 81640/ 173500 | consumed samples: 20899840 | consumed tokens: 42802872320 | elapsed time per iteration (s): 0.08 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 4.521255E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.526 | TFLOPs: 11.82 | 7: iteration 81650/ 173500 | consumed samples: 20902400 | consumed tokens: 42808115200 | elapsed time per iteration (s): 0.08 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 4.523469E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.043 | TFLOPs: 11.82 | 7: iteration 81660/ 173500 | consumed samples: 20904960 | consumed tokens: 42813358080 | elapsed time per iteration (s): 0.22 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 4.536909E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1142.340 | TFLOPs: 4.25 | 7: iteration 81670/ 173500 | consumed samples: 20907520 | consumed tokens: 42818600960 | elapsed time per iteration (s): 0.08 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 4.521494E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.144 | TFLOPs: 11.69 | 7: iteration 81680/ 173500 | consumed samples: 20910080 | consumed tokens: 42823843840 | elapsed time per iteration (s): 0.08 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 4.537806E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3173.710 | TFLOPs: 11.80 | 7: iteration 81690/ 173500 | consumed samples: 20912640 | consumed tokens: 42829086720 | elapsed time per iteration (s): 0.08 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 4.534172E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.931 | TFLOPs: 11.80 | 7: iteration 81700/ 173500 | consumed samples: 20915200 | consumed tokens: 42834329600 | elapsed time per iteration (s): 0.08 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 4.527769E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.180 | TFLOPs: 11.81 | 7: iteration 81710/ 173500 | consumed samples: 20917760 | consumed tokens: 42839572480 | elapsed time per iteration (s): 0.08 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 4.513841E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.446 | TFLOPs: 11.80 | 7: iteration 81720/ 173500 | consumed samples: 20920320 | consumed tokens: 42844815360 | elapsed time per iteration (s): 0.08 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 4.513574E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.220 | TFLOPs: 11.83 | 7: iteration 81730/ 173500 | consumed samples: 20922880 | consumed tokens: 42850058240 | elapsed time per iteration (s): 0.08 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 4.527312E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.246 | TFLOPs: 11.83 | 7: iteration 81740/ 173500 | consumed samples: 20925440 | consumed tokens: 42855301120 | elapsed time per iteration (s): 0.08 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 4.526941E+00 | grad 
norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.873 | TFLOPs: 11.86 | 7: iteration 81750/ 173500 | consumed samples: 20928000 | consumed tokens: 42860544000 | elapsed time per iteration (s): 0.08 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 4.518442E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.979 | TFLOPs: 11.85 | 7: iteration 81760/ 173500 | consumed samples: 20930560 | consumed tokens: 42865786880 | elapsed time per iteration (s): 0.08 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 4.523361E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.064 | TFLOPs: 11.86 | 7: iteration 81770/ 173500 | consumed samples: 20933120 | consumed tokens: 42871029760 | elapsed time per iteration (s): 0.08 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 4.522763E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.878 | TFLOPs: 11.84 | 7: iteration 81780/ 173500 | consumed samples: 20935680 | consumed tokens: 42876272640 | elapsed time per iteration (s): 0.08 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 4.529069E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.996 | TFLOPs: 11.83 | 7: iteration 81790/ 173500 | consumed samples: 20938240 | consumed tokens: 42881515520 | elapsed time per iteration (s): 0.08 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 4.523652E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.729 | TFLOPs: 11.78 | 7: iteration 81800/ 173500 | consumed samples: 20940800 | consumed tokens: 42886758400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 4.533846E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.898 | TFLOPs: 11.85 | 7: iteration 81810/ 173500 | consumed samples: 20943360 | consumed tokens: 42892001280 | elapsed time per iteration (s): 0.08 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 4.532292E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.025 | TFLOPs: 11.80 | 7: iteration 81820/ 173500 | consumed samples: 20945920 | consumed tokens: 42897244160 | elapsed time per iteration (s): 0.08 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 4.521190E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.251 | TFLOPs: 11.84 | 7: iteration 81830/ 173500 | consumed samples: 20948480 | consumed tokens: 42902487040 | elapsed time per iteration (s): 0.08 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 4.526971E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.655 | TFLOPs: 11.85 | 7: iteration 81840/ 173500 | consumed samples: 20951040 | consumed tokens: 42907729920 | elapsed time per iteration (s): 0.08 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 4.539949E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.306 | TFLOPs: 11.81 | 7: iteration 81850/ 173500 | consumed samples: 20953600 | consumed tokens: 42912972800 | elapsed time per iteration (s): 0.08 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 4.530831E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.581 | TFLOPs: 
11.75 | 7: iteration 81860/ 173500 | consumed samples: 20956160 | consumed tokens: 42918215680 | elapsed time per iteration (s): 0.08 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 4.515630E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.224 | TFLOPs: 11.80 | 7: iteration 81870/ 173500 | consumed samples: 20958720 | consumed tokens: 42923458560 | elapsed time per iteration (s): 0.08 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 4.522391E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.979 | TFLOPs: 11.79 | 7: iteration 81880/ 173500 | consumed samples: 20961280 | consumed tokens: 42928701440 | elapsed time per iteration (s): 0.08 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 4.529351E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.445 | TFLOPs: 11.77 | 7: iteration 81890/ 173500 | consumed samples: 20963840 | consumed tokens: 42933944320 | elapsed time per iteration (s): 0.08 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 4.541657E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.601 | TFLOPs: 11.78 | 7: iteration 81900/ 173500 | consumed samples: 20966400 | consumed tokens: 42939187200 | elapsed time per iteration (s): 0.08 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 4.521655E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.049 | TFLOPs: 11.78 | 7: iteration 81910/ 173500 | consumed samples: 20968960 | consumed tokens: 42944430080 | elapsed time per iteration (s): 0.08 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 4.517993E+00 | grad norm: 0.327 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.128 | TFLOPs: 11.77 | 7: iteration 81920/ 173500 | consumed samples: 20971520 | consumed tokens: 42949672960 | elapsed time per iteration (s): 0.08 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 4.534531E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.891 | TFLOPs: 11.66 | 7: iteration 81930/ 173500 | consumed samples: 20974080 | consumed tokens: 42954915840 | elapsed time per iteration (s): 0.08 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 4.545500E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.038 | TFLOPs: 11.68 | 7: iteration 81940/ 173500 | consumed samples: 20976640 | consumed tokens: 42960158720 | elapsed time per iteration (s): 0.08 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 4.509146E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.952 | TFLOPs: 11.79 | 7: iteration 81950/ 173500 | consumed samples: 20979200 | consumed tokens: 42965401600 | elapsed time per iteration (s): 0.08 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 4.528011E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.237 | TFLOPs: 11.42 | 7: iteration 81960/ 173500 | consumed samples: 20981760 | consumed tokens: 42970644480 | elapsed time per iteration (s): 0.08 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 4.531936E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.332 | TFLOPs: 11.75 | 7: iteration 81970/ 173500 | consumed samples: 20984320 | consumed tokens: 42975887360 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.193E-04 | global batch size: 256 | lm loss: 4.538108E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.717 | TFLOPs: 11.07 | 7: iteration 81980/ 173500 | consumed samples: 20986880 | consumed tokens: 42981130240 | elapsed time per iteration (s): 0.09 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 4.534274E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.521 | TFLOPs: 10.57 | 7: iteration 81990/ 173500 | consumed samples: 20989440 | consumed tokens: 42986373120 | elapsed time per iteration (s): 0.09 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 4.517267E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2768.156 | TFLOPs: 10.30 | 0: [2023-03-17 02:14:15,475] [INFO] [logging.py:68:log_dist] [Rank 0] step=82000, skipped=0, lr=[0.00011923116875818059, 0.00011923116875818059, 0.00011923116875818059], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 82000/ 173500 | consumed samples: 20992000 | consumed tokens: 42991616000 | elapsed time per iteration (s): 0.10 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 4.537630E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.062 | TFLOPs: 9.34 | 0: steps: 82000 loss: 4.5482 iter time (s): 0.081 samples/sec: 3171.203 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 82000 | lm loss value: 4.398156E+00 | lm loss PPL: 8.130083E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 82000 to checkpoints_14m91b100m 0: [2023-03-17 02:14:15,558] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step82000 is begin to save! 0: [2023-03-17 02:14:15,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:14:15,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:14:15,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:14:15,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:14:15,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:14:15,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:14:15,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:14:15,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:14:15,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:14:15,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:14:15,601] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:14:15,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:14:15,602] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step82000/mp_rank_00_model_states.pt 0: [2023-03-17 02:14:15,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:14:15,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:14:15,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:14:15,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:14:15,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:14:15,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:14:15,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:14:15,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2023-03-17 02:14:15,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:14:15,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2023-03-17 02:14:15,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:14:15,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2023-03-17 02:14:15,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2023-03-17 02:14:15,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:14:15,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2023-03-17 02:14:15,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:14:15,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2023-03-17 02:14:15,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:14:15,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:14:15,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 02:14:15,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 02:14:15,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:14:15,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:14:15,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
3: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:14:15,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2023-03-17 02:14:15,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2023-03-17 02:14:15,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:14:15,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2023-03-17 02:14:15,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:14:15,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:14:15,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:14:15,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:14:15,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:14:15,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2023-03-17 02:14:15,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:14:15,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:14:15,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:14:15,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2023-03-17 02:14:15,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:14:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:14:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:14:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2023-03-17 02:14:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:14:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:14:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 02:14:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:14:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:14:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:14:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:14:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:14:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:14:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2023-03-17 02:14:15,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2023-03-17 02:14:15,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:14:15,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:14:15,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:14:15,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
0: [2023-03-17 02:14:15,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:14:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:14:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:14:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:14:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2023-03-17 02:14:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:14:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
5: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:14:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step82000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:14:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: successfully saved checkpoint at iteration 82000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.79 7: iteration 82010/ 173500 | consumed samples: 20994560 | consumed tokens: 42996858880 | elapsed time per iteration (s): 0.12 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 4.529242E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2177.623 | TFLOPs: 8.10 | 7: iteration 82020/ 173500 | consumed samples: 20997120 | consumed tokens: 43002101760 | elapsed time per iteration (s): 0.09 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 4.519215E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.336 | TFLOPs: 11.08 | 7: iteration 82030/ 173500 | consumed samples: 20999680 | consumed tokens: 43007344640 | elapsed time per iteration (s): 0.11 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 4.530053E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2243.734 | TFLOPs: 8.35 | 7: iteration 82040/ 173500 | consumed samples: 21002240 | consumed tokens: 43012587520 | elapsed time per iteration (s): 0.11 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 4.529026E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.371 | TFLOPs: 8.72 | 7: iteration 82050/ 173500 | consumed samples: 21004800 | consumed tokens: 43017830400 | elapsed time per iteration (s): 0.09 | learning rate: 1.191E-04 | global batch 
size: 256 | lm loss: 4.527590E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2819.823 | TFLOPs: 10.49 | 7: iteration 82060/ 173500 | consumed samples: 21007360 | consumed tokens: 43023073280 | elapsed time per iteration (s): 0.08 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 4.535101E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.419 | TFLOPs: 11.81 | 7: iteration 82070/ 173500 | consumed samples: 21009920 | consumed tokens: 43028316160 | elapsed time per iteration (s): 0.10 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 4.527442E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2649.645 | TFLOPs: 9.86 | 7: iteration 82080/ 173500 | consumed samples: 21012480 | consumed tokens: 43033559040 | elapsed time per iteration (s): 0.11 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 4.528750E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2342.428 | TFLOPs: 8.71 | 7: iteration 82090/ 173500 | consumed samples: 21015040 | consumed tokens: 43038801920 | elapsed time per iteration (s): 0.09 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 4.521358E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2805.361 | TFLOPs: 10.43 | 7: iteration 82100/ 173500 | consumed samples: 21017600 | consumed tokens: 43044044800 | elapsed time per iteration (s): 0.08 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 4.532589E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.703 | TFLOPs: 11.79 | 7: iteration 82110/ 173500 | consumed samples: 21020160 | 
consumed tokens: 43049287680 | elapsed time per iteration (s): 0.08 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 4.531038E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.341 | TFLOPs: 11.81 | 7: iteration 82120/ 173500 | consumed samples: 21022720 | consumed tokens: 43054530560 | elapsed time per iteration (s): 0.09 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 4.521628E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2868.191 | TFLOPs: 10.67 | 7: iteration 82130/ 173500 | consumed samples: 21025280 | consumed tokens: 43059773440 | elapsed time per iteration (s): 0.12 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 4.519971E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2217.609 | TFLOPs: 8.25 | 7: iteration 82140/ 173500 | consumed samples: 21027840 | consumed tokens: 43065016320 | elapsed time per iteration (s): 0.11 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 4.538581E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2423.156 | TFLOPs: 9.01 | 7: iteration 82150/ 173500 | consumed samples: 21030400 | consumed tokens: 43070259200 | elapsed time per iteration (s): 0.10 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 4.526718E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.980 | TFLOPs: 9.45 | 7: iteration 82160/ 173500 | consumed samples: 21032960 | consumed tokens: 43075502080 | elapsed time per iteration (s): 0.08 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 4.512636E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3185.304 | TFLOPs: 11.85 | 7: iteration 82170/ 173500 | consumed samples: 21035520 | consumed tokens: 43080744960 | elapsed time per iteration (s): 0.08 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 4.530406E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.080 | TFLOPs: 11.98 | 7: iteration 82180/ 173500 | consumed samples: 21038080 | consumed tokens: 43085987840 | elapsed time per iteration (s): 0.08 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 4.533001E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.316 | TFLOPs: 11.38 | 7: iteration 82190/ 173500 | consumed samples: 21040640 | consumed tokens: 43091230720 | elapsed time per iteration (s): 0.10 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 4.525164E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2569.481 | TFLOPs: 9.56 | 7: iteration 82200/ 173500 | consumed samples: 21043200 | consumed tokens: 43096473600 | elapsed time per iteration (s): 0.10 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 4.525961E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2672.384 | TFLOPs: 9.94 | 7: iteration 82210/ 173500 | consumed samples: 21045760 | consumed tokens: 43101716480 | elapsed time per iteration (s): 0.11 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 4.527591E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2391.696 | TFLOPs: 8.90 | 7: iteration 82220/ 173500 | consumed samples: 21048320 | consumed tokens: 43106959360 | elapsed time per iteration (s): 0.11 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 
4.535145E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2289.061 | TFLOPs: 8.51 | 7: iteration 82230/ 173500 | consumed samples: 21050880 | consumed tokens: 43112202240 | elapsed time per iteration (s): 0.10 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 4.527122E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2503.059 | TFLOPs: 9.31 | 7: iteration 82240/ 173500 | consumed samples: 21053440 | consumed tokens: 43117445120 | elapsed time per iteration (s): 0.11 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 4.519142E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2350.657 | TFLOPs: 8.74 | 7: iteration 82250/ 173500 | consumed samples: 21056000 | consumed tokens: 43122688000 | elapsed time per iteration (s): 0.08 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 4.530434E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.093 | TFLOPs: 11.97 | 7: iteration 82260/ 173500 | consumed samples: 21058560 | consumed tokens: 43127930880 | elapsed time per iteration (s): 0.08 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 4.522111E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.318 | TFLOPs: 11.94 | 7: iteration 82270/ 173500 | consumed samples: 21061120 | consumed tokens: 43133173760 | elapsed time per iteration (s): 0.08 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 4.532236E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.190 | TFLOPs: 11.93 | 7: iteration 82280/ 173500 | consumed samples: 21063680 | consumed tokens: 
43138416640 | elapsed time per iteration (s): 0.10 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 4.537112E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2543.421 | TFLOPs: 9.46 | 7: iteration 82290/ 173500 | consumed samples: 21066240 | consumed tokens: 43143659520 | elapsed time per iteration (s): 0.12 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 4.539441E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2049.631 | TFLOPs: 7.62 | 7: iteration 82300/ 173500 | consumed samples: 21068800 | consumed tokens: 43148902400 | elapsed time per iteration (s): 0.08 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 4.539280E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.647 | TFLOPs: 12.01 | 7: iteration 82310/ 173500 | consumed samples: 21071360 | consumed tokens: 43154145280 | elapsed time per iteration (s): 0.08 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 4.528244E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.839 | TFLOPs: 11.94 | 7: iteration 82320/ 173500 | consumed samples: 21073920 | consumed tokens: 43159388160 | elapsed time per iteration (s): 0.08 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 4.536483E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.294 | TFLOPs: 11.75 | 7: iteration 82330/ 173500 | consumed samples: 21076480 | consumed tokens: 43164631040 | elapsed time per iteration (s): 0.08 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 4.530925E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3222.415 | TFLOPs: 11.99 | 7: iteration 82340/ 173500 | consumed samples: 21079040 | consumed tokens: 43169873920 | elapsed time per iteration (s): 0.08 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 4.535111E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.512 | TFLOPs: 11.89 | 7: iteration 82350/ 173500 | consumed samples: 21081600 | consumed tokens: 43175116800 | elapsed time per iteration (s): 0.08 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 4.530004E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.820 | TFLOPs: 12.01 | 7: iteration 82360/ 173500 | consumed samples: 21084160 | consumed tokens: 43180359680 | elapsed time per iteration (s): 0.08 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 4.527923E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.409 | TFLOPs: 11.58 | 7: iteration 82370/ 173500 | consumed samples: 21086720 | consumed tokens: 43185602560 | elapsed time per iteration (s): 0.08 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 4.518996E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.583 | TFLOPs: 11.96 | 7: iteration 82380/ 173500 | consumed samples: 21089280 | consumed tokens: 43190845440 | elapsed time per iteration (s): 0.10 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 4.530521E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2460.483 | TFLOPs: 9.15 | 7: iteration 82390/ 173500 | consumed samples: 21091840 | consumed tokens: 43196088320 | elapsed time per iteration (s): 0.12 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 4.542587E+00 | grad 
norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2064.512 | TFLOPs: 7.68 | 7: iteration 82400/ 173500 | consumed samples: 21094400 | consumed tokens: 43201331200 | elapsed time per iteration (s): 0.13 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 4.523147E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.676 | TFLOPs: 7.61 | 7: iteration 82410/ 173500 | consumed samples: 21096960 | consumed tokens: 43206574080 | elapsed time per iteration (s): 0.12 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 4.535941E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2169.736 | TFLOPs: 8.07 | 7: iteration 82420/ 173500 | consumed samples: 21099520 | consumed tokens: 43211816960 | elapsed time per iteration (s): 0.12 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 4.529570E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.471 | TFLOPs: 8.04 | 7: iteration 82430/ 173500 | consumed samples: 21102080 | consumed tokens: 43217059840 | elapsed time per iteration (s): 0.12 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 4.534822E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2122.165 | TFLOPs: 7.89 | 7: iteration 82440/ 173500 | consumed samples: 21104640 | consumed tokens: 43222302720 | elapsed time per iteration (s): 0.12 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 4.530511E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2203.238 | TFLOPs: 8.20 | 7: iteration 82450/ 173500 | consumed samples: 21107200 | consumed tokens: 43227545600 | elapsed time per 
iteration (s): 0.10 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 4.519967E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.662 | TFLOPs: 9.74 | 7: iteration 82460/ 173500 | consumed samples: 21109760 | consumed tokens: 43232788480 | elapsed time per iteration (s): 0.08 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 4.520811E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.064 | TFLOPs: 11.90 | 7: iteration 82470/ 173500 | consumed samples: 21112320 | consumed tokens: 43238031360 | elapsed time per iteration (s): 0.08 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 4.532671E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.653 | TFLOPs: 11.99 | 7: iteration 82480/ 173500 | consumed samples: 21114880 | consumed tokens: 43243274240 | elapsed time per iteration (s): 0.08 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 4.523533E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.466 | TFLOPs: 12.00 | 7: iteration 82490/ 173500 | consumed samples: 21117440 | consumed tokens: 43248517120 | elapsed time per iteration (s): 0.08 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 4.528316E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.575 | TFLOPs: 11.98 | 7: iteration 82500/ 173500 | consumed samples: 21120000 | consumed tokens: 43253760000 | elapsed time per iteration (s): 0.08 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 4.531013E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.370 | TFLOPs: 12.00 | 
7: iteration 82510/ 173500 | consumed samples: 21122560 | consumed tokens: 43259002880 | elapsed time per iteration (s): 0.08 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 4.528399E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.957 | TFLOPs: 12.02 | 7: iteration 82520/ 173500 | consumed samples: 21125120 | consumed tokens: 43264245760 | elapsed time per iteration (s): 0.08 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 4.527564E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3025.871 | TFLOPs: 11.25 | 7: iteration 82530/ 173500 | consumed samples: 21127680 | consumed tokens: 43269488640 | elapsed time per iteration (s): 0.10 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 4.520619E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.111 | TFLOPs: 9.12 | 7: iteration 82540/ 173500 | consumed samples: 21130240 | consumed tokens: 43274731520 | elapsed time per iteration (s): 0.11 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 4.516909E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.459 | TFLOPs: 8.85 | 7: iteration 82550/ 173500 | consumed samples: 21132800 | consumed tokens: 43279974400 | elapsed time per iteration (s): 0.10 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 4.520297E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.901 | TFLOPs: 9.34 | 7: iteration 82560/ 173500 | consumed samples: 21135360 | consumed tokens: 43285217280 | elapsed time per iteration (s): 0.10 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 4.523654E+00 | grad norm: 0.368 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2438.747 | TFLOPs: 9.07 | 7: iteration 82570/ 173500 | consumed samples: 21137920 | consumed tokens: 43290460160 | elapsed time per iteration (s): 0.11 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 4.532396E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2296.533 | TFLOPs: 8.54 | 7: iteration 82580/ 173500 | consumed samples: 21140480 | consumed tokens: 43295703040 | elapsed time per iteration (s): 0.11 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 4.532669E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2345.233 | TFLOPs: 8.72 | 7: iteration 82590/ 173500 | consumed samples: 21143040 | consumed tokens: 43300945920 | elapsed time per iteration (s): 0.11 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 4.528098E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.753 | TFLOPs: 8.78 | 7: iteration 82600/ 173500 | consumed samples: 21145600 | consumed tokens: 43306188800 | elapsed time per iteration (s): 0.09 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 4.537811E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.276 | TFLOPs: 10.32 | 7: iteration 82610/ 173500 | consumed samples: 21148160 | consumed tokens: 43311431680 | elapsed time per iteration (s): 0.08 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 4.524907E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.625 | TFLOPs: 11.89 | 7: iteration 82620/ 173500 | consumed samples: 21150720 | consumed tokens: 43316674560 | elapsed time per iteration (s): 0.08 | learning rate: 
1.182E-04 | global batch size: 256 | lm loss: 4.536225E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.703 | TFLOPs: 11.86 | 7: iteration 82630/ 173500 | consumed samples: 21153280 | consumed tokens: 43321917440 | elapsed time per iteration (s): 0.11 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 4.520292E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2314.683 | TFLOPs: 8.61 | 7: iteration 82640/ 173500 | consumed samples: 21155840 | consumed tokens: 43327160320 | elapsed time per iteration (s): 0.13 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 4.527290E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.519 | TFLOPs: 7.28 | 7: iteration 82650/ 173500 | consumed samples: 21158400 | consumed tokens: 43332403200 | elapsed time per iteration (s): 0.13 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 4.529938E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.139 | TFLOPs: 7.28 | 7: iteration 82660/ 173500 | consumed samples: 21160960 | consumed tokens: 43337646080 | elapsed time per iteration (s): 0.09 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 4.534995E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2831.188 | TFLOPs: 10.53 | 7: iteration 82670/ 173500 | consumed samples: 21163520 | consumed tokens: 43342888960 | elapsed time per iteration (s): 0.08 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 4.522443E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.252 | TFLOPs: 11.59 | 7: iteration 82680/ 173500 | consumed 
samples: 21166080 | consumed tokens: 43348131840 | elapsed time per iteration (s): 0.08 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 4.534580E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.380 | TFLOPs: 11.91 | 7: iteration 82690/ 173500 | consumed samples: 21168640 | consumed tokens: 43353374720 | elapsed time per iteration (s): 0.08 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 4.527154E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.094 | TFLOPs: 11.90 | 7: iteration 82700/ 173500 | consumed samples: 21171200 | consumed tokens: 43358617600 | elapsed time per iteration (s): 0.08 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 4.515717E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.580 | TFLOPs: 11.90 | 7: iteration 82710/ 173500 | consumed samples: 21173760 | consumed tokens: 43363860480 | elapsed time per iteration (s): 0.10 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 4.540596E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2532.466 | TFLOPs: 9.42 | 7: iteration 82720/ 173500 | consumed samples: 21176320 | consumed tokens: 43369103360 | elapsed time per iteration (s): 0.11 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 4.524561E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.230 | TFLOPs: 8.95 | 7: iteration 82730/ 173500 | consumed samples: 21178880 | consumed tokens: 43374346240 | elapsed time per iteration (s): 0.10 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 4.526911E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 2604.377 | TFLOPs: 9.69 | 7: iteration 82740/ 173500 | consumed samples: 21181440 | consumed tokens: 43379589120 | elapsed time per iteration (s): 0.08 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 4.519099E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.832 | TFLOPs: 12.01 | 7: iteration 82750/ 173500 | consumed samples: 21184000 | consumed tokens: 43384832000 | elapsed time per iteration (s): 0.09 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 4.524392E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.731 | TFLOPs: 10.04 | 7: iteration 82760/ 173500 | consumed samples: 21186560 | consumed tokens: 43390074880 | elapsed time per iteration (s): 0.13 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 4.528492E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.066 | TFLOPs: 7.26 | 7: iteration 82770/ 173500 | consumed samples: 21189120 | consumed tokens: 43395317760 | elapsed time per iteration (s): 0.09 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 4.528706E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.871 | TFLOPs: 11.17 | 7: iteration 82780/ 173500 | consumed samples: 21191680 | consumed tokens: 43400560640 | elapsed time per iteration (s): 0.08 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 4.524025E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.385 | TFLOPs: 11.90 | 7: iteration 82790/ 173500 | consumed samples: 21194240 | consumed tokens: 43405803520 | elapsed time per iteration (s): 0.12 | learning rate: 1.179E-04 | global batch size: 256 | 
lm loss: 4.518697E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2215.895 | TFLOPs: 8.24 | 7: iteration 82800/ 173500 | consumed samples: 21196800 | consumed tokens: 43411046400 | elapsed time per iteration (s): 0.08 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 4.538969E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3012.045 | TFLOPs: 11.20 | 7: iteration 82810/ 173500 | consumed samples: 21199360 | consumed tokens: 43416289280 | elapsed time per iteration (s): 0.08 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 4.545041E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.247 | TFLOPs: 11.93 | 7: iteration 82820/ 173500 | consumed samples: 21201920 | consumed tokens: 43421532160 | elapsed time per iteration (s): 0.08 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 4.518301E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.836 | TFLOPs: 11.94 | 7: iteration 82830/ 173500 | consumed samples: 21204480 | consumed tokens: 43426775040 | elapsed time per iteration (s): 0.08 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 4.518404E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.328 | TFLOPs: 11.88 | 7: iteration 82840/ 173500 | consumed samples: 21207040 | consumed tokens: 43432017920 | elapsed time per iteration (s): 0.09 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 4.531089E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2855.741 | TFLOPs: 10.62 | 7: iteration 82850/ 173500 | consumed samples: 21209600 | consumed tokens: 
43437260800 | elapsed time per iteration (s): 0.09 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 4.530671E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2880.737 | TFLOPs: 10.72 | 7: iteration 82860/ 173500 | consumed samples: 21212160 | consumed tokens: 43442503680 | elapsed time per iteration (s): 0.12 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 4.543427E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.432 | TFLOPs: 7.78 | 7: iteration 82870/ 173500 | consumed samples: 21214720 | consumed tokens: 43447746560 | elapsed time per iteration (s): 0.10 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 4.527233E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2634.731 | TFLOPs: 9.80 | 7: iteration 82880/ 173500 | consumed samples: 21217280 | consumed tokens: 43452989440 | elapsed time per iteration (s): 0.08 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 4.532648E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.183 | TFLOPs: 11.70 | 7: iteration 82890/ 173500 | consumed samples: 21219840 | consumed tokens: 43458232320 | elapsed time per iteration (s): 0.08 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 4.522812E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.366 | TFLOPs: 11.97 | 7: iteration 82900/ 173500 | consumed samples: 21222400 | consumed tokens: 43463475200 | elapsed time per iteration (s): 0.08 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 4.539119E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3223.949 | TFLOPs: 11.99 | 7: iteration 82910/ 173500 | consumed samples: 21224960 | consumed tokens: 43468718080 | elapsed time per iteration (s): 0.12 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 4.526215E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2182.575 | TFLOPs: 8.12 | 7: iteration 82920/ 173500 | consumed samples: 21227520 | consumed tokens: 43473960960 | elapsed time per iteration (s): 0.12 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 4.524368E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2139.969 | TFLOPs: 7.96 | 7: iteration 82930/ 173500 | consumed samples: 21230080 | consumed tokens: 43479203840 | elapsed time per iteration (s): 0.10 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 4.532912E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2515.136 | TFLOPs: 9.36 | 7: iteration 82940/ 173500 | consumed samples: 21232640 | consumed tokens: 43484446720 | elapsed time per iteration (s): 0.08 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 4.523635E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.321 | TFLOPs: 11.98 | 7: iteration 82950/ 173500 | consumed samples: 21235200 | consumed tokens: 43489689600 | elapsed time per iteration (s): 0.11 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 4.525290E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2234.362 | TFLOPs: 8.31 | 7: iteration 82960/ 173500 | consumed samples: 21237760 | consumed tokens: 43494932480 | elapsed time per iteration (s): 0.08 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 4.537972E+00 | grad 
norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.103 | TFLOPs: 12.03 | 7: iteration 82970/ 173500 | consumed samples: 21240320 | consumed tokens: 43500175360 | elapsed time per iteration (s): 0.08 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 4.522807E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.025 | TFLOPs: 12.01 | 7: iteration 82980/ 173500 | consumed samples: 21242880 | consumed tokens: 43505418240 | elapsed time per iteration (s): 0.08 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 4.522057E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.314 | TFLOPs: 12.01 | 7: iteration 82990/ 173500 | consumed samples: 21245440 | consumed tokens: 43510661120 | elapsed time per iteration (s): 0.08 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 4.515085E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.877 | TFLOPs: 12.05 | 7: iteration 83000/ 173500 | consumed samples: 21248000 | consumed tokens: 43515904000 | elapsed time per iteration (s): 0.08 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 4.516741E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.246 | TFLOPs: 12.04 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 83000 | lm loss value: 4.407465E+00 | lm loss PPL: 8.206117E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 83000 to checkpoints_14m91b100m 0: [2023-03-17 02:15:50,506] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint 
global_step83000 is begin to save! 0: [2023-03-17 02:15:50,510] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:15:50,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:15:50,536] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:15:50,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:15:50,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:15:50,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:15:50,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:15:50,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:15:50,545] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:15:50,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:15:50,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:15:50,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:15:50,549] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step83000/mp_rank_00_model_states.pt 0: [2023-03-17 02:15:50,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:15:50,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:15:50,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:15:50,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:15:50,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2023-03-17 02:15:50,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
6: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:15:50,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:15:50,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:15:50,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
2: [2023-03-17 02:15:50,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2023-03-17 02:15:50,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2023-03-17 02:15:50,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:15:50,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:15:50,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
6: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:15:50,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 02:15:50,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:15:50,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:15:50,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
7: [2023-03-17 02:15:50,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:15:50,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:15:50,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2023-03-17 02:15:50,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:15:50,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:15:50,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2023-03-17 02:15:50,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:15:50,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2023-03-17 02:15:50,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2023-03-17 02:15:50,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:15:50,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:15:50,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:15:50,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2023-03-17 02:15:50,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:15:50,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:15:50,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:15:50,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:15:50,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:15:50,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2023-03-17 02:15:50,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:15:50,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:15:50,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:15:50,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:15:50,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2023-03-17 02:15:50,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:15:50,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2023-03-17 02:15:50,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:15:50,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2023-03-17 02:15:50,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:15:50,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:15:50,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:15:50,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:15:50,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2023-03-17 02:15:50,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:15:50,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:15:50,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:15:50,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:15:50,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:15:50,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
3: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:15:50,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2023-03-17 02:15:50,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2023-03-17 02:15:50,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:15:50,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step83000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:15:50,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
0: successfully saved checkpoint at iteration 83000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 79.16
7: iteration 83010/ 173500 | consumed samples: 21250560 | consumed tokens: 43521146880 | elapsed time per iteration (s): 0.09 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 4.516098E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2823.805 | TFLOPs: 10.50 |
7: iteration 83020/ 173500 | consumed samples: 21253120 | consumed tokens: 43526389760 | elapsed time per iteration (s): 0.08 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 4.521900E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.464 | TFLOPs: 12.02 |
7: iteration 83030/ 173500 | consumed samples: 21255680 | consumed tokens: 43531632640 | elapsed time per iteration (s): 0.08 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 4.538600E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.976 | TFLOPs: 11.99 |
7: iteration 83040/ 173500 | consumed samples: 21258240 | consumed tokens: 43536875520 | elapsed time per iteration (s): 0.08 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 4.526886E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.671 | TFLOPs: 11.99 |
7: iteration 83050/ 173500 | consumed samples: 21260800 | consumed tokens: 43542118400 | elapsed time per iteration (s): 0.08 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 4.532623E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.756 | TFLOPs: 11.96 |
7: iteration 83060/ 173500 | consumed samples: 21263360 | consumed tokens: 43547361280 | elapsed time per iteration (s): 0.08 |
learning rate: 1.175E-04 | global batch size: 256 | lm loss: 4.514721E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.370 | TFLOPs: 11.97 | 7: iteration 83070/ 173500 | consumed samples: 21265920 | consumed tokens: 43552604160 | elapsed time per iteration (s): 0.10 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 4.538475E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2528.687 | TFLOPs: 9.41 | 7: iteration 83080/ 173500 | consumed samples: 21268480 | consumed tokens: 43557847040 | elapsed time per iteration (s): 0.09 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 4.525186E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2713.272 | TFLOPs: 10.09 | 7: iteration 83090/ 173500 | consumed samples: 21271040 | consumed tokens: 43563089920 | elapsed time per iteration (s): 0.08 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 4.518570E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.058 | TFLOPs: 11.28 | 7: iteration 83100/ 173500 | consumed samples: 21273600 | consumed tokens: 43568332800 | elapsed time per iteration (s): 0.08 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 4.508705E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.475 | TFLOPs: 11.94 | 7: iteration 83110/ 173500 | consumed samples: 21276160 | consumed tokens: 43573575680 | elapsed time per iteration (s): 0.09 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 4.527728E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2772.494 | TFLOPs: 10.31 | 7: iteration 83120/ 
173500 | consumed samples: 21278720 | consumed tokens: 43578818560 | elapsed time per iteration (s): 0.10 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 4.520550E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2543.239 | TFLOPs: 9.46 | 7: iteration 83130/ 173500 | consumed samples: 21281280 | consumed tokens: 43584061440 | elapsed time per iteration (s): 0.08 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 4.538462E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.997 | TFLOPs: 12.06 | 7: iteration 83140/ 173500 | consumed samples: 21283840 | consumed tokens: 43589304320 | elapsed time per iteration (s): 0.09 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 4.530799E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2993.917 | TFLOPs: 11.14 | 7: iteration 83150/ 173500 | consumed samples: 21286400 | consumed tokens: 43594547200 | elapsed time per iteration (s): 0.10 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 4.524258E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.678 | TFLOPs: 9.09 | 7: iteration 83160/ 173500 | consumed samples: 21288960 | consumed tokens: 43599790080 | elapsed time per iteration (s): 0.10 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 4.526157E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.160 | TFLOPs: 9.30 | 7: iteration 83170/ 173500 | consumed samples: 21291520 | consumed tokens: 43605032960 | elapsed time per iteration (s): 0.09 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 4.517843E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2891.773 | TFLOPs: 10.76 | 7: iteration 83180/ 173500 | consumed samples: 21294080 | consumed tokens: 43610275840 | elapsed time per iteration (s): 0.08 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 4.515977E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.636 | TFLOPs: 11.99 | 7: iteration 83190/ 173500 | consumed samples: 21296640 | consumed tokens: 43615518720 | elapsed time per iteration (s): 0.08 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 4.529066E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.524 | TFLOPs: 11.49 | 7: iteration 83200/ 173500 | consumed samples: 21299200 | consumed tokens: 43620761600 | elapsed time per iteration (s): 0.09 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 4.526804E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2969.792 | TFLOPs: 11.05 | 7: iteration 83210/ 173500 | consumed samples: 21301760 | consumed tokens: 43626004480 | elapsed time per iteration (s): 0.10 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 4.531984E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2692.418 | TFLOPs: 10.01 | 7: iteration 83220/ 173500 | consumed samples: 21304320 | consumed tokens: 43631247360 | elapsed time per iteration (s): 0.09 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 4.528554E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2991.664 | TFLOPs: 11.13 | 7: iteration 83230/ 173500 | consumed samples: 21306880 | consumed tokens: 43636490240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.172E-04 | global batch size: 256 | lm loss: 4.536048E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.892 | TFLOPs: 11.80 | 7: iteration 83240/ 173500 | consumed samples: 21309440 | consumed tokens: 43641733120 | elapsed time per iteration (s): 0.08 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 4.522409E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.054 | TFLOPs: 11.81 | 7: iteration 83250/ 173500 | consumed samples: 21312000 | consumed tokens: 43646976000 | elapsed time per iteration (s): 0.08 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 4.534641E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.519 | TFLOPs: 11.81 | 7: iteration 83260/ 173500 | consumed samples: 21314560 | consumed tokens: 43652218880 | elapsed time per iteration (s): 0.08 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 4.525576E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.996 | TFLOPs: 11.83 | 7: iteration 83270/ 173500 | consumed samples: 21317120 | consumed tokens: 43657461760 | elapsed time per iteration (s): 0.08 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 4.532168E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.265 | TFLOPs: 11.96 | 7: iteration 83280/ 173500 | consumed samples: 21319680 | consumed tokens: 43662704640 | elapsed time per iteration (s): 0.08 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 4.522364E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.662 | TFLOPs: 12.00 | 7: iteration 83290/ 173500 | 
consumed samples: 21322240 | consumed tokens: 43667947520 | elapsed time per iteration (s): 0.08 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 4.538838E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.129 | TFLOPs: 12.04 | 7: iteration 83300/ 173500 | consumed samples: 21324800 | consumed tokens: 43673190400 | elapsed time per iteration (s): 0.08 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 4.526807E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.478 | TFLOPs: 12.02 | 7: iteration 83310/ 173500 | consumed samples: 21327360 | consumed tokens: 43678433280 | elapsed time per iteration (s): 0.08 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 4.536742E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.385 | TFLOPs: 12.02 | 7: iteration 83320/ 173500 | consumed samples: 21329920 | consumed tokens: 43683676160 | elapsed time per iteration (s): 0.08 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 4.519235E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.916 | TFLOPs: 11.92 | 7: iteration 83330/ 173500 | consumed samples: 21332480 | consumed tokens: 43688919040 | elapsed time per iteration (s): 0.08 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 4.524845E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.452 | TFLOPs: 11.95 | 7: iteration 83340/ 173500 | consumed samples: 21335040 | consumed tokens: 43694161920 | elapsed time per iteration (s): 0.08 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 4.519154E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3203.561 | TFLOPs: 11.92 | 7: iteration 83350/ 173500 | consumed samples: 21337600 | consumed tokens: 43699404800 | elapsed time per iteration (s): 0.08 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 4.527340E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.091 | TFLOPs: 11.92 | 7: iteration 83360/ 173500 | consumed samples: 21340160 | consumed tokens: 43704647680 | elapsed time per iteration (s): 0.08 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 4.525051E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.173 | TFLOPs: 11.52 | 7: iteration 83370/ 173500 | consumed samples: 21342720 | consumed tokens: 43709890560 | elapsed time per iteration (s): 0.08 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 4.528269E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.331 | TFLOPs: 11.65 | 7: iteration 83380/ 173500 | consumed samples: 21345280 | consumed tokens: 43715133440 | elapsed time per iteration (s): 0.08 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 4.525922E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.318 | TFLOPs: 11.99 | 7: iteration 83390/ 173500 | consumed samples: 21347840 | consumed tokens: 43720376320 | elapsed time per iteration (s): 0.08 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 4.520572E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.077 | TFLOPs: 11.92 | 7: iteration 83400/ 173500 | consumed samples: 21350400 | consumed tokens: 43725619200 | elapsed time per iteration (s): 0.08 | learning rate: 1.169E-04 | global batch 
size: 256 | lm loss: 4.525966E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.136 | TFLOPs: 11.99 | 7: iteration 83410/ 173500 | consumed samples: 21352960 | consumed tokens: 43730862080 | elapsed time per iteration (s): 0.08 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 4.530611E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.347 | TFLOPs: 11.82 | 7: iteration 83420/ 173500 | consumed samples: 21355520 | consumed tokens: 43736104960 | elapsed time per iteration (s): 0.08 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 4.520554E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.889 | TFLOPs: 11.79 | 7: iteration 83430/ 173500 | consumed samples: 21358080 | consumed tokens: 43741347840 | elapsed time per iteration (s): 0.08 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 4.534123E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.933 | TFLOPs: 11.80 | 7: iteration 83440/ 173500 | consumed samples: 21360640 | consumed tokens: 43746590720 | elapsed time per iteration (s): 0.08 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 4.526541E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.926 | TFLOPs: 11.79 | 7: iteration 83450/ 173500 | consumed samples: 21363200 | consumed tokens: 43751833600 | elapsed time per iteration (s): 0.12 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 4.520830E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2076.660 | TFLOPs: 7.72 | 7: iteration 83460/ 173500 | consumed samples: 21365760 | 
consumed tokens: 43757076480 | elapsed time per iteration (s): 0.08 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 4.510963E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.116 | TFLOPs: 11.89 | 7: iteration 83470/ 173500 | consumed samples: 21368320 | consumed tokens: 43762319360 | elapsed time per iteration (s): 0.08 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 4.539128E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.519 | TFLOPs: 11.59 | 7: iteration 83480/ 173500 | consumed samples: 21370880 | consumed tokens: 43767562240 | elapsed time per iteration (s): 0.08 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 4.518365E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.280 | TFLOPs: 11.84 | 7: iteration 83490/ 173500 | consumed samples: 21373440 | consumed tokens: 43772805120 | elapsed time per iteration (s): 0.08 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 4.534145E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.860 | TFLOPs: 11.85 | 7: iteration 83500/ 173500 | consumed samples: 21376000 | consumed tokens: 43778048000 | elapsed time per iteration (s): 0.08 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 4.533610E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.922 | TFLOPs: 11.88 | 7: iteration 83510/ 173500 | consumed samples: 21378560 | consumed tokens: 43783290880 | elapsed time per iteration (s): 0.08 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 4.527380E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3189.868 | TFLOPs: 11.86 | 7: iteration 83520/ 173500 | consumed samples: 21381120 | consumed tokens: 43788533760 | elapsed time per iteration (s): 0.08 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 4.535242E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.318 | TFLOPs: 11.88 | 7: iteration 83530/ 173500 | consumed samples: 21383680 | consumed tokens: 43793776640 | elapsed time per iteration (s): 0.08 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 4.527078E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.180 | TFLOPs: 11.94 | 7: iteration 83540/ 173500 | consumed samples: 21386240 | consumed tokens: 43799019520 | elapsed time per iteration (s): 0.08 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 4.523663E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.280 | TFLOPs: 11.88 | 7: iteration 83550/ 173500 | consumed samples: 21388800 | consumed tokens: 43804262400 | elapsed time per iteration (s): 0.08 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 4.539706E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.693 | TFLOPs: 11.64 | 7: iteration 83560/ 173500 | consumed samples: 21391360 | consumed tokens: 43809505280 | elapsed time per iteration (s): 0.10 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 4.540299E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.023 | TFLOPs: 9.12 | 7: iteration 83570/ 173500 | consumed samples: 21393920 | consumed tokens: 43814748160 | elapsed time per iteration (s): 0.10 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 
4.526669E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.532 | TFLOPs: 9.34 | 7: iteration 83580/ 173500 | consumed samples: 21396480 | consumed tokens: 43819991040 | elapsed time per iteration (s): 0.10 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 4.528350E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.569 | TFLOPs: 9.25 | 7: iteration 83590/ 173500 | consumed samples: 21399040 | consumed tokens: 43825233920 | elapsed time per iteration (s): 0.11 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 4.526402E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2383.366 | TFLOPs: 8.87 | 7: iteration 83600/ 173500 | consumed samples: 21401600 | consumed tokens: 43830476800 | elapsed time per iteration (s): 0.08 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 4.537950E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.420 | TFLOPs: 11.43 | 7: iteration 83610/ 173500 | consumed samples: 21404160 | consumed tokens: 43835719680 | elapsed time per iteration (s): 0.08 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 4.529624E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.231 | TFLOPs: 12.02 | 7: iteration 83620/ 173500 | consumed samples: 21406720 | consumed tokens: 43840962560 | elapsed time per iteration (s): 0.08 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 4.521481E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.561 | TFLOPs: 11.96 | 7: iteration 83630/ 173500 | consumed samples: 21409280 | consumed tokens: 
43846205440 | elapsed time per iteration (s): 0.08 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 4.530828E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.343 | TFLOPs: 11.92 | 7: iteration 83640/ 173500 | consumed samples: 21411840 | consumed tokens: 43851448320 | elapsed time per iteration (s): 0.08 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 4.527483E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.793 | TFLOPs: 12.04 | 7: iteration 83650/ 173500 | consumed samples: 21414400 | consumed tokens: 43856691200 | elapsed time per iteration (s): 0.08 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 4.528919E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.245 | TFLOPs: 11.50 | 7: iteration 83660/ 173500 | consumed samples: 21416960 | consumed tokens: 43861934080 | elapsed time per iteration (s): 0.08 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 4.520621E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.104 | TFLOPs: 12.02 | 7: iteration 83670/ 173500 | consumed samples: 21419520 | consumed tokens: 43867176960 | elapsed time per iteration (s): 0.08 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 4.524260E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.626 | TFLOPs: 11.92 | 7: iteration 83680/ 173500 | consumed samples: 21422080 | consumed tokens: 43872419840 | elapsed time per iteration (s): 0.08 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 4.520557E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3226.528 | TFLOPs: 12.00 | 7: iteration 83690/ 173500 | consumed samples: 21424640 | consumed tokens: 43877662720 | elapsed time per iteration (s): 0.08 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 4.527907E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.257 | TFLOPs: 12.03 | 7: iteration 83700/ 173500 | consumed samples: 21427200 | consumed tokens: 43882905600 | elapsed time per iteration (s): 0.08 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 4.527726E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.459 | TFLOPs: 12.02 | 7: iteration 83710/ 173500 | consumed samples: 21429760 | consumed tokens: 43888148480 | elapsed time per iteration (s): 0.08 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 4.524177E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.025 | TFLOPs: 11.96 | 7: iteration 83720/ 173500 | consumed samples: 21432320 | consumed tokens: 43893391360 | elapsed time per iteration (s): 0.08 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 4.537626E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.236 | TFLOPs: 11.96 | 7: iteration 83730/ 173500 | consumed samples: 21434880 | consumed tokens: 43898634240 | elapsed time per iteration (s): 0.09 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 4.530730E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2807.285 | TFLOPs: 10.44 | 7: iteration 83740/ 173500 | consumed samples: 21437440 | consumed tokens: 43903877120 | elapsed time per iteration (s): 0.10 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 4.544926E+00 | grad 
norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2691.523 | TFLOPs: 10.01 | 7: iteration 83750/ 173500 | consumed samples: 21440000 | consumed tokens: 43909120000 | elapsed time per iteration (s): 0.08 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 4.531459E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.073 | TFLOPs: 11.93 | 7: iteration 83760/ 173500 | consumed samples: 21442560 | consumed tokens: 43914362880 | elapsed time per iteration (s): 0.08 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 4.536651E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.981 | TFLOPs: 12.02 | 7: iteration 83770/ 173500 | consumed samples: 21445120 | consumed tokens: 43919605760 | elapsed time per iteration (s): 0.08 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 4.514939E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.122 | TFLOPs: 11.99 | 7: iteration 83780/ 173500 | consumed samples: 21447680 | consumed tokens: 43924848640 | elapsed time per iteration (s): 0.08 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 4.523312E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.816 | TFLOPs: 12.03 | 7: iteration 83790/ 173500 | consumed samples: 21450240 | consumed tokens: 43930091520 | elapsed time per iteration (s): 0.08 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 4.534611E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.167 | TFLOPs: 12.01 | 7: iteration 83800/ 173500 | consumed samples: 21452800 | consumed tokens: 43935334400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 4.519571E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.566 | TFLOPs: 11.96 | 7: iteration 83810/ 173500 | consumed samples: 21455360 | consumed tokens: 43940577280 | elapsed time per iteration (s): 0.08 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 4.536473E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.315 | TFLOPs: 12.01 | 7: iteration 83820/ 173500 | consumed samples: 21457920 | consumed tokens: 43945820160 | elapsed time per iteration (s): 0.08 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 4.538143E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.579 | TFLOPs: 11.98 | 7: iteration 83830/ 173500 | consumed samples: 21460480 | consumed tokens: 43951063040 | elapsed time per iteration (s): 0.08 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 4.523820E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.805 | TFLOPs: 11.97 | 7: iteration 83840/ 173500 | consumed samples: 21463040 | consumed tokens: 43956305920 | elapsed time per iteration (s): 0.08 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 4.528266E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.677 | TFLOPs: 12.01 | 7: iteration 83850/ 173500 | consumed samples: 21465600 | consumed tokens: 43961548800 | elapsed time per iteration (s): 0.10 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 4.523476E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2565.263 | TFLOPs: 
9.54 | 7: iteration 83860/ 173500 | consumed samples: 21468160 | consumed tokens: 43966791680 | elapsed time per iteration (s): 0.10 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 4.522887E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2443.722 | TFLOPs: 9.09 | 7: iteration 83870/ 173500 | consumed samples: 21470720 | consumed tokens: 43972034560 | elapsed time per iteration (s): 0.08 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 4.534375E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.075 | TFLOPs: 11.95 | 7: iteration 83880/ 173500 | consumed samples: 21473280 | consumed tokens: 43977277440 | elapsed time per iteration (s): 0.08 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 4.529132E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.820 | TFLOPs: 11.91 | 7: iteration 83890/ 173500 | consumed samples: 21475840 | consumed tokens: 43982520320 | elapsed time per iteration (s): 0.08 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 4.528131E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.690 | TFLOPs: 11.89 | 7: iteration 83900/ 173500 | consumed samples: 21478400 | consumed tokens: 43987763200 | elapsed time per iteration (s): 0.08 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 4.523390E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.226 | TFLOPs: 11.61 | 7: iteration 83910/ 173500 | consumed samples: 21480960 | consumed tokens: 43993006080 | elapsed time per iteration (s): 0.18 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 4.518376E+00 | grad norm: 0.338 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1430.646 | TFLOPs: 5.32 | 7: iteration 83920/ 173500 | consumed samples: 21483520 | consumed tokens: 43998248960 | elapsed time per iteration (s): 0.08 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 4.528490E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3035.715 | TFLOPs: 11.29 | 7: iteration 83930/ 173500 | consumed samples: 21486080 | consumed tokens: 44003491840 | elapsed time per iteration (s): 0.08 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 4.521762E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.141 | TFLOPs: 11.89 | 7: iteration 83940/ 173500 | consumed samples: 21488640 | consumed tokens: 44008734720 | elapsed time per iteration (s): 0.08 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 4.527467E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.969 | TFLOPs: 11.82 | 7: iteration 83950/ 173500 | consumed samples: 21491200 | consumed tokens: 44013977600 | elapsed time per iteration (s): 0.13 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 4.529514E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2021.220 | TFLOPs: 7.52 | 7: iteration 83960/ 173500 | consumed samples: 21493760 | consumed tokens: 44019220480 | elapsed time per iteration (s): 0.12 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 4.523883E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.696 | TFLOPs: 7.75 | 7: iteration 83970/ 173500 | consumed samples: 21496320 | consumed tokens: 44024463360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.160E-04 | global batch size: 256 | lm loss: 4.535219E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.634 | TFLOPs: 11.78 | 7: iteration 83980/ 173500 | consumed samples: 21498880 | consumed tokens: 44029706240 | elapsed time per iteration (s): 0.08 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 4.531404E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.083 | TFLOPs: 11.98 | 7: iteration 83990/ 173500 | consumed samples: 21501440 | consumed tokens: 44034949120 | elapsed time per iteration (s): 0.08 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 4.509838E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.814 | TFLOPs: 11.99 | 0: [2023-03-17 02:17:16,254] [INFO] [logging.py:68:log_dist] [Rank 0] step=84000, skipped=0, lr=[0.00011595088621669176, 0.00011595088621669176, 0.00011595088621669176], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 84000/ 173500 | consumed samples: 21504000 | consumed tokens: 44040192000 | elapsed time per iteration (s): 0.08 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 4.529989E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.111 | TFLOPs: 12.01 | 0: steps: 84000 loss: 4.5168 iter time (s): 0.090 samples/sec: 2855.169 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 84000 | lm loss value: 4.436751E+00 | lm loss PPL: 8.449995E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 84000 to checkpoints_14m91b100m 0: [2023-03-17 02:17:16,310] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step84000 is begin to save! 0: [2023-03-17 02:17:16,314] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:17:16,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:17:16,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:17:16,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:17:16,343] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:17:16,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:17:16,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:17:16,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:17:16,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:17:16,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:17:16,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:17:16,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:17:16,353] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step84000/mp_rank_00_model_states.pt 0: [2023-03-17 02:17:16,353] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:17:16,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:17:16,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:17:16,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:17:16,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:17:16,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2023-03-17 02:17:16,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:17:16,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:17:16,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2023-03-17 02:17:16,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:17:16,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:17:16,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2023-03-17 02:17:16,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:17:16,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2023-03-17 02:17:16,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:17:16,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:17:16,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2023-03-17 02:17:16,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:17:16,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:17:16,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:17:16,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:17:16,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2023-03-17 02:17:16,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 02:17:16,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
5: [2023-03-17 02:17:16,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:17:16,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:17:16,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:17:16,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:17:16,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:17:16,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:17:16,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:17:16,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:17:16,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 02:17:16,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2023-03-17 02:17:16,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:17:16,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:17:16,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 02:17:16,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2023-03-17 02:17:16,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:17:16,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2023-03-17 02:17:16,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:17:16,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:17:16,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:17:16,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:17:16,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2023-03-17 02:17:16,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:17:16,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:17:16,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:17:16,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:17:16,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:17:16,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 0: [2023-03-17 02:17:16,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 7: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:17:16,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:17:16,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:17:16,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:17:16,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:17:16,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:17:16,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:17:16,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 02:17:16,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2023-03-17 02:17:16,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
7: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
5: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt
2: [2023-03-17 02:17:16,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step84000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
5: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now!
2: [2023-03-17 02:17:16,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now!
0: successfully saved checkpoint at iteration 84000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 78.37
7: iteration 84010/ 173500 | consumed samples: 21506560 | consumed tokens: 44045434880 | elapsed time per iteration (s): 0.10 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 4.539277E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2446.281 | TFLOPs: 9.10 |
7: iteration 84020/ 173500 | consumed samples: 21509120 | consumed tokens: 44050677760 | elapsed time per iteration (s): 0.08 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 4.528366E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.927 | TFLOPs: 11.95 |
7: iteration 84030/ 173500 | consumed samples: 21511680 | consumed tokens: 44055920640 | elapsed time per iteration (s): 0.08 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 4.532884E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.241 | TFLOPs: 11.97 |
7: iteration 84040/ 173500 | consumed samples: 21514240 | consumed tokens: 44061163520 | elapsed time per iteration (s): 0.12 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 4.543328E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2155.894 | TFLOPs: 8.02 |
7: iteration 84050/ 173500 | consumed samples: 21516800 | consumed tokens: 44066406400 | elapsed time per iteration (s): 0.13 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 4.523447E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.648 | TFLOPs: 7.21 |
7: iteration 84060/ 173500 | consumed samples: 21519360 | consumed tokens: 44071649280 | elapsed time per iteration (s): 0.10 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 4.529876E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.636 | TFLOPs: 9.12 |
7: iteration 84070/ 173500 | consumed samples: 21521920 | consumed tokens: 44076892160 | elapsed time per iteration (s): 0.08 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 4.542056E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.452 | TFLOPs: 12.02 |
7: iteration 84080/ 173500 | consumed samples: 21524480 | consumed tokens: 44082135040 | elapsed time per iteration (s): 0.08 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 4.528003E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.553 | TFLOPs: 12.02 |
7: iteration 84090/ 173500 | consumed samples: 21527040 | consumed tokens: 44087377920 | elapsed time per iteration (s): 0.08 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 4.527385E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.378 | TFLOPs: 11.92 |
7: iteration 84100/ 173500 | consumed samples: 21529600 | consumed tokens: 44092620800 | elapsed time per iteration (s): 0.08 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 4.524267E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.502 | TFLOPs: 11.96 |
7: iteration 84110/ 173500 | consumed samples: 21532160 | consumed tokens: 44097863680 | elapsed time per iteration (s): 0.08 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 4.520693E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.088 | TFLOPs: 11.88 |
7: iteration 84120/ 173500 | consumed samples: 21534720 | consumed tokens: 44103106560 | elapsed time per iteration (s): 0.08 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 4.524077E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.155 | TFLOPs: 11.57 |
7: iteration 84130/ 173500 | consumed samples: 21537280 | consumed tokens: 44108349440 | elapsed time per iteration (s): 0.09 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 4.529078E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2813.105 | TFLOPs: 10.46 |
7: iteration 84140/ 173500 | consumed samples: 21539840 | consumed tokens: 44113592320 | elapsed time per iteration (s): 0.09 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 4.534779E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2751.451 | TFLOPs: 10.23 |
7: iteration 84150/ 173500 | consumed samples: 21542400 | consumed tokens: 44118835200 | elapsed time per iteration (s): 0.08 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 4.527195E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.581 |
TFLOPs: 11.94 | 7: iteration 84160/ 173500 | consumed samples: 21544960 | consumed tokens: 44124078080 | elapsed time per iteration (s): 0.08 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 4.531118E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.078 | TFLOPs: 11.92 | 7: iteration 84170/ 173500 | consumed samples: 21547520 | consumed tokens: 44129320960 | elapsed time per iteration (s): 0.09 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 4.530799E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.017 | TFLOPs: 10.10 | 7: iteration 84180/ 173500 | consumed samples: 21550080 | consumed tokens: 44134563840 | elapsed time per iteration (s): 0.13 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 4.519963E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.263 | TFLOPs: 7.21 | 7: iteration 84190/ 173500 | consumed samples: 21552640 | consumed tokens: 44139806720 | elapsed time per iteration (s): 0.08 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 4.522073E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.529 | TFLOPs: 11.64 | 7: iteration 84200/ 173500 | consumed samples: 21555200 | consumed tokens: 44145049600 | elapsed time per iteration (s): 0.08 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 4.514791E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.466 | TFLOPs: 11.94 | 7: iteration 84210/ 173500 | consumed samples: 21557760 | consumed tokens: 44150292480 | elapsed time per iteration (s): 0.09 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 4.525143E+00 | grad norm: 0.378 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.891 | TFLOPs: 11.17 | 7: iteration 84220/ 173500 | consumed samples: 21560320 | consumed tokens: 44155535360 | elapsed time per iteration (s): 0.08 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 4.528583E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.518 | TFLOPs: 11.91 | 7: iteration 84230/ 173500 | consumed samples: 21562880 | consumed tokens: 44160778240 | elapsed time per iteration (s): 0.08 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 4.526661E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.030 | TFLOPs: 11.75 | 7: iteration 84240/ 173500 | consumed samples: 21565440 | consumed tokens: 44166021120 | elapsed time per iteration (s): 0.08 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 4.539798E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.704 | TFLOPs: 11.89 | 7: iteration 84250/ 173500 | consumed samples: 21568000 | consumed tokens: 44171264000 | elapsed time per iteration (s): 0.08 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 4.527086E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.530 | TFLOPs: 11.83 | 7: iteration 84260/ 173500 | consumed samples: 21570560 | consumed tokens: 44176506880 | elapsed time per iteration (s): 0.08 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 4.511643E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.750 | TFLOPs: 11.38 | 7: iteration 84270/ 173500 | consumed samples: 21573120 | consumed tokens: 44181749760 | elapsed time per iteration (s): 
0.09 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 4.523166E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.045 | TFLOPs: 10.44 | 7: iteration 84280/ 173500 | consumed samples: 21575680 | consumed tokens: 44186992640 | elapsed time per iteration (s): 0.11 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 4.530797E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2245.551 | TFLOPs: 8.35 | 7: iteration 84290/ 173500 | consumed samples: 21578240 | consumed tokens: 44192235520 | elapsed time per iteration (s): 0.10 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 4.538669E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2518.442 | TFLOPs: 9.37 | 7: iteration 84300/ 173500 | consumed samples: 21580800 | consumed tokens: 44197478400 | elapsed time per iteration (s): 0.08 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 4.527102E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.610 | TFLOPs: 11.59 | 7: iteration 84310/ 173500 | consumed samples: 21583360 | consumed tokens: 44202721280 | elapsed time per iteration (s): 0.08 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 4.518371E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.337 | TFLOPs: 11.48 | 7: iteration 84320/ 173500 | consumed samples: 21585920 | consumed tokens: 44207964160 | elapsed time per iteration (s): 0.08 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 4.527713E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.168 | TFLOPs: 11.84 | 7: iteration 
84330/ 173500 | consumed samples: 21588480 | consumed tokens: 44213207040 | elapsed time per iteration (s): 0.08 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 4.526220E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.549 | TFLOPs: 11.52 | 7: iteration 84340/ 173500 | consumed samples: 21591040 | consumed tokens: 44218449920 | elapsed time per iteration (s): 0.08 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 4.525716E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.846 | TFLOPs: 11.58 | 7: iteration 84350/ 173500 | consumed samples: 21593600 | consumed tokens: 44223692800 | elapsed time per iteration (s): 0.08 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 4.539165E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.367 | TFLOPs: 11.48 | 7: iteration 84360/ 173500 | consumed samples: 21596160 | consumed tokens: 44228935680 | elapsed time per iteration (s): 0.08 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 4.541953E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.655 | TFLOPs: 11.69 | 7: iteration 84370/ 173500 | consumed samples: 21598720 | consumed tokens: 44234178560 | elapsed time per iteration (s): 0.08 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 4.522348E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.939 | TFLOPs: 11.84 | 7: iteration 84380/ 173500 | consumed samples: 21601280 | consumed tokens: 44239421440 | elapsed time per iteration (s): 0.08 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 4.513455E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3169.405 | TFLOPs: 11.79 | 7: iteration 84390/ 173500 | consumed samples: 21603840 | consumed tokens: 44244664320 | elapsed time per iteration (s): 0.08 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 4.526224E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.081 | TFLOPs: 11.75 | 7: iteration 84400/ 173500 | consumed samples: 21606400 | consumed tokens: 44249907200 | elapsed time per iteration (s): 0.08 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 4.531195E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.439 | TFLOPs: 11.82 | 7: iteration 84410/ 173500 | consumed samples: 21608960 | consumed tokens: 44255150080 | elapsed time per iteration (s): 0.08 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 4.524826E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.002 | TFLOPs: 11.79 | 7: iteration 84420/ 173500 | consumed samples: 21611520 | consumed tokens: 44260392960 | elapsed time per iteration (s): 0.10 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 4.537182E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2657.310 | TFLOPs: 9.88 | 7: iteration 84430/ 173500 | consumed samples: 21614080 | consumed tokens: 44265635840 | elapsed time per iteration (s): 0.09 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 4.515250E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2903.571 | TFLOPs: 10.80 | 7: iteration 84440/ 173500 | consumed samples: 21616640 | consumed tokens: 44270878720 | elapsed time per iteration (s): 0.09 | learning rate: 1.152E-04 
| global batch size: 256 | lm loss: 4.533317E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2736.282 | TFLOPs: 10.18 | 7: iteration 84450/ 173500 | consumed samples: 21619200 | consumed tokens: 44276121600 | elapsed time per iteration (s): 0.10 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 4.539590E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.855 | TFLOPs: 9.84 | 7: iteration 84460/ 173500 | consumed samples: 21621760 | consumed tokens: 44281364480 | elapsed time per iteration (s): 0.09 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 4.525204E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.879 | TFLOPs: 10.54 | 7: iteration 84470/ 173500 | consumed samples: 21624320 | consumed tokens: 44286607360 | elapsed time per iteration (s): 0.09 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 4.527110E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.367 | TFLOPs: 10.03 | 7: iteration 84480/ 173500 | consumed samples: 21626880 | consumed tokens: 44291850240 | elapsed time per iteration (s): 0.08 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 4.526691E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.153 | TFLOPs: 11.94 | 7: iteration 84490/ 173500 | consumed samples: 21629440 | consumed tokens: 44297093120 | elapsed time per iteration (s): 0.08 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 4.522999E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.712 | TFLOPs: 11.98 | 7: iteration 84500/ 173500 | consumed samples: 
21632000 | consumed tokens: 44302336000 | elapsed time per iteration (s): 0.08 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 4.529462E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.804 | TFLOPs: 11.99 | 7: iteration 84510/ 173500 | consumed samples: 21634560 | consumed tokens: 44307578880 | elapsed time per iteration (s): 0.08 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 4.526209E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.811 | TFLOPs: 12.01 | 7: iteration 84520/ 173500 | consumed samples: 21637120 | consumed tokens: 44312821760 | elapsed time per iteration (s): 0.08 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 4.528235E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.667 | TFLOPs: 12.02 | 7: iteration 84530/ 173500 | consumed samples: 21639680 | consumed tokens: 44318064640 | elapsed time per iteration (s): 0.08 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 4.524348E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.986 | TFLOPs: 11.60 | 7: iteration 84540/ 173500 | consumed samples: 21642240 | consumed tokens: 44323307520 | elapsed time per iteration (s): 0.08 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 4.513397E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.246 | TFLOPs: 12.02 | 7: iteration 84550/ 173500 | consumed samples: 21644800 | consumed tokens: 44328550400 | elapsed time per iteration (s): 0.08 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 4.533490E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3211.271 | TFLOPs: 11.94 | 7: iteration 84560/ 173500 | consumed samples: 21647360 | consumed tokens: 44333793280 | elapsed time per iteration (s): 0.08 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 4.521555E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.289 | TFLOPs: 11.97 | 7: iteration 84570/ 173500 | consumed samples: 21649920 | consumed tokens: 44339036160 | elapsed time per iteration (s): 0.08 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 4.536423E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.389 | TFLOPs: 11.96 | 7: iteration 84580/ 173500 | consumed samples: 21652480 | consumed tokens: 44344279040 | elapsed time per iteration (s): 0.08 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 4.510673E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.159 | TFLOPs: 11.97 | 7: iteration 84590/ 173500 | consumed samples: 21655040 | consumed tokens: 44349521920 | elapsed time per iteration (s): 0.08 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 4.523253E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.623 | TFLOPs: 11.98 | 7: iteration 84600/ 173500 | consumed samples: 21657600 | consumed tokens: 44354764800 | elapsed time per iteration (s): 0.08 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 4.533392E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.001 | TFLOPs: 11.97 | 7: iteration 84610/ 173500 | consumed samples: 21660160 | consumed tokens: 44360007680 | elapsed time per iteration (s): 0.08 | learning rate: 1.149E-04 | global batch size: 256 | 
lm loss: 4.531076E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.126 | TFLOPs: 11.86 | 7: iteration 84620/ 173500 | consumed samples: 21662720 | consumed tokens: 44365250560 | elapsed time per iteration (s): 0.08 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 4.524422E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.712 | TFLOPs: 11.80 | 7: iteration 84630/ 173500 | consumed samples: 21665280 | consumed tokens: 44370493440 | elapsed time per iteration (s): 0.08 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 4.534821E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.483 | TFLOPs: 11.87 | 7: iteration 84640/ 173500 | consumed samples: 21667840 | consumed tokens: 44375736320 | elapsed time per iteration (s): 0.08 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 4.531368E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.555 | TFLOPs: 11.80 | 7: iteration 84650/ 173500 | consumed samples: 21670400 | consumed tokens: 44380979200 | elapsed time per iteration (s): 0.08 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 4.519047E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.242 | TFLOPs: 11.77 | 7: iteration 84660/ 173500 | consumed samples: 21672960 | consumed tokens: 44386222080 | elapsed time per iteration (s): 0.08 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 4.521284E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.346 | TFLOPs: 11.86 | 7: iteration 84670/ 173500 | consumed samples: 21675520 | consumed 
tokens: 44391464960 | elapsed time per iteration (s): 0.08 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 4.537551E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.464 | TFLOPs: 11.74 | 7: iteration 84680/ 173500 | consumed samples: 21678080 | consumed tokens: 44396707840 | elapsed time per iteration (s): 0.11 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 4.536056E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2356.635 | TFLOPs: 8.77 | 7: iteration 84690/ 173500 | consumed samples: 21680640 | consumed tokens: 44401950720 | elapsed time per iteration (s): 0.11 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 4.537669E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2398.322 | TFLOPs: 8.92 | 7: iteration 84700/ 173500 | consumed samples: 21683200 | consumed tokens: 44407193600 | elapsed time per iteration (s): 0.08 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 4.526125E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.417 | TFLOPs: 11.84 | 7: iteration 84710/ 173500 | consumed samples: 21685760 | consumed tokens: 44412436480 | elapsed time per iteration (s): 0.08 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 4.533351E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.448 | TFLOPs: 11.87 | 7: iteration 84720/ 173500 | consumed samples: 21688320 | consumed tokens: 44417679360 | elapsed time per iteration (s): 0.08 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 4.534186E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3189.576 | TFLOPs: 11.86 | 7: iteration 84730/ 173500 | consumed samples: 21690880 | consumed tokens: 44422922240 | elapsed time per iteration (s): 0.08 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 4.547053E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.919 | TFLOPs: 11.87 | 7: iteration 84740/ 173500 | consumed samples: 21693440 | consumed tokens: 44428165120 | elapsed time per iteration (s): 0.08 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 4.516527E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.158 | TFLOPs: 11.86 | 7: iteration 84750/ 173500 | consumed samples: 21696000 | consumed tokens: 44433408000 | elapsed time per iteration (s): 0.08 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 4.533779E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.613 | TFLOPs: 11.60 | 7: iteration 84760/ 173500 | consumed samples: 21698560 | consumed tokens: 44438650880 | elapsed time per iteration (s): 0.08 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 4.519275E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.916 | TFLOPs: 11.91 | 7: iteration 84770/ 173500 | consumed samples: 21701120 | consumed tokens: 44443893760 | elapsed time per iteration (s): 0.08 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 4.536887E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.697 | TFLOPs: 11.89 | 7: iteration 84780/ 173500 | consumed samples: 21703680 | consumed tokens: 44449136640 | elapsed time per iteration (s): 0.08 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 4.525131E+00 | 
grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.361 | TFLOPs: 11.89 | 7: iteration 84790/ 173500 | consumed samples: 21706240 | consumed tokens: 44454379520 | elapsed time per iteration (s): 0.08 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 4.527814E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.090 | TFLOPs: 11.91 | 7: iteration 84800/ 173500 | consumed samples: 21708800 | consumed tokens: 44459622400 | elapsed time per iteration (s): 0.08 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 4.525760E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.947 | TFLOPs: 11.87 | 7: iteration 84810/ 173500 | consumed samples: 21711360 | consumed tokens: 44464865280 | elapsed time per iteration (s): 0.08 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 4.529460E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.091 | TFLOPs: 11.79 | 7: iteration 84820/ 173500 | consumed samples: 21713920 | consumed tokens: 44470108160 | elapsed time per iteration (s): 0.08 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 4.526600E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.255 | TFLOPs: 11.80 | 7: iteration 84830/ 173500 | consumed samples: 21716480 | consumed tokens: 44475351040 | elapsed time per iteration (s): 0.08 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 4.536139E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.045 | TFLOPs: 11.69 | 7: iteration 84840/ 173500 | consumed samples: 21719040 | consumed tokens: 44480593920 | elapsed 
time per iteration (s): 0.08 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 4.534092E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.809 | TFLOPs: 11.81 | 7: iteration 84850/ 173500 | consumed samples: 21721600 | consumed tokens: 44485836800 | elapsed time per iteration (s): 0.08 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 4.543492E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.850 | TFLOPs: 11.83 | 7: iteration 84860/ 173500 | consumed samples: 21724160 | consumed tokens: 44491079680 | elapsed time per iteration (s): 0.08 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 4.544868E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.239 | TFLOPs: 11.82 | 7: iteration 84870/ 173500 | consumed samples: 21726720 | consumed tokens: 44496322560 | elapsed time per iteration (s): 0.08 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 4.521577E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.063 | TFLOPs: 11.79 | 7: iteration 84880/ 173500 | consumed samples: 21729280 | consumed tokens: 44501565440 | elapsed time per iteration (s): 0.08 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 4.535290E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.833 | TFLOPs: 11.83 | 7: iteration 84890/ 173500 | consumed samples: 21731840 | consumed tokens: 44506808320 | elapsed time per iteration (s): 0.08 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 4.527980E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.117 | 
TFLOPs: 11.80 | 7: iteration 84900/ 173500 | consumed samples: 21734400 | consumed tokens: 44512051200 | elapsed time per iteration (s): 0.08 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 4.520759E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.385 | TFLOPs: 11.75 | 7: iteration 84910/ 173500 | consumed samples: 21736960 | consumed tokens: 44517294080 | elapsed time per iteration (s): 0.08 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 4.534725E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.991 | TFLOPs: 11.91 | 7: iteration 84920/ 173500 | consumed samples: 21739520 | consumed tokens: 44522536960 | elapsed time per iteration (s): 0.08 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 4.533326E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.198 | TFLOPs: 11.91 | 7: iteration 84930/ 173500 | consumed samples: 21742080 | consumed tokens: 44527779840 | elapsed time per iteration (s): 0.08 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 4.529205E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.112 | TFLOPs: 11.91 | 7: iteration 84940/ 173500 | consumed samples: 21744640 | consumed tokens: 44533022720 | elapsed time per iteration (s): 0.08 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 4.534000E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.028 | TFLOPs: 11.90 | 7: iteration 84950/ 173500 | consumed samples: 21747200 | consumed tokens: 44538265600 | elapsed time per iteration (s): 0.08 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 4.517237E+00 | grad norm: 0.350 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.075 | TFLOPs: 11.91 |
7: iteration 84960/ 173500 | consumed samples: 21749760 | consumed tokens: 44543508480 | elapsed time per iteration (s): 0.08 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 4.527068E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.548 | TFLOPs: 11.87 |
7: iteration 84970/ 173500 | consumed samples: 21752320 | consumed tokens: 44548751360 | elapsed time per iteration (s): 0.08 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 4.523161E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.006 | TFLOPs: 11.82 |
7: iteration 84980/ 173500 | consumed samples: 21754880 | consumed tokens: 44553994240 | elapsed time per iteration (s): 0.10 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 4.515208E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2609.552 | TFLOPs: 9.71 |
7: iteration 84990/ 173500 | consumed samples: 21757440 | consumed tokens: 44559237120 | elapsed time per iteration (s): 0.08 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 4.524969E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.451 | TFLOPs: 11.87 |
7: iteration 85000/ 173500 | consumed samples: 21760000 | consumed tokens: 44564480000 | elapsed time per iteration (s): 0.08 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 4.518040E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.804 | TFLOPs: 11.91 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 85000 | lm loss value: 4.398833E+00 | lm loss PPL: 8.135589E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 85000 to checkpoints_14m91b100m
0: [2023-03-17 02:18:41,215] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step85000 is begin to save!
0: [2023-03-17 02:18:41,218] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/layer_01-model_00-model_states.pt...
0: [2023-03-17 02:18:41,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/layer_01-model_00-model_states.pt.
0: [2023-03-17 02:18:41,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/layer_03-model_00-model_states.pt...
0: [2023-03-17 02:18:41,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/layer_03-model_00-model_states.pt.
0: [2023-03-17 02:18:41,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/layer_04-model_00-model_states.pt...
0: [2023-03-17 02:18:41,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/layer_04-model_00-model_states.pt.
0: [2023-03-17 02:18:41,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/layer_05-model_00-model_states.pt...
0: [2023-03-17 02:18:41,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/layer_05-model_00-model_states.pt.
0: [2023-03-17 02:18:41,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/layer_06-model_00-model_states.pt...
0: [2023-03-17 02:18:41,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:18:41,256] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:18:41,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:18:41,258] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step85000/mp_rank_00_model_states.pt 0: [2023-03-17 02:18:41,258] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:18:41,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:18:41,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:18:41,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:18:41,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2023-03-17 02:18:41,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:18:41,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:18:41,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:18:41,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
4: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:18:41,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 6: [2023-03-17 02:18:41,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2023-03-17 02:18:41,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:18:41,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
0: [2023-03-17 02:18:41,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:18:41,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:18:41,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:18:41,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:18:41,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2023-03-17 02:18:41,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:18:41,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:18:41,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
6: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:18:41,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:18:41,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:18:41,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:18:41,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:18:41,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 02:18:41,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:18:41,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2023-03-17 02:18:41,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:18:41,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:18:41,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2023-03-17 02:18:41,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:18:41,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:18:41,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:18:41,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 02:18:41,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 5: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2023-03-17 02:18:41,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:18:41,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:18:41,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:18:41,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
4: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
4: [2023-03-17 02:18:41,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
1: [2023-03-17 02:18:41,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
4: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
1: [2023-03-17 02:18:41,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
2: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:18:41,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt
2: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
7: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:18:41,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt
7: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
3: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:18:41,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt
3: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
0: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:18:41,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt
0: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
5: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:18:41,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt
5: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
4: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:18:41,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:18:41,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt
4: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
3: [2023-03-17 02:18:41,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt
3: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
6: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:18:41,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt
6: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
1: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:18:41,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
2: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:18:41,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt
2: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
7: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:18:41,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt
5: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
5: [2023-03-17 02:18:41,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt
5: [2023-03-17 02:18:41,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
0: [2023-03-17 02:18:41,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:18:41,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
0: [2023-03-17 02:18:41,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
4: [2023-03-17 02:18:41,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:18:41,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
4: [2023-03-17 02:18:41,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
1: [2023-03-17 02:18:41,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:18:41,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:18:41,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
6: [2023-03-17 02:18:41,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
1: [2023-03-17 02:18:41,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
6: [2023-03-17 02:18:41,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
2: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt
5: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt
2: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
5: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
7: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt
7: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
3: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
3: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
0: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
0: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
3: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
4: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt
7: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
2: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt
3: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
0: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
4: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt
7: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
7: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
0: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
4: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
0: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
1: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
2: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt
5: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:18:41,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
1: [2023-03-17 02:18:41,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
2: [2023-03-17 02:18:41,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
5: [2023-03-17 02:18:41,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
1: [2023-03-17 02:18:41,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
5: [2023-03-17 02:18:41,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
6: [2023-03-17 02:18:41,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:18:41,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:18:41,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt
6: [2023-03-17 02:18:41,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
6: [2023-03-17 02:18:41,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step85000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
6: [2023-03-17 02:18:41,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now!
0: successfully saved checkpoint at iteration 85000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 80.63
7: iteration 85010/ 173500 | consumed samples: 21762560 | consumed tokens: 44569722880 | elapsed time per iteration (s): 0.09 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 4.547112E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2796.047 | TFLOPs: 10.40 |
7: iteration 85020/ 173500 | consumed samples: 21765120 | consumed tokens: 44574965760 | elapsed time per iteration (s): 0.08 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 4.528612E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.114 | TFLOPs: 11.88 |
7: iteration 85030/ 173500 | consumed samples: 21767680 | consumed tokens: 44580208640 | elapsed time per iteration (s): 0.08 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 4.541963E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.748 | TFLOPs: 11.89 |
7: iteration 85040/ 173500 | consumed samples: 21770240 | consumed tokens: 44585451520 | elapsed time per iteration (s): 0.08 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 4.534132E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.317 | TFLOPs: 11.77 |
7: iteration 85050/ 173500 | consumed samples: 21772800 | consumed tokens: 44590694400 | elapsed time per iteration (s): 0.08 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 4.540331E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.795 | TFLOPs: 11.81 |
7: iteration 85060/ 173500 | consumed samples: 21775360 | consumed tokens: 44595937280 | elapsed time per iteration (s): 0.08 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 4.544872E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.783 | TFLOPs: 11.77 |
7: iteration 85070/ 173500 | consumed samples: 21777920 | consumed tokens: 44601180160 | elapsed time per iteration (s): 0.08 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 4.525426E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.205 | TFLOPs: 11.92 |
7: iteration 85080/ 173500 | consumed samples: 21780480 | consumed tokens: 44606423040 | elapsed time per iteration (s): 0.08 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 4.537033E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.126 | TFLOPs: 11.80 |
7: iteration 85090/ 173500 | consumed samples: 21783040 | consumed tokens: 44611665920 | elapsed time per iteration (s): 0.11 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 4.533895E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2274.791 | TFLOPs: 8.46 |
7: iteration 85100/ 173500 | consumed samples: 21785600 | consumed tokens: 44616908800 | elapsed time per iteration (s): 0.09 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 4.525310E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2834.012 | TFLOPs: 10.54 |
7: iteration 85110/ 173500 | consumed samples: 21788160 | consumed tokens: 44622151680 | elapsed time per iteration (s): 0.08 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 4.523686E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.648 | TFLOPs: 11.90 |
7: iteration 85120/ 173500 | consumed samples: 21790720 | consumed tokens: 44627394560 | elapsed time per iteration (s): 0.08 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 4.539634E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.304 | TFLOPs: 11.92 |
7: iteration 85130/ 173500 | consumed samples: 21793280 | consumed tokens: 44632637440 | elapsed time per iteration (s): 0.08 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 4.535728E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.297 | TFLOPs: 11.68 |
7: iteration 85140/ 173500 | consumed samples: 21795840 | consumed tokens: 44637880320 | elapsed time per iteration (s): 0.08 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 4.526517E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.302 | TFLOPs: 11.90 |
7: iteration 85150/ 173500 | consumed samples: 21798400 | consumed tokens: 44643123200 | elapsed time per iteration (s): 0.08 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 4.520837E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.668 | TFLOPs: 11.89 |
7: iteration 85160/ 173500 | consumed samples: 21800960 | consumed tokens: 44648366080 | elapsed time per iteration (s): 0.08 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 4.524794E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.170 | TFLOPs: 11.87 |
7: iteration 85170/ 173500 | consumed samples: 21803520 | consumed tokens: 44653608960 | elapsed time per iteration (s): 0.08 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 4.532228E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.092 | TFLOPs: 11.91 |
7: iteration 85180/ 173500 | consumed samples: 21806080 | consumed tokens: 44658851840 | elapsed time per iteration (s): 0.08 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 4.526662E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.318 | TFLOPs: 11.90 |
7: iteration 85190/ 173500 | consumed samples: 21808640 | consumed tokens: 44664094720 | elapsed time per iteration (s): 0.08 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 4.523380E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.805 | TFLOPs: 11.91 |
7: iteration 85200/ 173500 | consumed samples: 21811200 | consumed tokens: 44669337600 | elapsed time per iteration (s): 0.08 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 4.530837E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.267 | TFLOPs: 11.92 |
7: iteration 85210/ 173500 | consumed samples: 21813760 | consumed tokens: 44674580480 | elapsed time per iteration (s): 0.08 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 4.529088E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.339 | TFLOPs: 11.90 |
7: iteration 85220/ 173500 | consumed samples: 21816320 | consumed tokens: 44679823360 | elapsed time per iteration (s): 0.08 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 4.529296E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.591 | TFLOPs: 12.00 |
7: iteration 85230/ 173500 | consumed samples: 21818880 | consumed tokens: 44685066240 | elapsed time per iteration (s): 0.08 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 4.539677E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.367 | TFLOPs: 12.04 |
7: iteration 85240/ 173500 | consumed samples: 21821440 | consumed tokens: 44690309120 | elapsed time per iteration (s): 0.08 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 4.523305E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.862 | TFLOPs: 12.01 |
7: iteration 85250/ 173500 | consumed samples: 21824000 | consumed tokens: 44695552000 | elapsed time per iteration (s): 0.08 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 4.519835E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.137 | TFLOPs: 12.05 |
7: iteration 85260/ 173500 | consumed samples: 21826560 | consumed tokens: 44700794880 | elapsed time per iteration (s): 0.08 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 4.527980E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.778 | TFLOPs: 12.04 |
7: iteration 85270/ 173500 | consumed samples: 21829120 | consumed tokens: 44706037760 | elapsed time per iteration (s): 0.08 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 4.531396E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.642 | TFLOPs: 12.00 |
7: iteration 85280/ 173500 | consumed samples: 21831680 | consumed tokens: 44711280640 | elapsed time per iteration (s): 0.08 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 4.531062E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.228 | TFLOPs: 12.03 |
7: iteration 85290/ 173500 | consumed samples: 21834240 | consumed tokens: 44716523520 | elapsed time per iteration (s): 0.08 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 4.523590E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.149 | TFLOPs: 12.05 |
7: iteration 85300/ 173500 | consumed samples: 21836800 | consumed tokens: 44721766400 | elapsed time per iteration (s): 0.08 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 4.523478E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.705 | TFLOPs: 11.53 |
7: iteration 85310/ 173500 | consumed samples: 21839360 | consumed tokens: 44727009280 | elapsed time per iteration (s): 0.08 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 4.542159E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.572 | TFLOPs: 12.02 |
7: iteration 85320/ 173500 | consumed samples: 21841920 | consumed tokens: 44732252160 | elapsed time per iteration (s): 0.08 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 4.523818E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.452 | TFLOPs: 12.05 |
7: iteration 85330/ 173500 | consumed samples: 21844480 | consumed tokens: 44737495040 | elapsed time per iteration (s): 0.08 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 4.528699E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.907 | TFLOPs: 12.00 |
7: iteration 85340/ 173500 | consumed samples: 21847040 | consumed tokens: 44742737920 | elapsed time per iteration (s): 0.08 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 4.512483E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.202 | TFLOPs: 11.98 |
7: iteration 85350/ 173500 | consumed samples: 21849600 | consumed tokens: 44747980800 | elapsed time per iteration (s): 0.08 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 4.525520E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.295 | TFLOPs: 12.00 |
7: iteration 85360/ 173500 | consumed samples: 21852160 | consumed tokens: 44753223680 | elapsed time per iteration (s): 0.08 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 4.530095E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.732 | TFLOPs: 12.02 |
7: iteration 85370/ 173500 | consumed samples: 21854720 | consumed tokens: 44758466560 | elapsed time per iteration (s): 0.08 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 4.523052E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.643 | TFLOPs: 11.98 |
7: iteration 85380/ 173500 | consumed samples: 21857280 | consumed tokens: 44763709440 | elapsed time per iteration (s): 0.08 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 4.527040E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.355 | TFLOPs: 11.90 |
7: iteration 85390/ 173500 | consumed samples: 21859840 | consumed tokens: 44768952320 | elapsed time per iteration (s): 0.08 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 4.532919E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.459 | TFLOPs: 11.60 |
7: iteration 85400/ 173500 | consumed samples: 21862400 | consumed tokens: 44774195200 | elapsed time per iteration (s): 0.08 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 4.524532E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.229 | TFLOPs: 11.79 |
7: iteration 85410/ 173500 | consumed samples: 21864960 | consumed tokens: 44779438080 | elapsed time per iteration (s): 0.08 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 4.541622E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.395 | TFLOPs: 11.85 |
7: iteration 85420/ 173500 | consumed samples: 21867520 | consumed tokens: 44784680960 | elapsed time per iteration (s): 0.08 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 4.521906E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.180 | TFLOPs: 11.44 |
7: iteration 85430/ 173500 | consumed samples: 21870080 | consumed tokens: 44789923840 | elapsed time per iteration (s): 0.08 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 4.541007E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.547 | TFLOPs: 11.72 |
7: iteration 85440/ 173500 | consumed samples: 21872640 | consumed tokens: 44795166720 | elapsed time per iteration (s): 0.08 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 4.521442E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.172 | TFLOPs: 11.87 |
7: iteration 85450/ 173500 | consumed samples: 21875200 | consumed tokens: 44800409600 | elapsed time per iteration (s): 0.08 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 4.519788E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.447 | TFLOPs: 11.87 |
7: iteration 85460/ 173500 | consumed samples: 21877760 | consumed tokens: 44805652480 | elapsed time per iteration (s): 0.08 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 4.531138E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.839 | TFLOPs: 11.87 |
7: iteration 85470/ 173500 | consumed samples: 21880320 | consumed tokens: 44810895360 | elapsed time per iteration (s): 0.08 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 4.524403E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.619 | TFLOPs: 11.83 |
7: iteration 85480/ 173500 | consumed samples: 21882880 | consumed tokens: 44816138240 | elapsed time per iteration (s): 0.08 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 4.527527E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.893 | TFLOPs: 11.88 |
7: iteration 85490/ 173500 | consumed samples: 21885440 | consumed tokens: 44821381120 | elapsed time per iteration (s): 0.08 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 4.512295E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.444 | TFLOPs: 11.87 |
7: iteration 85500/ 173500 | consumed samples: 21888000 | consumed tokens: 44826624000 | elapsed time per iteration (s): 0.08 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 4.535151E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.058 | TFLOPs: 11.87 |
7: iteration 85510/ 173500 | consumed samples: 21890560 | consumed tokens: 44831866880 | elapsed time per iteration (s): 0.08 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 4.519839E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.836 | TFLOPs: 11.86 |
7: iteration 85520/ 173500 | consumed samples: 21893120 | consumed tokens: 44837109760 | elapsed time per iteration (s): 0.08 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 4.547336E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.061 | TFLOPs: 11.78 |
7: iteration 85530/ 173500 | consumed samples: 21895680 | consumed tokens: 44842352640 | elapsed time per iteration (s): 0.08 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 4.525877E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.117 | TFLOPs: 11.72 |
7: iteration 85540/ 173500 | consumed samples: 21898240 | consumed tokens: 44847595520 | elapsed time per iteration (s): 0.08 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 4.510400E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.532 | TFLOPs: 11.82 |
7: iteration 85550/ 173500 | consumed samples: 21900800 | consumed tokens: 44852838400 | elapsed time per iteration (s): 0.08 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 4.531408E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.991 | TFLOPs: 11.88 |
7: iteration 85560/ 173500 | consumed samples: 21903360 | consumed tokens: 44858081280 | elapsed time per iteration (s): 0.08 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 4.526376E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.650 | TFLOPs: 11.85 |
7: iteration 85570/ 173500 | consumed samples: 21905920 | consumed tokens: 44863324160 | elapsed time per iteration (s): 0.08 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 4.514890E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.208 | TFLOPs: 11.85 |
7: iteration 85580/ 173500 | consumed samples: 21908480 | consumed tokens: 44868567040 | elapsed time per iteration (s): 0.08 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 4.525027E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.430 | TFLOPs: 11.85 |
7: iteration 85590/ 173500 | consumed samples: 21911040 | consumed tokens: 44873809920 | elapsed time per iteration (s): 0.08 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 4.529081E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.126 | TFLOPs: 11.78 |
7: iteration 85600/ 173500 | consumed samples: 21913600 | consumed tokens: 44879052800 | elapsed time per iteration (s): 0.08 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 4.513198E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.979 | TFLOPs: 11.82 |
7: iteration 85610/ 173500 | consumed samples: 21916160 | consumed tokens: 44884295680 | elapsed time per iteration (s): 0.08 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 4.531700E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.678 | TFLOPs: 11.80 |
7: iteration 85620/ 173500 | consumed samples: 21918720 | consumed tokens: 44889538560 | elapsed time per iteration (s): 0.08 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 4.536945E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.896 | TFLOPs: 11.84 |
7: iteration 85630/ 173500 | consumed samples: 21921280 | consumed tokens: 44894781440 | elapsed time per iteration (s): 0.08 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 4.521112E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.150 | TFLOPs: 11.88 |
7: iteration 85640/ 173500 | consumed samples: 21923840 | consumed tokens: 44900024320 | elapsed time per iteration (s): 0.08 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 4.529852E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.651 | TFLOPs: 11.74 |
7: iteration 85650/ 173500 | consumed samples: 21926400 | consumed tokens: 44905267200 | elapsed time per iteration (s): 0.08 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 4.523483E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.760 | TFLOPs: 11.88 |
7: iteration 85660/ 173500 | 
consumed samples: 21928960 | consumed tokens: 44910510080 | elapsed time per iteration (s): 0.08 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 4.518081E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.331 | TFLOPs: 11.76 | 7: iteration 85670/ 173500 | consumed samples: 21931520 | consumed tokens: 44915752960 | elapsed time per iteration (s): 0.08 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 4.526683E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.948 | TFLOPs: 11.78 | 7: iteration 85680/ 173500 | consumed samples: 21934080 | consumed tokens: 44920995840 | elapsed time per iteration (s): 0.08 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 4.522263E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.571 | TFLOPs: 11.81 | 7: iteration 85690/ 173500 | consumed samples: 21936640 | consumed tokens: 44926238720 | elapsed time per iteration (s): 0.08 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 4.532869E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.183 | TFLOPs: 11.82 | 7: iteration 85700/ 173500 | consumed samples: 21939200 | consumed tokens: 44931481600 | elapsed time per iteration (s): 0.08 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 4.521366E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.443 | TFLOPs: 11.88 | 7: iteration 85710/ 173500 | consumed samples: 21941760 | consumed tokens: 44936724480 | elapsed time per iteration (s): 0.08 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 4.522135E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3190.992 | TFLOPs: 11.87 | 7: iteration 85720/ 173500 | consumed samples: 21944320 | consumed tokens: 44941967360 | elapsed time per iteration (s): 0.08 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 4.513144E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.612 | TFLOPs: 11.95 | 7: iteration 85730/ 173500 | consumed samples: 21946880 | consumed tokens: 44947210240 | elapsed time per iteration (s): 0.08 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 4.527917E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.432 | TFLOPs: 11.88 | 7: iteration 85740/ 173500 | consumed samples: 21949440 | consumed tokens: 44952453120 | elapsed time per iteration (s): 0.08 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 4.536348E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3067.277 | TFLOPs: 11.41 | 7: iteration 85750/ 173500 | consumed samples: 21952000 | consumed tokens: 44957696000 | elapsed time per iteration (s): 0.08 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 4.534937E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.576 | TFLOPs: 11.67 | 7: iteration 85760/ 173500 | consumed samples: 21954560 | consumed tokens: 44962938880 | elapsed time per iteration (s): 0.08 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 4.525709E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.625 | TFLOPs: 11.67 | 7: iteration 85770/ 173500 | consumed samples: 21957120 | consumed tokens: 44968181760 | elapsed time per iteration (s): 0.08 | learning rate: 1.130E-04 | global batch 
size: 256 | lm loss: 4.534248E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.244 | TFLOPs: 11.58 | 7: iteration 85780/ 173500 | consumed samples: 21959680 | consumed tokens: 44973424640 | elapsed time per iteration (s): 0.08 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 4.516430E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.096 | TFLOPs: 11.94 | 7: iteration 85790/ 173500 | consumed samples: 21962240 | consumed tokens: 44978667520 | elapsed time per iteration (s): 0.08 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 4.541500E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.773 | TFLOPs: 11.65 | 7: iteration 85800/ 173500 | consumed samples: 21964800 | consumed tokens: 44983910400 | elapsed time per iteration (s): 0.08 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 4.528185E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.669 | TFLOPs: 11.94 | 7: iteration 85810/ 173500 | consumed samples: 21967360 | consumed tokens: 44989153280 | elapsed time per iteration (s): 0.08 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 4.524393E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.485 | TFLOPs: 11.68 | 7: iteration 85820/ 173500 | consumed samples: 21969920 | consumed tokens: 44994396160 | elapsed time per iteration (s): 0.08 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 4.524571E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.301 | TFLOPs: 11.93 | 7: iteration 85830/ 173500 | consumed samples: 21972480 | 
consumed tokens: 44999639040 | elapsed time per iteration (s): 0.08 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 4.529151E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.812 | TFLOPs: 11.94 | 7: iteration 85840/ 173500 | consumed samples: 21975040 | consumed tokens: 45004881920 | elapsed time per iteration (s): 0.08 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 4.527522E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.213 | TFLOPs: 11.41 | 7: iteration 85850/ 173500 | consumed samples: 21977600 | consumed tokens: 45010124800 | elapsed time per iteration (s): 0.08 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 4.530443E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.306 | TFLOPs: 11.93 | 7: iteration 85860/ 173500 | consumed samples: 21980160 | consumed tokens: 45015367680 | elapsed time per iteration (s): 0.08 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 4.524921E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.571 | TFLOPs: 11.92 | 7: iteration 85870/ 173500 | consumed samples: 21982720 | consumed tokens: 45020610560 | elapsed time per iteration (s): 0.08 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 4.527497E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.366 | TFLOPs: 11.40 | 7: iteration 85880/ 173500 | consumed samples: 21985280 | consumed tokens: 45025853440 | elapsed time per iteration (s): 0.08 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 4.525985E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3125.874 | TFLOPs: 11.63 | 7: iteration 85890/ 173500 | consumed samples: 21987840 | consumed tokens: 45031096320 | elapsed time per iteration (s): 0.08 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 4.522013E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.759 | TFLOPs: 11.64 | 7: iteration 85900/ 173500 | consumed samples: 21990400 | consumed tokens: 45036339200 | elapsed time per iteration (s): 0.08 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 4.532715E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.864 | TFLOPs: 11.95 | 7: iteration 85910/ 173500 | consumed samples: 21992960 | consumed tokens: 45041582080 | elapsed time per iteration (s): 0.08 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 4.519884E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.628 | TFLOPs: 11.95 | 7: iteration 85920/ 173500 | consumed samples: 21995520 | consumed tokens: 45046824960 | elapsed time per iteration (s): 0.08 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 4.522732E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.192 | TFLOPs: 11.66 | 7: iteration 85930/ 173500 | consumed samples: 21998080 | consumed tokens: 45052067840 | elapsed time per iteration (s): 0.10 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 4.529494E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2603.170 | TFLOPs: 9.68 | 7: iteration 85940/ 173500 | consumed samples: 22000640 | consumed tokens: 45057310720 | elapsed time per iteration (s): 0.12 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 
4.531311E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.928 | TFLOPs: 7.76 | 7: iteration 85950/ 173500 | consumed samples: 22003200 | consumed tokens: 45062553600 | elapsed time per iteration (s): 0.10 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 4.521765E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2489.330 | TFLOPs: 9.26 | 7: iteration 85960/ 173500 | consumed samples: 22005760 | consumed tokens: 45067796480 | elapsed time per iteration (s): 0.08 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 4.522393E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.414 | TFLOPs: 11.40 | 7: iteration 85970/ 173500 | consumed samples: 22008320 | consumed tokens: 45073039360 | elapsed time per iteration (s): 0.08 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 4.520930E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.010 | TFLOPs: 11.64 | 7: iteration 85980/ 173500 | consumed samples: 22010880 | consumed tokens: 45078282240 | elapsed time per iteration (s): 0.08 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 4.520167E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.479 | TFLOPs: 11.35 | 7: iteration 85990/ 173500 | consumed samples: 22013440 | consumed tokens: 45083525120 | elapsed time per iteration (s): 0.08 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 4.532641E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.026 | TFLOPs: 11.71 | 0: [2023-03-17 02:20:03,060] [INFO] [logging.py:68:log_dist] [Rank 0] step=86000, 
skipped=0, lr=[0.0001126626417003261, 0.0001126626417003261, 0.0001126626417003261], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 86000/ 173500 | consumed samples: 22016000 | consumed tokens: 45088768000 | elapsed time per iteration (s): 0.08 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 4.527094E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.481 | TFLOPs: 11.92 | 0: steps: 86000 loss: 4.5327 iter time (s): 0.083 samples/sec: 3095.843 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 86000 | lm loss value: 4.394504E+00 | lm loss PPL: 8.100441E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 86000 to checkpoints_14m91b100m 0: [2023-03-17 02:20:03,117] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step86000 is begin to save! 0: [2023-03-17 02:20:03,120] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:20:03,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:20:03,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:20:03,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:20:03,150] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/layer_04-model_00-model_states.pt... 
0: [2023-03-17 02:20:03,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:20:03,153] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:20:03,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:20:03,156] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:20:03,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:20:03,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:20:03,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:20:03,160] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step86000/mp_rank_00_model_states.pt 0: [2023-03-17 02:20:03,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:20:03,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:20:03,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:20:03,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
2: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:20:03,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:20:03,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2023-03-17 02:20:03,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2023-03-17 02:20:03,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:20:03,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:20:03,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2023-03-17 02:20:03,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:20:03,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:20:03,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2023-03-17 02:20:03,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:20:03,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:20:03,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:20:03,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2023-03-17 02:20:03,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:20:03,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2023-03-17 02:20:03,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:20:03,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:20:03,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:20:03,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2023-03-17 02:20:03,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:20:03,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 3: [2023-03-17 02:20:03,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:20:03,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:20:03,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 02:20:03,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:20:03,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2023-03-17 02:20:03,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2023-03-17 02:20:03,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:20:03,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:20:03,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:20:03,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:20:03,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:20:03,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 0: [2023-03-17 02:20:03,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:20:03,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:20:03,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:20:03,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2023-03-17 02:20:03,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:20:03,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:20:03,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:20:03,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2023-03-17 02:20:03,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:20:03,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:20:03,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:20:03,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:20:03,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:20:03,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 02:20:03,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
7: [2023-03-17 02:20:03,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
4: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 2: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
7: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
1: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 02:20:03,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step86000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2023-03-17 02:20:03,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: successfully saved checkpoint at iteration 86000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.51 7: iteration 86010/ 173500 | consumed samples: 22018560 | consumed tokens: 45094010880 | elapsed time per iteration (s): 0.09 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 4.520349E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2748.521 | TFLOPs: 10.22 | 7: iteration 86020/ 173500 | consumed samples: 22021120 | consumed tokens: 45099253760 | elapsed time per iteration (s): 0.08 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 4.541750E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.605 | TFLOPs: 11.63 | 7: iteration 86030/ 173500 | consumed samples: 22023680 | consumed tokens: 45104496640 | elapsed time per iteration (s): 0.08 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 4.515766E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.076 | TFLOPs: 11.65 | 7: iteration 86040/ 173500 | consumed samples: 22026240 | consumed tokens: 45109739520 | elapsed time per iteration (s): 0.08 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 
4.524529E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.867 | TFLOPs: 11.63 | 7: iteration 86050/ 173500 | consumed samples: 22028800 | consumed tokens: 45114982400 | elapsed time per iteration (s): 0.08 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 4.527117E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.241 | TFLOPs: 11.88 | 7: iteration 86060/ 173500 | consumed samples: 22031360 | consumed tokens: 45120225280 | elapsed time per iteration (s): 0.08 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 4.529167E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.941 | TFLOPs: 11.39 | 7: iteration 86070/ 173500 | consumed samples: 22033920 | consumed tokens: 45125468160 | elapsed time per iteration (s): 0.08 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 4.529072E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.586 | TFLOPs: 11.88 | 7: iteration 86080/ 173500 | consumed samples: 22036480 | consumed tokens: 45130711040 | elapsed time per iteration (s): 0.08 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 4.529616E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.617 | TFLOPs: 11.89 | 7: iteration 86090/ 173500 | consumed samples: 22039040 | consumed tokens: 45135953920 | elapsed time per iteration (s): 0.08 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 4.526083E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.739 | TFLOPs: 11.86 | 7: iteration 86100/ 173500 | consumed samples: 22041600 | consumed tokens: 
45141196800 | elapsed time per iteration (s): 0.08 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 4.531449E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.422 | TFLOPs: 11.83 | 7: iteration 86110/ 173500 | consumed samples: 22044160 | consumed tokens: 45146439680 | elapsed time per iteration (s): 0.08 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 4.539833E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.230 | TFLOPs: 11.85 | 7: iteration 86120/ 173500 | consumed samples: 22046720 | consumed tokens: 45151682560 | elapsed time per iteration (s): 0.08 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 4.530743E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.632 | TFLOPs: 11.85 | 7: iteration 86130/ 173500 | consumed samples: 22049280 | consumed tokens: 45156925440 | elapsed time per iteration (s): 0.08 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 4.528805E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.793 | TFLOPs: 11.81 | 7: iteration 86140/ 173500 | consumed samples: 22051840 | consumed tokens: 45162168320 | elapsed time per iteration (s): 0.08 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 4.513890E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.668 | TFLOPs: 11.81 | 7: iteration 86150/ 173500 | consumed samples: 22054400 | consumed tokens: 45167411200 | elapsed time per iteration (s): 0.08 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 4.526718E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3162.666 | TFLOPs: 11.76 | 7: iteration 86160/ 173500 | consumed samples: 22056960 | consumed tokens: 45172654080 | elapsed time per iteration (s): 0.08 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 4.530441E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.841 | TFLOPs: 11.85 | 7: iteration 86170/ 173500 | consumed samples: 22059520 | consumed tokens: 45177896960 | elapsed time per iteration (s): 0.08 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 4.527163E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.583 | TFLOPs: 11.82 | 7: iteration 86180/ 173500 | consumed samples: 22062080 | consumed tokens: 45183139840 | elapsed time per iteration (s): 0.08 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 4.522270E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.169 | TFLOPs: 11.85 | 7: iteration 86190/ 173500 | consumed samples: 22064640 | consumed tokens: 45188382720 | elapsed time per iteration (s): 0.08 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 4.542801E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.639 | TFLOPs: 11.86 | 7: iteration 86200/ 173500 | consumed samples: 22067200 | consumed tokens: 45193625600 | elapsed time per iteration (s): 0.08 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 4.533775E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.302 | TFLOPs: 11.88 | 7: iteration 86210/ 173500 | consumed samples: 22069760 | consumed tokens: 45198868480 | elapsed time per iteration (s): 0.08 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 4.526222E+00 | grad 
norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.718 | TFLOPs: 11.91 | 7: iteration 86220/ 173500 | consumed samples: 22072320 | consumed tokens: 45204111360 | elapsed time per iteration (s): 0.08 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 4.525598E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.770 | TFLOPs: 11.81 | 7: iteration 86230/ 173500 | consumed samples: 22074880 | consumed tokens: 45209354240 | elapsed time per iteration (s): 0.08 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 4.534129E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.080 | TFLOPs: 11.88 | 7: iteration 86240/ 173500 | consumed samples: 22077440 | consumed tokens: 45214597120 | elapsed time per iteration (s): 0.08 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 4.524799E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.160 | TFLOPs: 11.86 | 7: iteration 86250/ 173500 | consumed samples: 22080000 | consumed tokens: 45219840000 | elapsed time per iteration (s): 0.08 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 4.529859E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.883 | TFLOPs: 11.33 | 7: iteration 86260/ 173500 | consumed samples: 22082560 | consumed tokens: 45225082880 | elapsed time per iteration (s): 0.08 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 4.522594E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.363 | TFLOPs: 11.81 | 7: iteration 86270/ 173500 | consumed samples: 22085120 | consumed tokens: 45230325760 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 4.530138E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.866 | TFLOPs: 11.88 | 7: iteration 86280/ 173500 | consumed samples: 22087680 | consumed tokens: 45235568640 | elapsed time per iteration (s): 0.08 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 4.517053E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.408 | TFLOPs: 11.91 | 7: iteration 86290/ 173500 | consumed samples: 22090240 | consumed tokens: 45240811520 | elapsed time per iteration (s): 0.08 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 4.523515E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.337 | TFLOPs: 11.86 | 7: iteration 86300/ 173500 | consumed samples: 22092800 | consumed tokens: 45246054400 | elapsed time per iteration (s): 0.08 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 4.526804E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.070 | TFLOPs: 11.87 | 7: iteration 86310/ 173500 | consumed samples: 22095360 | consumed tokens: 45251297280 | elapsed time per iteration (s): 0.08 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 4.528107E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.016 | TFLOPs: 11.89 | 7: iteration 86320/ 173500 | consumed samples: 22097920 | consumed tokens: 45256540160 | elapsed time per iteration (s): 0.08 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 4.523057E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.839 | TFLOPs: 
11.82 | 7: iteration 86330/ 173500 | consumed samples: 22100480 | consumed tokens: 45261783040 | elapsed time per iteration (s): 0.08 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 4.535103E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.617 | TFLOPs: 11.88 | 7: iteration 86340/ 173500 | consumed samples: 22103040 | consumed tokens: 45267025920 | elapsed time per iteration (s): 0.08 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 4.532135E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.810 | TFLOPs: 11.89 | 7: iteration 86350/ 173500 | consumed samples: 22105600 | consumed tokens: 45272268800 | elapsed time per iteration (s): 0.08 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 4.523840E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.648 | TFLOPs: 11.86 | 7: iteration 86360/ 173500 | consumed samples: 22108160 | consumed tokens: 45277511680 | elapsed time per iteration (s): 0.08 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 4.529150E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.010 | TFLOPs: 11.87 | 7: iteration 86370/ 173500 | consumed samples: 22110720 | consumed tokens: 45282754560 | elapsed time per iteration (s): 0.08 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 4.527004E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.184 | TFLOPs: 11.90 | 7: iteration 86380/ 173500 | consumed samples: 22113280 | consumed tokens: 45287997440 | elapsed time per iteration (s): 0.08 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 4.533472E+00 | grad norm: 0.314 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.184 | TFLOPs: 11.89 | 7: iteration 86390/ 173500 | consumed samples: 22115840 | consumed tokens: 45293240320 | elapsed time per iteration (s): 0.08 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 4.526859E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.544 | TFLOPs: 11.89 | 7: iteration 86400/ 173500 | consumed samples: 22118400 | consumed tokens: 45298483200 | elapsed time per iteration (s): 0.08 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 4.531516E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.984 | TFLOPs: 11.83 | 7: iteration 86410/ 173500 | consumed samples: 22120960 | consumed tokens: 45303726080 | elapsed time per iteration (s): 0.08 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 4.519930E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.215 | TFLOPs: 11.88 | 7: iteration 86420/ 173500 | consumed samples: 22123520 | consumed tokens: 45308968960 | elapsed time per iteration (s): 0.08 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 4.535055E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.026 | TFLOPs: 11.90 | 7: iteration 86430/ 173500 | consumed samples: 22126080 | consumed tokens: 45314211840 | elapsed time per iteration (s): 0.08 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 4.534915E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.660 | TFLOPs: 11.91 | 7: iteration 86440/ 173500 | consumed samples: 22128640 | consumed tokens: 45319454720 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.119E-04 | global batch size: 256 | lm loss: 4.533337E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.451 | TFLOPs: 11.90 | 7: iteration 86450/ 173500 | consumed samples: 22131200 | consumed tokens: 45324697600 | elapsed time per iteration (s): 0.08 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 4.526036E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.795 | TFLOPs: 11.86 | 7: iteration 86460/ 173500 | consumed samples: 22133760 | consumed tokens: 45329940480 | elapsed time per iteration (s): 0.08 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 4.536010E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.561 | TFLOPs: 11.88 | 7: iteration 86470/ 173500 | consumed samples: 22136320 | consumed tokens: 45335183360 | elapsed time per iteration (s): 0.08 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 4.523313E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.860 | TFLOPs: 11.91 | 7: iteration 86480/ 173500 | consumed samples: 22138880 | consumed tokens: 45340426240 | elapsed time per iteration (s): 0.08 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 4.521446E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.780 | TFLOPs: 11.91 | 7: iteration 86490/ 173500 | consumed samples: 22141440 | consumed tokens: 45345669120 | elapsed time per iteration (s): 0.08 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 4.533257E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.885 | TFLOPs: 11.90 | 7: iteration 86500/ 
173500 | consumed samples: 22144000 | consumed tokens: 45350912000 | elapsed time per iteration (s): 0.11 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 4.510233E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2426.075 | TFLOPs: 9.02 | 7: iteration 86510/ 173500 | consumed samples: 22146560 | consumed tokens: 45356154880 | elapsed time per iteration (s): 0.10 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 4.525893E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2453.072 | TFLOPs: 9.12 | 7: iteration 86520/ 173500 | consumed samples: 22149120 | consumed tokens: 45361397760 | elapsed time per iteration (s): 0.08 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 4.539904E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.719 | TFLOPs: 11.81 | 7: iteration 86530/ 173500 | consumed samples: 22151680 | consumed tokens: 45366640640 | elapsed time per iteration (s): 0.08 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 4.518678E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.161 | TFLOPs: 11.92 | 7: iteration 86540/ 173500 | consumed samples: 22154240 | consumed tokens: 45371883520 | elapsed time per iteration (s): 0.08 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 4.525694E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.092 | TFLOPs: 11.90 | 7: iteration 86550/ 173500 | consumed samples: 22156800 | consumed tokens: 45377126400 | elapsed time per iteration (s): 0.08 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 4.533952E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3201.326 | TFLOPs: 11.91 | 7: iteration 86560/ 173500 | consumed samples: 22159360 | consumed tokens: 45382369280 | elapsed time per iteration (s): 0.08 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 4.527891E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.164 | TFLOPs: 11.90 | 7: iteration 86570/ 173500 | consumed samples: 22161920 | consumed tokens: 45387612160 | elapsed time per iteration (s): 0.08 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 4.520359E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.549 | TFLOPs: 11.96 | 7: iteration 86580/ 173500 | consumed samples: 22164480 | consumed tokens: 45392855040 | elapsed time per iteration (s): 0.08 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 4.524986E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.810 | TFLOPs: 11.95 | 7: iteration 86590/ 173500 | consumed samples: 22167040 | consumed tokens: 45398097920 | elapsed time per iteration (s): 0.08 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 4.530610E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.109 | TFLOPs: 11.94 | 7: iteration 86600/ 173500 | consumed samples: 22169600 | consumed tokens: 45403340800 | elapsed time per iteration (s): 0.08 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 4.513959E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.273 | TFLOPs: 11.83 | 7: iteration 86610/ 173500 | consumed samples: 22172160 | consumed tokens: 45408583680 | elapsed time per iteration (s): 0.08 | learning rate: 
1.117E-04 | global batch size: 256 | lm loss: 4.519043E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.382 | TFLOPs: 11.93 | 7: iteration 86620/ 173500 | consumed samples: 22174720 | consumed tokens: 45413826560 | elapsed time per iteration (s): 0.08 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 4.534575E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.542 | TFLOPs: 11.90 | 7: iteration 86630/ 173500 | consumed samples: 22177280 | consumed tokens: 45419069440 | elapsed time per iteration (s): 0.08 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 4.521646E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.933 | TFLOPs: 11.93 | 7: iteration 86640/ 173500 | consumed samples: 22179840 | consumed tokens: 45424312320 | elapsed time per iteration (s): 0.08 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 4.522232E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.566 | TFLOPs: 11.89 | 7: iteration 86650/ 173500 | consumed samples: 22182400 | consumed tokens: 45429555200 | elapsed time per iteration (s): 0.08 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 4.529856E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.661 | TFLOPs: 11.89 | 7: iteration 86660/ 173500 | consumed samples: 22184960 | consumed tokens: 45434798080 | elapsed time per iteration (s): 0.08 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 4.533950E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.233 | TFLOPs: 11.91 | 7: iteration 86670/ 173500 | 
consumed samples: 22187520 | consumed tokens: 45440040960 | elapsed time per iteration (s): 0.08 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 4.535307E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.200 | TFLOPs: 11.90 | 7: iteration 86680/ 173500 | consumed samples: 22190080 | consumed tokens: 45445283840 | elapsed time per iteration (s): 0.08 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 4.531152E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.591 | TFLOPs: 11.90 | 7: iteration 86690/ 173500 | consumed samples: 22192640 | consumed tokens: 45450526720 | elapsed time per iteration (s): 0.08 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 4.530759E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.948 | TFLOPs: 11.93 | 7: iteration 86700/ 173500 | consumed samples: 22195200 | consumed tokens: 45455769600 | elapsed time per iteration (s): 0.08 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 4.515097E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.320 | TFLOPs: 11.94 | 7: iteration 86710/ 173500 | consumed samples: 22197760 | consumed tokens: 45461012480 | elapsed time per iteration (s): 0.08 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 4.519822E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.056 | TFLOPs: 11.93 | 7: iteration 86720/ 173500 | consumed samples: 22200320 | consumed tokens: 45466255360 | elapsed time per iteration (s): 0.08 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 4.510521E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3211.632 | TFLOPs: 11.95 | 7: iteration 86730/ 173500 | consumed samples: 22202880 | consumed tokens: 45471498240 | elapsed time per iteration (s): 0.08 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 4.536115E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.136 | TFLOPs: 11.49 | 7: iteration 86740/ 173500 | consumed samples: 22205440 | consumed tokens: 45476741120 | elapsed time per iteration (s): 0.08 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 4.518921E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.950 | TFLOPs: 11.68 | 7: iteration 86750/ 173500 | consumed samples: 22208000 | consumed tokens: 45481984000 | elapsed time per iteration (s): 0.08 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 4.523476E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.162 | TFLOPs: 11.88 | 7: iteration 86760/ 173500 | consumed samples: 22210560 | consumed tokens: 45487226880 | elapsed time per iteration (s): 0.08 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 4.530999E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.319 | TFLOPs: 11.87 | 7: iteration 86770/ 173500 | consumed samples: 22213120 | consumed tokens: 45492469760 | elapsed time per iteration (s): 0.08 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 4.517931E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.537 | TFLOPs: 11.27 | 7: iteration 86780/ 173500 | consumed samples: 22215680 | consumed tokens: 45497712640 | elapsed time per iteration (s): 0.08 | learning rate: 1.114E-04 | global batch 
size: 256 | lm loss: 4.529967E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.237 | TFLOPs: 11.90 | 7: iteration 86790/ 173500 | consumed samples: 22218240 | consumed tokens: 45502955520 | elapsed time per iteration (s): 0.08 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 4.542493E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.177 | TFLOPs: 11.37 | 7: iteration 86800/ 173500 | consumed samples: 22220800 | consumed tokens: 45508198400 | elapsed time per iteration (s): 0.08 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 4.520719E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.004 | TFLOPs: 11.37 | 7: iteration 86810/ 173500 | consumed samples: 22223360 | consumed tokens: 45513441280 | elapsed time per iteration (s): 0.08 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 4.524694E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.461 | TFLOPs: 11.89 | 7: iteration 86820/ 173500 | consumed samples: 22225920 | consumed tokens: 45518684160 | elapsed time per iteration (s): 0.08 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 4.528077E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.186 | TFLOPs: 11.81 | 7: iteration 86830/ 173500 | consumed samples: 22228480 | consumed tokens: 45523927040 | elapsed time per iteration (s): 0.08 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 4.520077E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.958 | TFLOPs: 11.64 | 7: iteration 86840/ 173500 | consumed samples: 22231040 | 
consumed tokens: 45529169920 | elapsed time per iteration (s): 0.08 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 4.521746E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.256 | TFLOPs: 11.92 | 7: iteration 86850/ 173500 | consumed samples: 22233600 | consumed tokens: 45534412800 | elapsed time per iteration (s): 0.08 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 4.526692E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.806 | TFLOPs: 11.91 | 7: iteration 86860/ 173500 | consumed samples: 22236160 | consumed tokens: 45539655680 | elapsed time per iteration (s): 0.08 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 4.529360E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.456 | TFLOPs: 11.91 | 7: iteration 86870/ 173500 | consumed samples: 22238720 | consumed tokens: 45544898560 | elapsed time per iteration (s): 0.09 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 4.531320E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.097 | TFLOPs: 11.19 | 7: iteration 86880/ 173500 | consumed samples: 22241280 | consumed tokens: 45550141440 | elapsed time per iteration (s): 0.08 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 4.523243E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.194 | TFLOPs: 11.91 | 7: iteration 86890/ 173500 | consumed samples: 22243840 | consumed tokens: 45555384320 | elapsed time per iteration (s): 0.08 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 4.528210E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3101.288 | TFLOPs: 11.54 | 7: iteration 86900/ 173500 | consumed samples: 22246400 | consumed tokens: 45560627200 | elapsed time per iteration (s): 0.08 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 4.528593E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.778 | TFLOPs: 11.83 | 7: iteration 86910/ 173500 | consumed samples: 22248960 | consumed tokens: 45565870080 | elapsed time per iteration (s): 0.08 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 4.520692E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.528 | TFLOPs: 11.83 | 7: iteration 86920/ 173500 | consumed samples: 22251520 | consumed tokens: 45571112960 | elapsed time per iteration (s): 0.08 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 4.509209E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.044 | TFLOPs: 11.88 | 7: iteration 86930/ 173500 | consumed samples: 22254080 | consumed tokens: 45576355840 | elapsed time per iteration (s): 0.08 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 4.521183E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.245 | TFLOPs: 11.89 | 7: iteration 86940/ 173500 | consumed samples: 22256640 | consumed tokens: 45581598720 | elapsed time per iteration (s): 0.09 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 4.528466E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.774 | TFLOPs: 10.61 | 7: iteration 86950/ 173500 | consumed samples: 22259200 | consumed tokens: 45586841600 | elapsed time per iteration (s): 0.09 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 
4.530825E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.170 | TFLOPs: 10.64 | 7: iteration 86960/ 173500 | consumed samples: 22261760 | consumed tokens: 45592084480 | elapsed time per iteration (s): 0.08 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 4.524494E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.017 | TFLOPs: 11.69 | 7: iteration 86970/ 173500 | consumed samples: 22264320 | consumed tokens: 45597327360 | elapsed time per iteration (s): 0.08 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 4.529737E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.320 | TFLOPs: 11.55 | 7: iteration 86980/ 173500 | consumed samples: 22266880 | consumed tokens: 45602570240 | elapsed time per iteration (s): 0.08 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 4.521056E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.206 | TFLOPs: 11.80 | 7: iteration 86990/ 173500 | consumed samples: 22269440 | consumed tokens: 45607813120 | elapsed time per iteration (s): 0.08 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 4.528389E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.399 | TFLOPs: 11.60 | 7: iteration 87000/ 173500 | consumed samples: 22272000 | consumed tokens: 45613056000 | elapsed time per iteration (s): 0.08 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 4.535798E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.811 | TFLOPs: 11.82 | 7: 
------------------------------------------------------------------------------------------------ 7: validation loss at iteration 87000 | lm loss value: 4.444090E+00 | lm loss PPL: 8.512241E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 87000 to checkpoints_14m91b100m 0: [2023-03-17 02:21:24,500] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step87000 is begin to save! 0: [2023-03-17 02:21:24,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:21:24,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:21:24,529] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:21:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:21:24,532] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:21:24,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:21:24,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:21:24,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:21:24,538] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 02:21:24,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:21:24,540] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:21:24,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:21:24,541] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step87000/mp_rank_00_model_states.pt 0: [2023-03-17 02:21:24,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:21:24,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:21:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:21:24,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:21:24,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:21:24,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:21:24,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2023-03-17 02:21:24,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:21:24,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:21:24,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:21:24,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2023-03-17 02:21:24,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:21:24,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 5: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2023-03-17 02:21:24,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2023-03-17 02:21:24,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:21:24,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:21:24,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 
3: [2023-03-17 02:21:24,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2023-03-17 02:21:24,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:21:24,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:21:24,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2023-03-17 02:21:24,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:21:24,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:21:24,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:21:24,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2023-03-17 02:21:24,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:21:24,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2023-03-17 02:21:24,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 
1: [2023-03-17 02:21:24,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:21:24,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2023-03-17 02:21:24,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:21:24,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:21:24,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:21:24,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:21:24,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2023-03-17 02:21:24,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:21:24,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2023-03-17 02:21:24,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:21:24,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:21:24,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:21:24,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:21:24,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 02:21:24,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2023-03-17 02:21:24,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:21:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:21:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:21:24,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2023-03-17 02:21:24,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:21:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:21:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:21:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:21:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:21:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:21:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:21:24,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:21:24,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 02:21:24,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:21:24,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 02:21:24,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:21:24,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 
4: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 4: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:21:24,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step87000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:21:24,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: successfully saved checkpoint at iteration 87000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 77.75 7: iteration 87010/ 173500 | consumed samples: 22274560 | consumed tokens: 45618298880 | elapsed time per iteration (s): 0.09 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 4.538759E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.443 | TFLOPs: 10.41 | 7: iteration 87020/ 173500 | consumed samples: 22277120 | consumed tokens: 45623541760 | elapsed time per iteration (s): 0.08 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 4.529383E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.258 | TFLOPs: 11.65 | 7: iteration 87030/ 173500 | consumed samples: 22279680 | consumed tokens: 45628784640 | elapsed time per iteration (s): 0.08 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 4.526603E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.871 | TFLOPs: 11.42 | 7: iteration 87040/ 173500 | consumed samples: 22282240 | consumed tokens: 45634027520 | elapsed time per iteration (s): 0.08 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 4.536636E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.470 | TFLOPs: 11.30 | 7: iteration 87050/ 173500 | consumed samples: 22284800 | consumed tokens: 45639270400 | elapsed time per iteration (s): 0.08 | learning rate: 1.109E-04 | global 
batch size: 256 | lm loss: 4.534720E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.739 | TFLOPs: 11.37 | 7: iteration 87060/ 173500 | consumed samples: 22287360 | consumed tokens: 45644513280 | elapsed time per iteration (s): 0.09 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 4.524738E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.859 | TFLOPs: 10.54 | 7: iteration 87070/ 173500 | consumed samples: 22289920 | consumed tokens: 45649756160 | elapsed time per iteration (s): 0.08 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 4.533221E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.941 | TFLOPs: 11.37 | 7: iteration 87080/ 173500 | consumed samples: 22292480 | consumed tokens: 45654999040 | elapsed time per iteration (s): 0.09 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 4.533968E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.677 | TFLOPs: 10.41 | 7: iteration 87090/ 173500 | consumed samples: 22295040 | consumed tokens: 45660241920 | elapsed time per iteration (s): 0.08 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 4.514979E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.308 | TFLOPs: 11.72 | 7: iteration 87100/ 173500 | consumed samples: 22297600 | consumed tokens: 45665484800 | elapsed time per iteration (s): 0.08 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 4.520748E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.336 | TFLOPs: 11.96 | 7: iteration 87110/ 173500 | consumed samples: 22300160 
| consumed tokens: 45670727680 | elapsed time per iteration (s): 0.08 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 4.526963E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.349 | TFLOPs: 11.78 | 7: iteration 87120/ 173500 | consumed samples: 22302720 | consumed tokens: 45675970560 | elapsed time per iteration (s): 0.09 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 4.520790E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2804.790 | TFLOPs: 10.43 | 7: iteration 87130/ 173500 | consumed samples: 22305280 | consumed tokens: 45681213440 | elapsed time per iteration (s): 0.08 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 4.527888E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.932 | TFLOPs: 11.63 | 7: iteration 87140/ 173500 | consumed samples: 22307840 | consumed tokens: 45686456320 | elapsed time per iteration (s): 0.08 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 4.526598E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.168 | TFLOPs: 11.89 | 7: iteration 87150/ 173500 | consumed samples: 22310400 | consumed tokens: 45691699200 | elapsed time per iteration (s): 0.08 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 4.524580E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.499 | TFLOPs: 11.87 | 7: iteration 87160/ 173500 | consumed samples: 22312960 | consumed tokens: 45696942080 | elapsed time per iteration (s): 0.08 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 4.533928E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 3190.919 | TFLOPs: 11.87 | 7: iteration 87170/ 173500 | consumed samples: 22315520 | consumed tokens: 45702184960 | elapsed time per iteration (s): 0.08 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 4.521578E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.496 | TFLOPs: 11.84 | 7: iteration 87180/ 173500 | consumed samples: 22318080 | consumed tokens: 45707427840 | elapsed time per iteration (s): 0.09 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 4.525405E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2933.243 | TFLOPs: 10.91 | 7: iteration 87190/ 173500 | consumed samples: 22320640 | consumed tokens: 45712670720 | elapsed time per iteration (s): 0.09 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 4.525408E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2929.802 | TFLOPs: 10.90 | 7: iteration 87200/ 173500 | consumed samples: 22323200 | consumed tokens: 45717913600 | elapsed time per iteration (s): 0.08 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 4.517920E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.029 | TFLOPs: 11.21 | 7: iteration 87210/ 173500 | consumed samples: 22325760 | consumed tokens: 45723156480 | elapsed time per iteration (s): 0.08 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 4.523532E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.961 | TFLOPs: 11.62 | 7: iteration 87220/ 173500 | consumed samples: 22328320 | consumed tokens: 45728399360 | elapsed time per iteration (s): 0.08 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 
4.517762E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.796 | TFLOPs: 11.79 | 7: iteration 87230/ 173500 | consumed samples: 22330880 | consumed tokens: 45733642240 | elapsed time per iteration (s): 0.08 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 4.525255E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.081 | TFLOPs: 11.55 | 7: iteration 87240/ 173500 | consumed samples: 22333440 | consumed tokens: 45738885120 | elapsed time per iteration (s): 0.08 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 4.537661E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.648 | TFLOPs: 11.81 | 7: iteration 87250/ 173500 | consumed samples: 22336000 | consumed tokens: 45744128000 | elapsed time per iteration (s): 0.08 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 4.533310E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.235 | TFLOPs: 11.86 | 7: iteration 87260/ 173500 | consumed samples: 22338560 | consumed tokens: 45749370880 | elapsed time per iteration (s): 0.27 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 4.526923E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 935.138 | TFLOPs: 3.48 | 7: iteration 87270/ 173500 | consumed samples: 22341120 | consumed tokens: 45754613760 | elapsed time per iteration (s): 0.08 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 4.515571E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.108 | TFLOPs: 11.68 | 7: iteration 87280/ 173500 | consumed samples: 22343680 | consumed tokens: 
45759856640 | elapsed time per iteration (s): 0.08 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 4.526200E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.042 | TFLOPs: 11.83 | 7: iteration 87290/ 173500 | consumed samples: 22346240 | consumed tokens: 45765099520 | elapsed time per iteration (s): 0.08 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 4.526432E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.568 | TFLOPs: 11.91 | 7: iteration 87300/ 173500 | consumed samples: 22348800 | consumed tokens: 45770342400 | elapsed time per iteration (s): 0.08 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 4.528868E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.577 | TFLOPs: 11.92 | 7: iteration 87310/ 173500 | consumed samples: 22351360 | consumed tokens: 45775585280 | elapsed time per iteration (s): 0.08 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 4.541817E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.855 | TFLOPs: 11.88 | 7: iteration 87320/ 173500 | consumed samples: 22353920 | consumed tokens: 45780828160 | elapsed time per iteration (s): 0.08 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 4.534916E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.413 | TFLOPs: 11.89 | 7: iteration 87330/ 173500 | consumed samples: 22356480 | consumed tokens: 45786071040 | elapsed time per iteration (s): 0.08 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 4.538427E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3178.831 | TFLOPs: 11.82 | 7: iteration 87340/ 173500 | consumed samples: 22359040 | consumed tokens: 45791313920 | elapsed time per iteration (s): 0.08 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 4.519046E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.005 | TFLOPs: 11.88 | 7: iteration 87350/ 173500 | consumed samples: 22361600 | consumed tokens: 45796556800 | elapsed time per iteration (s): 0.09 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 4.519746E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.102 | TFLOPs: 11.11 | 7: iteration 87360/ 173500 | consumed samples: 22364160 | consumed tokens: 45801799680 | elapsed time per iteration (s): 0.08 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 4.520232E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.849 | TFLOPs: 11.84 | 7: iteration 87370/ 173500 | consumed samples: 22366720 | consumed tokens: 45807042560 | elapsed time per iteration (s): 0.08 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 4.536272E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.429 | TFLOPs: 11.86 | 7: iteration 87380/ 173500 | consumed samples: 22369280 | consumed tokens: 45812285440 | elapsed time per iteration (s): 0.08 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 4.535604E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.692 | TFLOPs: 11.86 | 7: iteration 87390/ 173500 | consumed samples: 22371840 | consumed tokens: 45817528320 | elapsed time per iteration (s): 0.08 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 4.520647E+00 | grad 
norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.225 | TFLOPs: 11.63 | 7: iteration 87400/ 173500 | consumed samples: 22374400 | consumed tokens: 45822771200 | elapsed time per iteration (s): 0.09 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 4.522244E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2819.103 | TFLOPs: 10.49 | 7: iteration 87410/ 173500 | consumed samples: 22376960 | consumed tokens: 45828014080 | elapsed time per iteration (s): 0.09 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 4.528829E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.186 | TFLOPs: 10.56 | 7: iteration 87420/ 173500 | consumed samples: 22379520 | consumed tokens: 45833256960 | elapsed time per iteration (s): 0.08 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 4.516541E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.566 | TFLOPs: 11.32 | 7: iteration 87430/ 173500 | consumed samples: 22382080 | consumed tokens: 45838499840 | elapsed time per iteration (s): 0.09 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 4.513910E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.972 | TFLOPs: 11.09 | 7: iteration 87440/ 173500 | consumed samples: 22384640 | consumed tokens: 45843742720 | elapsed time per iteration (s): 0.08 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 4.522976E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.441 | TFLOPs: 11.35 | 7: iteration 87450/ 173500 | consumed samples: 22387200 | consumed tokens: 45848985600 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 4.530438E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.956 | TFLOPs: 11.60 | 7: iteration 87460/ 173500 | consumed samples: 22389760 | consumed tokens: 45854228480 | elapsed time per iteration (s): 0.09 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 4.512707E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.109 | TFLOPs: 10.69 | 7: iteration 87470/ 173500 | consumed samples: 22392320 | consumed tokens: 45859471360 | elapsed time per iteration (s): 0.08 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 4.518826E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.895 | TFLOPs: 11.88 | 7: iteration 87480/ 173500 | consumed samples: 22394880 | consumed tokens: 45864714240 | elapsed time per iteration (s): 0.08 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 4.527336E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.448 | TFLOPs: 11.57 | 7: iteration 87490/ 173500 | consumed samples: 22397440 | consumed tokens: 45869957120 | elapsed time per iteration (s): 0.09 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 4.522296E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2990.978 | TFLOPs: 11.13 | 7: iteration 87500/ 173500 | consumed samples: 22400000 | consumed tokens: 45875200000 | elapsed time per iteration (s): 0.08 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 4.521857E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.524 | TFLOPs: 
11.62 | 7: iteration 87510/ 173500 | consumed samples: 22402560 | consumed tokens: 45880442880 | elapsed time per iteration (s): 0.08 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 4.522064E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.858 | TFLOPs: 11.63 | 7: iteration 87520/ 173500 | consumed samples: 22405120 | consumed tokens: 45885685760 | elapsed time per iteration (s): 0.08 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 4.524105E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.672 | TFLOPs: 11.86 | 7: iteration 87530/ 173500 | consumed samples: 22407680 | consumed tokens: 45890928640 | elapsed time per iteration (s): 0.08 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 4.526571E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.145 | TFLOPs: 11.59 | 7: iteration 87540/ 173500 | consumed samples: 22410240 | consumed tokens: 45896171520 | elapsed time per iteration (s): 0.08 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 4.536526E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.926 | TFLOPs: 11.36 | 7: iteration 87550/ 173500 | consumed samples: 22412800 | consumed tokens: 45901414400 | elapsed time per iteration (s): 0.08 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 4.517142E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.062 | TFLOPs: 11.60 | 7: iteration 87560/ 173500 | consumed samples: 22415360 | consumed tokens: 45906657280 | elapsed time per iteration (s): 0.08 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 4.516739E+00 | grad norm: 0.341 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.796 | TFLOPs: 11.88 | 7: iteration 87570/ 173500 | consumed samples: 22417920 | consumed tokens: 45911900160 | elapsed time per iteration (s): 0.08 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 4.526629E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.306 | TFLOPs: 11.67 | 7: iteration 87580/ 173500 | consumed samples: 22420480 | consumed tokens: 45917143040 | elapsed time per iteration (s): 0.09 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 4.522475E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.827 | TFLOPs: 11.16 | 7: iteration 87590/ 173500 | consumed samples: 22423040 | consumed tokens: 45922385920 | elapsed time per iteration (s): 0.08 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 4.527273E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.006 | TFLOPs: 11.72 | 7: iteration 87600/ 173500 | consumed samples: 22425600 | consumed tokens: 45927628800 | elapsed time per iteration (s): 0.10 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 4.526979E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2486.866 | TFLOPs: 9.25 | 7: iteration 87610/ 173500 | consumed samples: 22428160 | consumed tokens: 45932871680 | elapsed time per iteration (s): 0.10 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 4.527080E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2569.643 | TFLOPs: 9.56 | 7: iteration 87620/ 173500 | consumed samples: 22430720 | consumed tokens: 45938114560 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.100E-04 | global batch size: 256 | lm loss: 4.524089E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.076 | TFLOPs: 11.14 | 7: iteration 87630/ 173500 | consumed samples: 22433280 | consumed tokens: 45943357440 | elapsed time per iteration (s): 0.08 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 4.520771E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.993 | TFLOPs: 11.36 | 7: iteration 87640/ 173500 | consumed samples: 22435840 | consumed tokens: 45948600320 | elapsed time per iteration (s): 0.08 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 4.524465E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.700 | TFLOPs: 11.62 | 7: iteration 87650/ 173500 | consumed samples: 22438400 | consumed tokens: 45953843200 | elapsed time per iteration (s): 0.08 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 4.522790E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.641 | TFLOPs: 11.56 | 7: iteration 87660/ 173500 | consumed samples: 22440960 | consumed tokens: 45959086080 | elapsed time per iteration (s): 0.08 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 4.522691E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.093 | TFLOPs: 11.85 | 7: iteration 87670/ 173500 | consumed samples: 22443520 | consumed tokens: 45964328960 | elapsed time per iteration (s): 0.08 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 4.533138E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.059 | TFLOPs: 11.61 | 7: iteration 87680/ 
173500 | consumed samples: 22446080 | consumed tokens: 45969571840 | elapsed time per iteration (s): 0.08 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 4.525913E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.551 | TFLOPs: 11.89 | 7: iteration 87690/ 173500 | consumed samples: 22448640 | consumed tokens: 45974814720 | elapsed time per iteration (s): 0.08 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 4.515911E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.844 | TFLOPs: 11.84 | 7: iteration 87700/ 173500 | consumed samples: 22451200 | consumed tokens: 45980057600 | elapsed time per iteration (s): 0.08 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 4.537872E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.606 | TFLOPs: 11.84 | 7: iteration 87710/ 173500 | consumed samples: 22453760 | consumed tokens: 45985300480 | elapsed time per iteration (s): 0.08 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 4.532831E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.501 | TFLOPs: 11.58 | 7: iteration 87720/ 173500 | consumed samples: 22456320 | consumed tokens: 45990543360 | elapsed time per iteration (s): 0.08 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 4.529844E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.475 | TFLOPs: 11.55 | 7: iteration 87730/ 173500 | consumed samples: 22458880 | consumed tokens: 45995786240 | elapsed time per iteration (s): 0.08 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 4.529871E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3204.346 | TFLOPs: 11.92 | 7: iteration 87740/ 173500 | consumed samples: 22461440 | consumed tokens: 46001029120 | elapsed time per iteration (s): 0.08 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 4.527625E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.555 | TFLOPs: 11.54 | 7: iteration 87750/ 173500 | consumed samples: 22464000 | consumed tokens: 46006272000 | elapsed time per iteration (s): 0.08 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 4.532151E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.702 | TFLOPs: 11.83 | 7: iteration 87760/ 173500 | consumed samples: 22466560 | consumed tokens: 46011514880 | elapsed time per iteration (s): 0.08 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 4.529274E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.947 | TFLOPs: 11.63 | 7: iteration 87770/ 173500 | consumed samples: 22469120 | consumed tokens: 46016757760 | elapsed time per iteration (s): 0.08 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 4.534529E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.990 | TFLOPs: 11.92 | 7: iteration 87780/ 173500 | consumed samples: 22471680 | consumed tokens: 46022000640 | elapsed time per iteration (s): 0.08 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 4.525952E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.796 | TFLOPs: 11.85 | 7: iteration 87790/ 173500 | consumed samples: 22474240 | consumed tokens: 46027243520 | elapsed time per iteration (s): 0.08 | learning rate: 
1.097E-04 | global batch size: 256 | lm loss: 4.523177E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.974 | TFLOPs: 11.83 | 7: iteration 87800/ 173500 | consumed samples: 22476800 | consumed tokens: 46032486400 | elapsed time per iteration (s): 0.08 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 4.520029E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.827 | TFLOPs: 11.98 | 7: iteration 87810/ 173500 | consumed samples: 22479360 | consumed tokens: 46037729280 | elapsed time per iteration (s): 0.08 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 4.517786E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.236 | TFLOPs: 11.61 | 7: iteration 87820/ 173500 | consumed samples: 22481920 | consumed tokens: 46042972160 | elapsed time per iteration (s): 0.08 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 4.521564E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.922 | TFLOPs: 11.99 | 7: iteration 87830/ 173500 | consumed samples: 22484480 | consumed tokens: 46048215040 | elapsed time per iteration (s): 0.08 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 4.522685E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.504 | TFLOPs: 11.43 | 7: iteration 87840/ 173500 | consumed samples: 22487040 | consumed tokens: 46053457920 | elapsed time per iteration (s): 0.08 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 4.516643E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.026 | TFLOPs: 12.00 | 7: iteration 87850/ 173500 | 
consumed samples: 22489600 | consumed tokens: 46058700800 | elapsed time per iteration (s): 0.08 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 4.514151E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.098 | TFLOPs: 11.74 | 7: iteration 87860/ 173500 | consumed samples: 22492160 | consumed tokens: 46063943680 | elapsed time per iteration (s): 0.08 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 4.520208E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.206 | TFLOPs: 11.67 | 7: iteration 87870/ 173500 | consumed samples: 22494720 | consumed tokens: 46069186560 | elapsed time per iteration (s): 0.08 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 4.520405E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.971 | TFLOPs: 11.92 | 7: iteration 87880/ 173500 | consumed samples: 22497280 | consumed tokens: 46074429440 | elapsed time per iteration (s): 0.08 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 4.526311E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.383 | TFLOPs: 11.87 | 7: iteration 87890/ 173500 | consumed samples: 22499840 | consumed tokens: 46079672320 | elapsed time per iteration (s): 0.08 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 4.537070E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.096 | TFLOPs: 11.46 | 7: iteration 87900/ 173500 | consumed samples: 22502400 | consumed tokens: 46084915200 | elapsed time per iteration (s): 0.08 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 4.525666E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3154.729 | TFLOPs: 11.73 | 7: iteration 87910/ 173500 | consumed samples: 22504960 | consumed tokens: 46090158080 | elapsed time per iteration (s): 0.08 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 4.526873E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.457 | TFLOPs: 12.00 | 7: iteration 87920/ 173500 | consumed samples: 22507520 | consumed tokens: 46095400960 | elapsed time per iteration (s): 0.08 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 4.534950E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.313 | TFLOPs: 11.71 | 7: iteration 87930/ 173500 | consumed samples: 22510080 | consumed tokens: 46100643840 | elapsed time per iteration (s): 0.08 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 4.518624E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.912 | TFLOPs: 11.42 | 7: iteration 87940/ 173500 | consumed samples: 22512640 | consumed tokens: 46105886720 | elapsed time per iteration (s): 0.08 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 4.519954E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.549 | TFLOPs: 11.66 | 7: iteration 87950/ 173500 | consumed samples: 22515200 | consumed tokens: 46111129600 | elapsed time per iteration (s): 0.08 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 4.525159E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.969 | TFLOPs: 11.87 | 7: iteration 87960/ 173500 | consumed samples: 22517760 | consumed tokens: 46116372480 | elapsed time per iteration (s): 0.08 | learning rate: 1.094E-04 | global batch 
size: 256 | lm loss: 4.517455E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.048 | TFLOPs: 11.50 | 7: iteration 87970/ 173500 | consumed samples: 22520320 | consumed tokens: 46121615360 | elapsed time per iteration (s): 0.08 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 4.513350E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.185 | TFLOPs: 11.93 | 7: iteration 87980/ 173500 | consumed samples: 22522880 | consumed tokens: 46126858240 | elapsed time per iteration (s): 0.08 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 4.524838E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.851 | TFLOPs: 11.63 | 7: iteration 87990/ 173500 | consumed samples: 22525440 | consumed tokens: 46132101120 | elapsed time per iteration (s): 0.08 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 4.526673E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.222 | TFLOPs: 11.61 | 0: [2023-03-17 02:22:49,118] [INFO] [logging.py:68:log_dist] [Rank 0] step=88000, skipped=0, lr=[0.00010937083470846484, 0.00010937083470846484, 0.00010937083470846484], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 88000/ 173500 | consumed samples: 22528000 | consumed tokens: 46137344000 | elapsed time per iteration (s): 0.08 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 4.537253E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.103 | TFLOPs: 11.68 | 0: steps: 88000 loss: 4.5055 iter time (s): 0.082 samples/sec: 3109.341 7: ------------------------------------------------------------------------------------------------ 7: validation loss at 
iteration 88000 | lm loss value: 4.384697E+00 | lm loss PPL: 8.021391E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 88000 to checkpoints_14m91b100m 0: [2023-03-17 02:22:49,175] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step88000 is begin to save! 0: [2023-03-17 02:22:49,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:22:49,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:22:49,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:22:49,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:22:49,208] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:22:49,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:22:49,210] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:22:49,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:22:49,213] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 02:22:49,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:22:49,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:22:49,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:22:49,217] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step88000/mp_rank_00_model_states.pt 0: [2023-03-17 02:22:49,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:22:49,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:22:49,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:22:49,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:22:49,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:22:49,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2023-03-17 02:22:49,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:22:49,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2023-03-17 02:22:49,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:22:49,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2023-03-17 02:22:49,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:22:49,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:22:49,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:22:49,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:22:49,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2023-03-17 02:22:49,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:22:49,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 02:22:49,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:22:49,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:22:49,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:22:49,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 02:22:49,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2023-03-17 02:22:49,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:22:49,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:22:49,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:22:49,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2023-03-17 02:22:49,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:22:49,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:22:49,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
6: [2023-03-17 02:22:49,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2023-03-17 02:22:49,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:22:49,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:22:49,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:22:49,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
6: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:22:49,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:22:49,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:22:49,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:22:49,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
7: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:22:49,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:22:49,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:22:49,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2023-03-17 02:22:49,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
1: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:22:49,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:22:49,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:22:49,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 02:22:49,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
6: [2023-03-17 02:22:49,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:22:49,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2023-03-17 02:22:49,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:22:49,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:22:49,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2023-03-17 02:22:49,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:22:49,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
5: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
7: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
2: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: successfully saved checkpoint at iteration 88000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 77.83 4: [2023-03-17 02:22:49,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:22:49,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:22:49,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:22:49,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:22:49,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:22:49,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:22:49,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:22:49,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:22:49,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:22:49,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:22:49,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:22:49,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:22:49,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2023-03-17 02:22:49,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:22:49,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:22:49,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:22:49,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step88000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:22:49,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
7: iteration 88010/ 173500 | consumed samples: 22530560 | consumed tokens: 46142586880 | elapsed time per iteration (s): 0.09 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 4.519909E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2722.143 | TFLOPs: 10.13 | 7: iteration 88020/ 173500 | consumed samples: 22533120 | consumed tokens: 46147829760 | elapsed time per iteration (s): 0.08 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 4.541109E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.866 | TFLOPs: 11.43 | 7: iteration 88030/ 173500 | consumed samples: 22535680 | consumed tokens: 46153072640 | elapsed time per iteration (s): 0.08 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 4.522404E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.438 | TFLOPs: 11.97 | 7: iteration 88040/ 173500 | consumed samples: 22538240 | consumed tokens: 46158315520 | elapsed time per iteration (s): 0.08 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 4.527361E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.191 | TFLOPs: 11.67 | 7: iteration 88050/ 173500 | consumed samples: 22540800 | consumed tokens: 46163558400 | elapsed time per iteration (s): 0.08 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 4.542095E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.486 | TFLOPs: 11.87 | 7: iteration 88060/ 173500 | consumed samples: 22543360 | consumed tokens: 46168801280 | elapsed time per iteration (s): 0.08 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 4.518114E+00 | grad norm: 0.331 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.243 | TFLOPs: 11.93 | 7: iteration 88070/ 173500 | consumed samples: 22545920 | consumed tokens: 46174044160 | elapsed time per iteration (s): 0.08 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 4.530463E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.039 | TFLOPs: 11.55 | 7: iteration 88080/ 173500 | consumed samples: 22548480 | consumed tokens: 46179287040 | elapsed time per iteration (s): 0.08 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 4.510991E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.386 | TFLOPs: 11.93 | 7: iteration 88090/ 173500 | consumed samples: 22551040 | consumed tokens: 46184529920 | elapsed time per iteration (s): 0.08 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 4.522636E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.513 | TFLOPs: 11.93 | 7: iteration 88100/ 173500 | consumed samples: 22553600 | consumed tokens: 46189772800 | elapsed time per iteration (s): 0.08 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 4.526230E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.325 | TFLOPs: 11.92 | 7: iteration 88110/ 173500 | consumed samples: 22556160 | consumed tokens: 46195015680 | elapsed time per iteration (s): 0.08 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 4.532259E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.180 | TFLOPs: 11.94 | 7: iteration 88120/ 173500 | consumed samples: 22558720 | consumed tokens: 46200258560 | elapsed time per iteration (s): 0.09 | 
learning rate: 1.092E-04 | global batch size: 256 | lm loss: 4.530393E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.499 | TFLOPs: 10.21 | 7: iteration 88130/ 173500 | consumed samples: 22561280 | consumed tokens: 46205501440 | elapsed time per iteration (s): 0.09 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 4.536061E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2836.663 | TFLOPs: 10.55 | 7: iteration 88140/ 173500 | consumed samples: 22563840 | consumed tokens: 46210744320 | elapsed time per iteration (s): 0.08 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 4.523166E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.308 | TFLOPs: 11.91 | 7: iteration 88150/ 173500 | consumed samples: 22566400 | consumed tokens: 46215987200 | elapsed time per iteration (s): 0.08 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 4.523613E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.698 | TFLOPs: 11.44 | 7: iteration 88160/ 173500 | consumed samples: 22568960 | consumed tokens: 46221230080 | elapsed time per iteration (s): 0.08 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 4.530775E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.023 | TFLOPs: 11.90 | 7: iteration 88170/ 173500 | consumed samples: 22571520 | consumed tokens: 46226472960 | elapsed time per iteration (s): 0.08 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 4.521602E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.687 | TFLOPs: 11.41 | 7: iteration 88180/ 
173500 | consumed samples: 22574080 | consumed tokens: 46231715840 | elapsed time per iteration (s): 0.08 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 4.529738E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.915 | TFLOPs: 11.91 | 7: iteration 88190/ 173500 | consumed samples: 22576640 | consumed tokens: 46236958720 | elapsed time per iteration (s): 0.08 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 4.532610E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.327 | TFLOPs: 11.96 | 7: iteration 88200/ 173500 | consumed samples: 22579200 | consumed tokens: 46242201600 | elapsed time per iteration (s): 0.08 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 4.532919E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.593 | TFLOPs: 11.65 | 7: iteration 88210/ 173500 | consumed samples: 22581760 | consumed tokens: 46247444480 | elapsed time per iteration (s): 0.08 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 4.524612E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.841 | TFLOPs: 11.85 | 7: iteration 88220/ 173500 | consumed samples: 22584320 | consumed tokens: 46252687360 | elapsed time per iteration (s): 0.08 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 4.527924E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.377 | TFLOPs: 11.94 | 7: iteration 88230/ 173500 | consumed samples: 22586880 | consumed tokens: 46257930240 | elapsed time per iteration (s): 0.08 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 4.509679E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3193.899 | TFLOPs: 11.88 | 7: iteration 88240/ 173500 | consumed samples: 22589440 | consumed tokens: 46263173120 | elapsed time per iteration (s): 0.08 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 4.517989E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.415 | TFLOPs: 11.79 | 7: iteration 88250/ 173500 | consumed samples: 22592000 | consumed tokens: 46268416000 | elapsed time per iteration (s): 0.08 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 4.522814E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.351 | TFLOPs: 11.89 | 7: iteration 88260/ 173500 | consumed samples: 22594560 | consumed tokens: 46273658880 | elapsed time per iteration (s): 0.08 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 4.529177E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.602 | TFLOPs: 11.84 | 7: iteration 88270/ 173500 | consumed samples: 22597120 | consumed tokens: 46278901760 | elapsed time per iteration (s): 0.08 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 4.535411E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.989 | TFLOPs: 11.89 | 7: iteration 88280/ 173500 | consumed samples: 22599680 | consumed tokens: 46284144640 | elapsed time per iteration (s): 0.08 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 4.522849E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.425 | TFLOPs: 11.86 | 7: iteration 88290/ 173500 | consumed samples: 22602240 | consumed tokens: 46289387520 | elapsed time per iteration (s): 0.08 | learning rate: 
1.089E-04 | global batch size: 256 | lm loss: 4.517457E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.344 | TFLOPs: 11.87 | 7: iteration 88300/ 173500 | consumed samples: 22604800 | consumed tokens: 46294630400 | elapsed time per iteration (s): 0.08 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 4.517506E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.332 | TFLOPs: 11.85 | 7: iteration 88310/ 173500 | consumed samples: 22607360 | consumed tokens: 46299873280 | elapsed time per iteration (s): 0.08 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 4.535295E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.554 | TFLOPs: 11.89 | 7: iteration 88320/ 173500 | consumed samples: 22609920 | consumed tokens: 46305116160 | elapsed time per iteration (s): 0.08 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 4.523257E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.267 | TFLOPs: 11.89 | 7: iteration 88330/ 173500 | consumed samples: 22612480 | consumed tokens: 46310359040 | elapsed time per iteration (s): 0.08 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 4.533862E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.033 | TFLOPs: 11.54 | 7: iteration 88340/ 173500 | consumed samples: 22615040 | consumed tokens: 46315601920 | elapsed time per iteration (s): 0.08 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 4.529239E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.284 | TFLOPs: 11.92 | 7: iteration 88350/ 173500 | 
consumed samples: 22617600 | consumed tokens: 46320844800 | elapsed time per iteration (s): 0.08 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 4.516212E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.082 | TFLOPs: 11.98 | 7: iteration 88360/ 173500 | consumed samples: 22620160 | consumed tokens: 46326087680 | elapsed time per iteration (s): 0.08 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 4.534092E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.823 | TFLOPs: 11.69 | 7: iteration 88370/ 173500 | consumed samples: 22622720 | consumed tokens: 46331330560 | elapsed time per iteration (s): 0.08 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 4.528701E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.156 | TFLOPs: 11.84 | 7: iteration 88380/ 173500 | consumed samples: 22625280 | consumed tokens: 46336573440 | elapsed time per iteration (s): 0.08 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 4.523660E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.204 | TFLOPs: 11.58 | 7: iteration 88390/ 173500 | consumed samples: 22627840 | consumed tokens: 46341816320 | elapsed time per iteration (s): 0.08 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 4.523573E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.168 | TFLOPs: 12.03 | 7: iteration 88400/ 173500 | consumed samples: 22630400 | consumed tokens: 46347059200 | elapsed time per iteration (s): 0.08 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 4.518282E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3227.836 | TFLOPs: 12.01 | 7: iteration 88410/ 173500 | consumed samples: 22632960 | consumed tokens: 46352302080 | elapsed time per iteration (s): 0.08 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 4.523254E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.760 | TFLOPs: 11.98 | 7: iteration 88420/ 173500 | consumed samples: 22635520 | consumed tokens: 46357544960 | elapsed time per iteration (s): 0.08 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 4.528147E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.130 | TFLOPs: 12.00 | 7: iteration 88430/ 173500 | consumed samples: 22638080 | consumed tokens: 46362787840 | elapsed time per iteration (s): 0.08 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 4.513351E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.956 | TFLOPs: 11.99 | 7: iteration 88440/ 173500 | consumed samples: 22640640 | consumed tokens: 46368030720 | elapsed time per iteration (s): 0.08 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 4.522257E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.599 | TFLOPs: 11.99 | 7: iteration 88450/ 173500 | consumed samples: 22643200 | consumed tokens: 46373273600 | elapsed time per iteration (s): 0.08 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 4.527162E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.948 | TFLOPs: 12.00 | 7: iteration 88460/ 173500 | consumed samples: 22645760 | consumed tokens: 46378516480 | elapsed time per iteration (s): 0.08 | learning rate: 1.086E-04 | global batch 
size: 256 | lm loss: 4.522906E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.237 | TFLOPs: 12.00 | 7: iteration 88470/ 173500 | consumed samples: 22648320 | consumed tokens: 46383759360 | elapsed time per iteration (s): 0.08 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 4.538461E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.115 | TFLOPs: 11.84 | 7: iteration 88480/ 173500 | consumed samples: 22650880 | consumed tokens: 46389002240 | elapsed time per iteration (s): 0.08 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 4.526753E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.150 | TFLOPs: 12.04 | 7: iteration 88490/ 173500 | consumed samples: 22653440 | consumed tokens: 46394245120 | elapsed time per iteration (s): 0.08 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 4.508357E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.449 | TFLOPs: 12.02 | 7: iteration 88500/ 173500 | consumed samples: 22656000 | consumed tokens: 46399488000 | elapsed time per iteration (s): 0.08 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 4.531770E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.708 | TFLOPs: 11.93 | 7: iteration 88510/ 173500 | consumed samples: 22658560 | consumed tokens: 46404730880 | elapsed time per iteration (s): 0.08 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 4.513387E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.937 | TFLOPs: 11.89 | 7: iteration 88520/ 173500 | consumed samples: 22661120 | 
consumed tokens: 46409973760 | elapsed time per iteration (s): 0.08 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 4.533404E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.845 | TFLOPs: 11.69 | 7: iteration 88530/ 173500 | consumed samples: 22663680 | consumed tokens: 46415216640 | elapsed time per iteration (s): 0.08 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 4.514067E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.205 | TFLOPs: 12.02 | 7: iteration 88540/ 173500 | consumed samples: 22666240 | consumed tokens: 46420459520 | elapsed time per iteration (s): 0.08 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 4.527647E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.901 | TFLOPs: 11.97 | 7: iteration 88550/ 173500 | consumed samples: 22668800 | consumed tokens: 46425702400 | elapsed time per iteration (s): 0.08 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 4.518511E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.549 | TFLOPs: 11.95 | 7: iteration 88560/ 173500 | consumed samples: 22671360 | consumed tokens: 46430945280 | elapsed time per iteration (s): 0.08 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 4.533826E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.793 | TFLOPs: 11.97 | 7: iteration 88570/ 173500 | consumed samples: 22673920 | consumed tokens: 46436188160 | elapsed time per iteration (s): 0.08 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 4.535126E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3213.619 | TFLOPs: 11.95 | 7: iteration 88580/ 173500 | consumed samples: 22676480 | consumed tokens: 46441431040 | elapsed time per iteration (s): 0.08 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 4.534153E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.567 | TFLOPs: 12.02 | 7: iteration 88590/ 173500 | consumed samples: 22679040 | consumed tokens: 46446673920 | elapsed time per iteration (s): 0.08 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 4.533850E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.447 | TFLOPs: 12.02 | 7: iteration 88600/ 173500 | consumed samples: 22681600 | consumed tokens: 46451916800 | elapsed time per iteration (s): 0.11 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 4.527406E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2338.591 | TFLOPs: 8.70 | 7: iteration 88610/ 173500 | consumed samples: 22684160 | consumed tokens: 46457159680 | elapsed time per iteration (s): 0.09 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 4.524269E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.508 | TFLOPs: 10.64 | 7: iteration 88620/ 173500 | consumed samples: 22686720 | consumed tokens: 46462402560 | elapsed time per iteration (s): 0.08 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 4.514985E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.031 | TFLOPs: 11.93 | 7: iteration 88630/ 173500 | consumed samples: 22689280 | consumed tokens: 46467645440 | elapsed time per iteration (s): 0.08 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 
4.521320E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.533 | TFLOPs: 11.95 | 7: iteration 88640/ 173500 | consumed samples: 22691840 | consumed tokens: 46472888320 | elapsed time per iteration (s): 0.08 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 4.524020E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.158 | TFLOPs: 11.91 | 7: iteration 88650/ 173500 | consumed samples: 22694400 | consumed tokens: 46478131200 | elapsed time per iteration (s): 0.08 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 4.530230E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.224 | TFLOPs: 11.78 | 7: iteration 88660/ 173500 | consumed samples: 22696960 | consumed tokens: 46483374080 | elapsed time per iteration (s): 0.08 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 4.530314E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.908 | TFLOPs: 11.89 | 7: iteration 88670/ 173500 | consumed samples: 22699520 | consumed tokens: 46488616960 | elapsed time per iteration (s): 0.08 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 4.522243E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.002 | TFLOPs: 11.83 | 7: iteration 88680/ 173500 | consumed samples: 22702080 | consumed tokens: 46493859840 | elapsed time per iteration (s): 0.08 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 4.526364E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.113 | TFLOPs: 11.84 | 7: iteration 88690/ 173500 | consumed samples: 22704640 | consumed tokens: 
46499102720 | elapsed time per iteration (s): 0.08 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 4.517663E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.309 | TFLOPs: 11.85 | 7: iteration 88700/ 173500 | consumed samples: 22707200 | consumed tokens: 46504345600 | elapsed time per iteration (s): 0.08 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 4.526113E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.371 | TFLOPs: 11.83 | 7: iteration 88710/ 173500 | consumed samples: 22709760 | consumed tokens: 46509588480 | elapsed time per iteration (s): 0.08 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 4.518012E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.043 | TFLOPs: 11.81 | 7: iteration 88720/ 173500 | consumed samples: 22712320 | consumed tokens: 46514831360 | elapsed time per iteration (s): 0.08 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 4.516514E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.152 | TFLOPs: 11.82 | 7: iteration 88730/ 173500 | consumed samples: 22714880 | consumed tokens: 46520074240 | elapsed time per iteration (s): 0.08 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 4.532763E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.594 | TFLOPs: 11.87 | 7: iteration 88740/ 173500 | consumed samples: 22717440 | consumed tokens: 46525317120 | elapsed time per iteration (s): 0.08 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 4.530706E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3197.442 | TFLOPs: 11.89 | 7: iteration 88750/ 173500 | consumed samples: 22720000 | consumed tokens: 46530560000 | elapsed time per iteration (s): 0.08 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 4.518327E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.412 | TFLOPs: 11.91 | 7: iteration 88760/ 173500 | consumed samples: 22722560 | consumed tokens: 46535802880 | elapsed time per iteration (s): 0.08 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 4.521641E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.706 | TFLOPs: 11.86 | 7: iteration 88770/ 173500 | consumed samples: 22725120 | consumed tokens: 46541045760 | elapsed time per iteration (s): 0.08 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 4.515857E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.556 | TFLOPs: 11.91 | 7: iteration 88780/ 173500 | consumed samples: 22727680 | consumed tokens: 46546288640 | elapsed time per iteration (s): 0.08 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 4.522925E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.308 | TFLOPs: 11.93 | 7: iteration 88790/ 173500 | consumed samples: 22730240 | consumed tokens: 46551531520 | elapsed time per iteration (s): 0.08 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 4.525163E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.446 | TFLOPs: 11.65 | 7: iteration 88800/ 173500 | consumed samples: 22732800 | consumed tokens: 46556774400 | elapsed time per iteration (s): 0.08 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 4.509484E+00 | grad 
norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.410 | TFLOPs: 11.81 | 7: iteration 88810/ 173500 | consumed samples: 22735360 | consumed tokens: 46562017280 | elapsed time per iteration (s): 0.08 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 4.524608E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.743 | TFLOPs: 11.69 | 7: iteration 88820/ 173500 | consumed samples: 22737920 | consumed tokens: 46567260160 | elapsed time per iteration (s): 0.08 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 4.531562E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.867 | TFLOPs: 11.69 | 7: iteration 88830/ 173500 | consumed samples: 22740480 | consumed tokens: 46572503040 | elapsed time per iteration (s): 0.08 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 4.521775E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.204 | TFLOPs: 11.83 | 7: iteration 88840/ 173500 | consumed samples: 22743040 | consumed tokens: 46577745920 | elapsed time per iteration (s): 0.08 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 4.536093E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.679 | TFLOPs: 11.75 | 7: iteration 88850/ 173500 | consumed samples: 22745600 | consumed tokens: 46582988800 | elapsed time per iteration (s): 0.08 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 4.532140E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.737 | TFLOPs: 11.45 | 7: iteration 88860/ 173500 | consumed samples: 22748160 | consumed tokens: 46588231680 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 4.515954E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.359 | TFLOPs: 11.87 | 7: iteration 88870/ 173500 | consumed samples: 22750720 | consumed tokens: 46593474560 | elapsed time per iteration (s): 0.08 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 4.524956E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.596 | TFLOPs: 11.71 | 7: iteration 88880/ 173500 | consumed samples: 22753280 | consumed tokens: 46598717440 | elapsed time per iteration (s): 0.08 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 4.518917E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.024 | TFLOPs: 11.83 | 7: iteration 88890/ 173500 | consumed samples: 22755840 | consumed tokens: 46603960320 | elapsed time per iteration (s): 0.08 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 4.520163E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.957 | TFLOPs: 11.94 | 7: iteration 88900/ 173500 | consumed samples: 22758400 | consumed tokens: 46609203200 | elapsed time per iteration (s): 0.08 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 4.527800E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.930 | TFLOPs: 11.84 | 7: iteration 88910/ 173500 | consumed samples: 22760960 | consumed tokens: 46614446080 | elapsed time per iteration (s): 0.08 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 4.520146E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.595 | TFLOPs: 
11.77 | 7: iteration 88920/ 173500 | consumed samples: 22763520 | consumed tokens: 46619688960 | elapsed time per iteration (s): 0.08 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 4.519948E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.406 | TFLOPs: 11.93 | 7: iteration 88930/ 173500 | consumed samples: 22766080 | consumed tokens: 46624931840 | elapsed time per iteration (s): 0.08 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 4.520662E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.790 | TFLOPs: 11.81 | 7: iteration 88940/ 173500 | consumed samples: 22768640 | consumed tokens: 46630174720 | elapsed time per iteration (s): 0.08 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 4.526337E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.603 | TFLOPs: 11.98 | 7: iteration 88950/ 173500 | consumed samples: 22771200 | consumed tokens: 46635417600 | elapsed time per iteration (s): 0.08 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 4.534790E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.201 | TFLOPs: 11.93 | 7: iteration 88960/ 173500 | consumed samples: 22773760 | consumed tokens: 46640660480 | elapsed time per iteration (s): 0.08 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 4.530288E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.002 | TFLOPs: 11.21 | 7: iteration 88970/ 173500 | consumed samples: 22776320 | consumed tokens: 46645903360 | elapsed time per iteration (s): 0.08 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 4.530106E+00 | grad norm: 0.317 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.756 | TFLOPs: 11.93 | 7: iteration 88980/ 173500 | consumed samples: 22778880 | consumed tokens: 46651146240 | elapsed time per iteration (s): 0.08 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 4.520748E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.018 | TFLOPs: 11.90 | 7: iteration 88990/ 173500 | consumed samples: 22781440 | consumed tokens: 46656389120 | elapsed time per iteration (s): 0.08 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 4.523706E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.837 | TFLOPs: 11.90 | 7: iteration 89000/ 173500 | consumed samples: 22784000 | consumed tokens: 46661632000 | elapsed time per iteration (s): 0.08 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 4.529701E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.117 | TFLOPs: 11.96 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 89000 | lm loss value: 4.420447E+00 | lm loss PPL: 8.313347E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 89000 to checkpoints_14m91b100m 0: [2023-03-17 02:24:10,286] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step89000 is begin to save! 0: [2023-03-17 02:24:10,292] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:24:10,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/layer_01-model_00-model_states.pt. 
0: [2023-03-17 02:24:10,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:24:10,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:24:10,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:24:10,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:24:10,325] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:24:10,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:24:10,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:24:10,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:24:10,330] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:24:10,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:24:10,332] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step89000/mp_rank_00_model_states.pt 0: [2023-03-17 02:24:10,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/mp_rank_00_model_states.pt... 
0: [2023-03-17 02:24:10,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:24:10,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:24:10,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:24:10,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:24:10,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:24:10,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
5: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:24:10,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 02:24:10,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:24:10,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
7: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:24:10,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2023-03-17 02:24:10,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2023-03-17 02:24:10,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2023-03-17 02:24:10,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:24:10,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:24:10,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2023-03-17 02:24:10,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:24:10,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:24:10,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2023-03-17 02:24:10,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:24:10,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:24:10,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2023-03-17 02:24:10,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:24:10,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:24:10,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 3: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:24:10,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2023-03-17 02:24:10,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
0: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:24:10,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:24:10,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 4: [2023-03-17 02:24:10,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
7: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2023-03-17 02:24:10,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:24:10,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:24:10,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:24:10,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:24:10,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:24:10,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:24:10,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 02:24:10,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
4: [2023-03-17 02:24:10,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2023-03-17 02:24:10,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:24:10,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:24:10,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:24:10,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2023-03-17 02:24:10,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:24:10,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:24:10,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:24:10,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:24:10,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:24:10,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:24:10,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:24:10,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
0: [2023-03-17 02:24:10,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:24:10,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 02:24:10,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2023-03-17 02:24:10,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
6: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2023-03-17 02:24:10,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2023-03-17 02:24:10,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:24:10,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
0: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:24:10,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 02:24:10,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
1: [2023-03-17 02:24:10,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:24:10,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:24:10,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
7: [2023-03-17 02:24:10,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2023-03-17 02:24:10,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2023-03-17 02:24:10,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:24:10,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
5: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:24:10,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:24:10,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step89000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:24:10,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: successfully saved checkpoint at iteration 89000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 99.41 7: iteration 89010/ 173500 | consumed samples: 22786560 | consumed tokens: 46666874880 | elapsed time per iteration (s): 0.09 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 4.529984E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2729.190 | TFLOPs: 10.15 | 7: iteration 89020/ 173500 | consumed samples: 22789120 | consumed tokens: 46672117760 | elapsed time per iteration (s): 0.08 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 4.533369E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.879 | TFLOPs: 11.69 | 7: iteration 89030/ 173500 | consumed samples: 22791680 | consumed tokens: 46677360640 | elapsed time per iteration (s): 0.08 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 
4.524955E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.871 | TFLOPs: 11.44 | 7: iteration 89040/ 173500 | consumed samples: 22794240 | consumed tokens: 46682603520 | elapsed time per iteration (s): 0.08 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 4.523484E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.363 | TFLOPs: 11.49 | 7: iteration 89050/ 173500 | consumed samples: 22796800 | consumed tokens: 46687846400 | elapsed time per iteration (s): 0.09 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 4.519884E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.442 | TFLOPs: 10.78 | 7: iteration 89060/ 173500 | consumed samples: 22799360 | consumed tokens: 46693089280 | elapsed time per iteration (s): 0.08 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 4.521163E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.026 | TFLOPs: 11.34 | 7: iteration 89070/ 173500 | consumed samples: 22801920 | consumed tokens: 46698332160 | elapsed time per iteration (s): 0.09 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 4.527907E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.600 | TFLOPs: 11.07 | 7: iteration 89080/ 173500 | consumed samples: 22804480 | consumed tokens: 46703575040 | elapsed time per iteration (s): 0.08 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 4.526991E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.208 | TFLOPs: 11.72 | 7: iteration 89090/ 173500 | consumed samples: 22807040 | consumed tokens: 
46708817920 | elapsed time per iteration (s): 0.09 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 4.515543E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2714.689 | TFLOPs: 10.10 | 7: iteration 89100/ 173500 | consumed samples: 22809600 | consumed tokens: 46714060800 | elapsed time per iteration (s): 0.08 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 4.518113E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.718 | TFLOPs: 11.62 | 7: iteration 89110/ 173500 | consumed samples: 22812160 | consumed tokens: 46719303680 | elapsed time per iteration (s): 0.11 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 4.521853E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.469 | TFLOPs: 9.05 | 7: iteration 89120/ 173500 | consumed samples: 22814720 | consumed tokens: 46724546560 | elapsed time per iteration (s): 0.11 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 4.516777E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2352.952 | TFLOPs: 8.75 | 7: iteration 89130/ 173500 | consumed samples: 22817280 | consumed tokens: 46729789440 | elapsed time per iteration (s): 0.09 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 4.528258E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2742.951 | TFLOPs: 10.20 | 7: iteration 89140/ 173500 | consumed samples: 22819840 | consumed tokens: 46735032320 | elapsed time per iteration (s): 0.08 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 4.536606E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3049.556 | TFLOPs: 11.34 | 7: iteration 89150/ 173500 | consumed samples: 22822400 | consumed tokens: 46740275200 | elapsed time per iteration (s): 0.08 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 4.524002E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.025 | TFLOPs: 11.31 | 7: iteration 89160/ 173500 | consumed samples: 22824960 | consumed tokens: 46745518080 | elapsed time per iteration (s): 0.10 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 4.521345E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2542.409 | TFLOPs: 9.46 | 7: iteration 89170/ 173500 | consumed samples: 22827520 | consumed tokens: 46750760960 | elapsed time per iteration (s): 0.08 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 4.518359E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.393 | TFLOPs: 11.79 | 7: iteration 89180/ 173500 | consumed samples: 22830080 | consumed tokens: 46756003840 | elapsed time per iteration (s): 0.11 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 4.533505E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2330.586 | TFLOPs: 8.67 | 7: iteration 89190/ 173500 | consumed samples: 22832640 | consumed tokens: 46761246720 | elapsed time per iteration (s): 0.11 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 4.526318E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2298.630 | TFLOPs: 8.55 | 7: iteration 89200/ 173500 | consumed samples: 22835200 | consumed tokens: 46766489600 | elapsed time per iteration (s): 0.08 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 4.522917E+00 | grad 
norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.214 | TFLOPs: 11.30 | 7: iteration 89210/ 173500 | consumed samples: 22837760 | consumed tokens: 46771732480 | elapsed time per iteration (s): 0.11 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 4.524309E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2371.915 | TFLOPs: 8.82 | 7: iteration 89220/ 173500 | consumed samples: 22840320 | consumed tokens: 46776975360 | elapsed time per iteration (s): 0.10 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 4.530539E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2516.895 | TFLOPs: 9.36 | 7: iteration 89230/ 173500 | consumed samples: 22842880 | consumed tokens: 46782218240 | elapsed time per iteration (s): 0.08 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 4.524504E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.477 | TFLOPs: 11.67 | 7: iteration 89240/ 173500 | consumed samples: 22845440 | consumed tokens: 46787461120 | elapsed time per iteration (s): 0.08 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 4.526942E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.032 | TFLOPs: 11.74 | 7: iteration 89250/ 173500 | consumed samples: 22848000 | consumed tokens: 46792704000 | elapsed time per iteration (s): 0.10 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 4.538063E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2685.769 | TFLOPs: 9.99 | 7: iteration 89260/ 173500 | consumed samples: 22850560 | consumed tokens: 46797946880 | elapsed time per 
iteration (s): 0.09 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 4.536529E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2798.615 | TFLOPs: 10.41 | 7: iteration 89270/ 173500 | consumed samples: 22853120 | consumed tokens: 46803189760 | elapsed time per iteration (s): 0.09 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 4.515933E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.425 | TFLOPs: 10.16 | 7: iteration 89280/ 173500 | consumed samples: 22855680 | consumed tokens: 46808432640 | elapsed time per iteration (s): 0.08 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 4.525725E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.589 | TFLOPs: 11.79 | 7: iteration 89290/ 173500 | consumed samples: 22858240 | consumed tokens: 46813675520 | elapsed time per iteration (s): 0.08 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 4.526371E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.509 | TFLOPs: 12.02 | 7: iteration 89300/ 173500 | consumed samples: 22860800 | consumed tokens: 46818918400 | elapsed time per iteration (s): 0.09 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 4.520655E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.692 | TFLOPs: 10.69 | 7: iteration 89310/ 173500 | consumed samples: 22863360 | consumed tokens: 46824161280 | elapsed time per iteration (s): 0.09 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 4.528107E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2944.031 | TFLOPs: 10.95 | 
7: iteration 89320/ 173500 | consumed samples: 22865920 | consumed tokens: 46829404160 | elapsed time per iteration (s): 0.08 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 4.529985E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.354 | TFLOPs: 11.96 | 7: iteration 89330/ 173500 | consumed samples: 22868480 | consumed tokens: 46834647040 | elapsed time per iteration (s): 0.10 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 4.522256E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2583.647 | TFLOPs: 9.61 | 7: iteration 89340/ 173500 | consumed samples: 22871040 | consumed tokens: 46839889920 | elapsed time per iteration (s): 0.08 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 4.528428E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.087 | TFLOPs: 11.95 | 7: iteration 89350/ 173500 | consumed samples: 22873600 | consumed tokens: 46845132800 | elapsed time per iteration (s): 0.09 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 4.521253E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2881.377 | TFLOPs: 10.72 | 7: iteration 89360/ 173500 | consumed samples: 22876160 | consumed tokens: 46850375680 | elapsed time per iteration (s): 0.10 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 4.515876E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2451.722 | TFLOPs: 9.12 | 7: iteration 89370/ 173500 | consumed samples: 22878720 | consumed tokens: 46855618560 | elapsed time per iteration (s): 0.10 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 4.527159E+00 | grad norm: 0.345 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.289 | TFLOPs: 9.34 | 7: iteration 89380/ 173500 | consumed samples: 22881280 | consumed tokens: 46860861440 | elapsed time per iteration (s): 0.10 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 4.515646E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2692.951 | TFLOPs: 10.02 | 7: iteration 89390/ 173500 | consumed samples: 22883840 | consumed tokens: 46866104320 | elapsed time per iteration (s): 0.13 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 4.538880E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1936.610 | TFLOPs: 7.20 | 7: iteration 89400/ 173500 | consumed samples: 22886400 | consumed tokens: 46871347200 | elapsed time per iteration (s): 0.09 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 4.536610E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.891 | TFLOPs: 10.45 | 7: iteration 89410/ 173500 | consumed samples: 22888960 | consumed tokens: 46876590080 | elapsed time per iteration (s): 0.10 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 4.524670E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2464.731 | TFLOPs: 9.17 | 7: iteration 89420/ 173500 | consumed samples: 22891520 | consumed tokens: 46881832960 | elapsed time per iteration (s): 0.11 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 4.536911E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2248.252 | TFLOPs: 8.36 | 7: iteration 89430/ 173500 | consumed samples: 22894080 | consumed tokens: 46887075840 | elapsed time per iteration (s): 0.11 | learning rate: 
1.070E-04 | global batch size: 256 | lm loss: 4.524335E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2411.993 | TFLOPs: 8.97 | 7: iteration 89440/ 173500 | consumed samples: 22896640 | consumed tokens: 46892318720 | elapsed time per iteration (s): 0.11 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 4.531831E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2258.491 | TFLOPs: 8.40 | 7: iteration 89450/ 173500 | consumed samples: 22899200 | consumed tokens: 46897561600 | elapsed time per iteration (s): 0.09 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 4.523003E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2773.165 | TFLOPs: 10.31 | 7: iteration 89460/ 173500 | consumed samples: 22901760 | consumed tokens: 46902804480 | elapsed time per iteration (s): 0.09 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 4.515251E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2753.892 | TFLOPs: 10.24 | 7: iteration 89470/ 173500 | consumed samples: 22904320 | consumed tokens: 46908047360 | elapsed time per iteration (s): 0.12 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 4.507975E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.018 | TFLOPs: 7.70 | 7: iteration 89480/ 173500 | consumed samples: 22906880 | consumed tokens: 46913290240 | elapsed time per iteration (s): 0.08 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 4.527274E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.047 | TFLOPs: 11.40 | 7: iteration 89490/ 173500 | consumed 
samples: 22909440 | consumed tokens: 46918533120 | elapsed time per iteration (s): 0.09 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 4.522045E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.683 | TFLOPs: 10.38 | 7: iteration 89500/ 173500 | consumed samples: 22912000 | consumed tokens: 46923776000 | elapsed time per iteration (s): 0.08 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 4.530445E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.857 | TFLOPs: 11.70 | 7: iteration 89510/ 173500 | consumed samples: 22914560 | consumed tokens: 46929018880 | elapsed time per iteration (s): 0.08 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 4.529877E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.696 | TFLOPs: 12.01 | 7: iteration 89520/ 173500 | consumed samples: 22917120 | consumed tokens: 46934261760 | elapsed time per iteration (s): 0.08 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 4.520188E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.985 | TFLOPs: 12.03 | 7: iteration 89530/ 173500 | consumed samples: 22919680 | consumed tokens: 46939504640 | elapsed time per iteration (s): 0.11 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 4.519314E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2431.070 | TFLOPs: 9.04 | 7: iteration 89540/ 173500 | consumed samples: 22922240 | consumed tokens: 46944747520 | elapsed time per iteration (s): 0.09 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 4.530911E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 2981.277 | TFLOPs: 11.09 | 7: iteration 89550/ 173500 | consumed samples: 22924800 | consumed tokens: 46949990400 | elapsed time per iteration (s): 0.10 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 4.521103E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.538 | TFLOPs: 9.45 | 7: iteration 89560/ 173500 | consumed samples: 22927360 | consumed tokens: 46955233280 | elapsed time per iteration (s): 0.11 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 4.520879E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.204 | TFLOPs: 8.56 | 7: iteration 89570/ 173500 | consumed samples: 22929920 | consumed tokens: 46960476160 | elapsed time per iteration (s): 0.08 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 4.541218E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.407 | TFLOPs: 11.67 | 7: iteration 89580/ 173500 | consumed samples: 22932480 | consumed tokens: 46965719040 | elapsed time per iteration (s): 0.10 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 4.509991E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.417 | TFLOPs: 9.60 | 7: iteration 89590/ 173500 | consumed samples: 22935040 | consumed tokens: 46970961920 | elapsed time per iteration (s): 0.10 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 4.519967E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.661 | TFLOPs: 9.09 | 7: iteration 89600/ 173500 | consumed samples: 22937600 | consumed tokens: 46976204800 | elapsed time per iteration (s): 0.10 | learning rate: 1.067E-04 | global batch size: 256 | 
lm loss: 4.529457E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.489 | TFLOPs: 9.08 | 7: iteration 89610/ 173500 | consumed samples: 22940160 | consumed tokens: 46981447680 | elapsed time per iteration (s): 0.15 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 4.522469E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1690.678 | TFLOPs: 6.29 | 7: iteration 89620/ 173500 | consumed samples: 22942720 | consumed tokens: 46986690560 | elapsed time per iteration (s): 0.10 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 4.527115E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2615.468 | TFLOPs: 9.73 | 7: iteration 89630/ 173500 | consumed samples: 22945280 | consumed tokens: 46991933440 | elapsed time per iteration (s): 0.09 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 4.532543E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2912.055 | TFLOPs: 10.83 | 7: iteration 89640/ 173500 | consumed samples: 22947840 | consumed tokens: 46997176320 | elapsed time per iteration (s): 0.08 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 4.532349E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.877 | TFLOPs: 11.86 | 7: iteration 89650/ 173500 | consumed samples: 22950400 | consumed tokens: 47002419200 | elapsed time per iteration (s): 0.08 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 4.528725E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.042 | TFLOPs: 12.01 | 7: iteration 89660/ 173500 | consumed samples: 22952960 | consumed tokens: 
47007662080 | elapsed time per iteration (s): 0.09 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 4.522978E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2989.015 | TFLOPs: 11.12 | 7: iteration 89670/ 173500 | consumed samples: 22955520 | consumed tokens: 47012904960 | elapsed time per iteration (s): 0.08 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 4.535416E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.007 | TFLOPs: 11.89 | 7: iteration 89680/ 173500 | consumed samples: 22958080 | consumed tokens: 47018147840 | elapsed time per iteration (s): 0.08 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 4.530004E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.990 | TFLOPs: 11.91 | 7: iteration 89690/ 173500 | consumed samples: 22960640 | consumed tokens: 47023390720 | elapsed time per iteration (s): 0.09 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 4.513426E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2729.714 | TFLOPs: 10.15 | 7: iteration 89700/ 173500 | consumed samples: 22963200 | consumed tokens: 47028633600 | elapsed time per iteration (s): 0.12 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 4.523503E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.717 | TFLOPs: 7.78 | 7: iteration 89710/ 173500 | consumed samples: 22965760 | consumed tokens: 47033876480 | elapsed time per iteration (s): 0.13 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 4.535690E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 1951.919 | TFLOPs: 7.26 | 7: iteration 89720/ 173500 | consumed samples: 22968320 | consumed tokens: 47039119360 | elapsed time per iteration (s): 0.11 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 4.535500E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2274.564 | TFLOPs: 8.46 | 7: iteration 89730/ 173500 | consumed samples: 22970880 | consumed tokens: 47044362240 | elapsed time per iteration (s): 0.08 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 4.537639E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.865 | TFLOPs: 12.01 | 7: iteration 89740/ 173500 | consumed samples: 22973440 | consumed tokens: 47049605120 | elapsed time per iteration (s): 0.08 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 4.521245E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.094 | TFLOPs: 12.02 | 7: iteration 89750/ 173500 | consumed samples: 22976000 | consumed tokens: 47054848000 | elapsed time per iteration (s): 0.08 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 4.525311E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.257 | TFLOPs: 11.37 | 7: iteration 89760/ 173500 | consumed samples: 22978560 | consumed tokens: 47060090880 | elapsed time per iteration (s): 0.08 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 4.503406E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.608 | TFLOPs: 11.27 | 7: iteration 89770/ 173500 | consumed samples: 22981120 | consumed tokens: 47065333760 | elapsed time per iteration (s): 0.10 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 4.512962E+00 | grad 
norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2688.152 | TFLOPs: 10.00 | 7: iteration 89780/ 173500 | consumed samples: 22983680 | consumed tokens: 47070576640 | elapsed time per iteration (s): 0.08 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 4.523645E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.090 | TFLOPs: 11.47 | 7: iteration 89790/ 173500 | consumed samples: 22986240 | consumed tokens: 47075819520 | elapsed time per iteration (s): 0.08 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 4.529467E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.721 | TFLOPs: 11.98 | 7: iteration 89800/ 173500 | consumed samples: 22988800 | consumed tokens: 47081062400 | elapsed time per iteration (s): 0.08 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 4.527561E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.219 | TFLOPs: 12.03 | 7: iteration 89810/ 173500 | consumed samples: 22991360 | consumed tokens: 47086305280 | elapsed time per iteration (s): 0.08 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 4.525163E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.049 | TFLOPs: 11.92 | 7: iteration 89820/ 173500 | consumed samples: 22993920 | consumed tokens: 47091548160 | elapsed time per iteration (s): 0.08 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 4.505755E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.616 | TFLOPs: 11.87 | 7: iteration 89830/ 173500 | consumed samples: 22996480 | consumed tokens: 47096791040 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 4.539622E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.350 | TFLOPs: 11.56 | 7: iteration 89840/ 173500 | consumed samples: 22999040 | consumed tokens: 47102033920 | elapsed time per iteration (s): 0.13 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 4.519825E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1953.607 | TFLOPs: 7.27 | 7: iteration 89850/ 173500 | consumed samples: 23001600 | consumed tokens: 47107276800 | elapsed time per iteration (s): 0.13 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 4.528130E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.264 | TFLOPs: 7.58 | 7: iteration 89860/ 173500 | consumed samples: 23004160 | consumed tokens: 47112519680 | elapsed time per iteration (s): 0.10 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 4.518163E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2489.985 | TFLOPs: 9.26 | 7: iteration 89870/ 173500 | consumed samples: 23006720 | consumed tokens: 47117762560 | elapsed time per iteration (s): 0.08 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 4.530352E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.429 | TFLOPs: 11.97 | 7: iteration 89880/ 173500 | consumed samples: 23009280 | consumed tokens: 47123005440 | elapsed time per iteration (s): 0.08 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 4.528489E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.555 | TFLOPs: 11.97 
| 7: iteration 89890/ 173500 | consumed samples: 23011840 | consumed tokens: 47128248320 | elapsed time per iteration (s): 0.08 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 4.519663E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.643 | TFLOPs: 11.98 | 7: iteration 89900/ 173500 | consumed samples: 23014400 | consumed tokens: 47133491200 | elapsed time per iteration (s): 0.08 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 4.519081E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.912 | TFLOPs: 12.01 | 7: iteration 89910/ 173500 | consumed samples: 23016960 | consumed tokens: 47138734080 | elapsed time per iteration (s): 0.08 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 4.531393E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.299 | TFLOPs: 12.00 | 7: iteration 89920/ 173500 | consumed samples: 23019520 | consumed tokens: 47143976960 | elapsed time per iteration (s): 0.09 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 4.523203E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.450 | TFLOPs: 11.10 | 7: iteration 89930/ 173500 | consumed samples: 23022080 | consumed tokens: 47149219840 | elapsed time per iteration (s): 0.08 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 4.526028E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.226 | TFLOPs: 12.04 | 7: iteration 89940/ 173500 | consumed samples: 23024640 | consumed tokens: 47154462720 | elapsed time per iteration (s): 0.08 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 4.519651E+00 | grad norm: 0.354 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.508 | TFLOPs: 11.99 | 7: iteration 89950/ 173500 | consumed samples: 23027200 | consumed tokens: 47159705600 | elapsed time per iteration (s): 0.08 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 4.517250E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.742 | TFLOPs: 12.04 | 7: iteration 89960/ 173500 | consumed samples: 23029760 | consumed tokens: 47164948480 | elapsed time per iteration (s): 0.08 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 4.522618E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.493 | TFLOPs: 12.05 | 7: iteration 89970/ 173500 | consumed samples: 23032320 | consumed tokens: 47170191360 | elapsed time per iteration (s): 0.08 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 4.517289E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.752 | TFLOPs: 11.95 | 7: iteration 89980/ 173500 | consumed samples: 23034880 | consumed tokens: 47175434240 | elapsed time per iteration (s): 0.08 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 4.524548E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.842 | TFLOPs: 12.02 | 7: iteration 89990/ 173500 | consumed samples: 23037440 | consumed tokens: 47180677120 | elapsed time per iteration (s): 0.08 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 4.530435E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.222 | TFLOPs: 12.00 | 0: [2023-03-17 02:25:42,437] [INFO] [logging.py:68:log_dist] [Rank 0] step=90000, skipped=0, lr=[0.00010607986950689534, 
0.00010607986950689534, 0.00010607986950689534], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 90000/ 173500 | consumed samples: 23040000 | consumed tokens: 47185920000 | elapsed time per iteration (s): 0.08 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 4.528487E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.815 | TFLOPs: 12.02 | 0: steps: 90000 loss: 4.5307 iter time (s): 0.086 samples/sec: 2978.893 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 90000 | lm loss value: 4.407695E+00 | lm loss PPL: 8.208007E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 90000 to checkpoints_14m91b100m 0: [2023-03-17 02:25:42,495] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step90000 is begin to save! 0: [2023-03-17 02:25:42,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:25:42,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:25:42,522] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:25:42,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:25:42,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:25:42,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 02:25:42,531] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:25:42,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:25:42,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:25:42,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:25:42,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:25:42,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:25:42,538] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step90000/mp_rank_00_model_states.pt 0: [2023-03-17 02:25:42,538] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:25:42,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:25:42,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:25:42,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:25:42,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:25:42,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:25:42,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 6: [2023-03-17 02:25:42,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 02:25:42,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 02:25:42,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:25:42,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 02:25:42,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:25:42,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:25:42,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 02:25:42,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:25:42,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:25:42,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 02:25:42,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:25:42,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 02:25:42,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:25:42,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 02:25:42,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:25:42,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:25:42,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 02:25:42,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:25:42,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 0: [2023-03-17 02:25:42,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:25:42,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 02:25:42,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:25:42,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 02:25:42,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:25:42,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 02:25:42,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:25:42,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:25:42,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 02:25:42,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:25:42,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:25:42,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:25:42,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:25:42,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:25:42,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:25:42,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 02:25:42,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:25:42,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:25:42,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:25:42,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 02:25:42,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 02:25:42,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
1: [2023-03-17 02:25:42,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:25:42,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:25:42,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 02:25:42,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:25:42,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:25:42,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 02:25:42,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:25:42,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 02:25:42,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:25:42,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 02:25:42,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:25:42,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:25:42,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:25:42,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:25:42,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:25:42,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
4: [2023-03-17 02:25:42,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:25:42,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 02:25:42,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:25:42,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 02:25:42,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 02:25:42,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:25:42,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:25:42,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 02:25:42,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:25:42,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:25:42,570] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 02:25:42,570] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:25:42,570] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:25:42,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
3: [2023-03-17 02:25:42,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
6: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:25:42,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:25:42,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2023-03-17 02:25:42,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:25:42,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:25:42,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2023-03-17 02:25:42,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:25:42,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:25:42,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
0: successfully saved checkpoint at iteration 90000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 83.90 7: iteration 90010/ 173500 | consumed samples: 23042560 | consumed tokens: 47191162880 | elapsed time per iteration (s): 0.10 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 4.526497E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2651.318 | TFLOPs: 9.86 | 7: iteration 90020/ 173500 | consumed samples: 23045120 | consumed tokens: 47196405760 | elapsed time per iteration (s): 0.08 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 4.529836E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.659 | TFLOPs: 11.96 | 7: iteration 90030/ 173500 | consumed samples: 23047680 | consumed tokens: 47201648640 | elapsed time per iteration (s): 0.10 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 4.531429E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2552.010 | TFLOPs: 9.49 | 7: iteration 90040/ 173500 | consumed samples: 23050240 | consumed tokens: 47206891520 | elapsed time per iteration (s): 0.10 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 4.526968E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2621.308 | TFLOPs: 9.75 | 7: iteration 90050/ 173500 | consumed samples: 23052800 | consumed tokens: 47212134400 | elapsed time per iteration (s): 0.08 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 4.529202E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.572 | TFLOPs: 11.94 | 7: iteration 90060/ 173500 | consumed samples: 23055360 | consumed tokens: 47217377280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.060E-04 | global batch size: 256 | lm loss: 4.525361E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3027.971 | TFLOPs: 11.26 | 7: iteration 90070/ 173500 | consumed samples: 23057920 | consumed tokens: 47222620160 | elapsed time per iteration (s): 0.08 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 4.528272E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.383 | TFLOPs: 12.00 | 7: iteration 90080/ 173500 | consumed samples: 23060480 | consumed tokens: 47227863040 | elapsed time per iteration (s): 0.12 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 4.529159E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.068 | TFLOPs: 7.70 | 7: iteration 90090/ 173500 | consumed samples: 23063040 | consumed tokens: 47233105920 | elapsed time per iteration (s): 0.10 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 4.525841E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2522.996 | TFLOPs: 9.38 | 7: iteration 90100/ 173500 | consumed samples: 23065600 | consumed tokens: 47238348800 | elapsed time per iteration (s): 0.09 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 4.515723E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2899.255 | TFLOPs: 10.78 | 7: iteration 90110/ 173500 | consumed samples: 23068160 | consumed tokens: 47243591680 | elapsed time per iteration (s): 0.08 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 4.530833E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.581 | TFLOPs: 11.96 | 7: iteration 90120/ 
173500 | consumed samples: 23070720 | consumed tokens: 47248834560 | elapsed time per iteration (s): 0.08 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 4.521885E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.669 | TFLOPs: 11.76 | 7: iteration 90130/ 173500 | consumed samples: 23073280 | consumed tokens: 47254077440 | elapsed time per iteration (s): 0.08 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 4.526809E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.122 | TFLOPs: 12.06 | 7: iteration 90140/ 173500 | consumed samples: 23075840 | consumed tokens: 47259320320 | elapsed time per iteration (s): 0.08 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 4.524733E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.612 | TFLOPs: 12.03 | 7: iteration 90150/ 173500 | consumed samples: 23078400 | consumed tokens: 47264563200 | elapsed time per iteration (s): 0.08 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 4.540077E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.570 | TFLOPs: 11.76 | 7: iteration 90160/ 173500 | consumed samples: 23080960 | consumed tokens: 47269806080 | elapsed time per iteration (s): 0.08 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 4.526556E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.067 | TFLOPs: 12.06 | 7: iteration 90170/ 173500 | consumed samples: 23083520 | consumed tokens: 47275048960 | elapsed time per iteration (s): 0.08 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 4.531723E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3238.481 | TFLOPs: 12.05 | 7: iteration 90180/ 173500 | consumed samples: 23086080 | consumed tokens: 47280291840 | elapsed time per iteration (s): 0.08 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 4.524061E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.473 | TFLOPs: 12.03 | 7: iteration 90190/ 173500 | consumed samples: 23088640 | consumed tokens: 47285534720 | elapsed time per iteration (s): 0.08 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 4.523608E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.392 | TFLOPs: 12.03 | 7: iteration 90200/ 173500 | consumed samples: 23091200 | consumed tokens: 47290777600 | elapsed time per iteration (s): 0.08 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 4.542569E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.239 | TFLOPs: 12.05 | 7: iteration 90210/ 173500 | consumed samples: 23093760 | consumed tokens: 47296020480 | elapsed time per iteration (s): 0.08 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 4.529837E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.263 | TFLOPs: 11.90 | 7: iteration 90220/ 173500 | consumed samples: 23096320 | consumed tokens: 47301263360 | elapsed time per iteration (s): 0.08 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 4.514000E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.762 | TFLOPs: 11.91 | 7: iteration 90230/ 173500 | consumed samples: 23098880 | consumed tokens: 47306506240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.057E-04 | global batch size: 256 | lm loss: 4.537240E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.331 | TFLOPs: 12.03 | 7: iteration 90240/ 173500 | consumed samples: 23101440 | consumed tokens: 47311749120 | elapsed time per iteration (s): 0.08 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 4.527855E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.642 | TFLOPs: 12.02 | 7: iteration 90250/ 173500 | consumed samples: 23104000 | consumed tokens: 47316992000 | elapsed time per iteration (s): 0.09 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 4.526494E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2834.139 | TFLOPs: 10.54 | 7: iteration 90260/ 173500 | consumed samples: 23106560 | consumed tokens: 47322234880 | elapsed time per iteration (s): 0.08 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 4.518335E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.483 | TFLOPs: 12.02 | 7: iteration 90270/ 173500 | consumed samples: 23109120 | consumed tokens: 47327477760 | elapsed time per iteration (s): 0.08 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 4.539717E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.886 | TFLOPs: 12.04 | 7: iteration 90280/ 173500 | consumed samples: 23111680 | consumed tokens: 47332720640 | elapsed time per iteration (s): 0.08 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 4.530157E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.670 | TFLOPs: 12.01 | 7: iteration 90290/ 173500 | 
consumed samples: 23114240 | consumed tokens: 47337963520 | elapsed time per iteration (s): 0.08 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 4.519684E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.566 | TFLOPs: 12.05 | 7: iteration 90300/ 173500 | consumed samples: 23116800 | consumed tokens: 47343206400 | elapsed time per iteration (s): 0.08 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 4.520001E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.489 | TFLOPs: 11.70 | 7: iteration 90310/ 173500 | consumed samples: 23119360 | consumed tokens: 47348449280 | elapsed time per iteration (s): 0.08 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 4.530388E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.267 | TFLOPs: 12.03 | 7: iteration 90320/ 173500 | consumed samples: 23121920 | consumed tokens: 47353692160 | elapsed time per iteration (s): 0.08 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 4.520362E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.573 | TFLOPs: 11.76 | 7: iteration 90330/ 173500 | consumed samples: 23124480 | consumed tokens: 47358935040 | elapsed time per iteration (s): 0.08 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 4.525980E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.389 | TFLOPs: 12.05 | 7: iteration 90340/ 173500 | consumed samples: 23127040 | consumed tokens: 47364177920 | elapsed time per iteration (s): 0.08 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 4.539843E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3230.104 | TFLOPs: 12.01 | 7: iteration 90350/ 173500 | consumed samples: 23129600 | consumed tokens: 47369420800 | elapsed time per iteration (s): 0.08 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 4.530092E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.104 | TFLOPs: 12.01 | 7: iteration 90360/ 173500 | consumed samples: 23132160 | consumed tokens: 47374663680 | elapsed time per iteration (s): 0.08 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 4.528199E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.365 | TFLOPs: 11.86 | 7: iteration 90370/ 173500 | consumed samples: 23134720 | consumed tokens: 47379906560 | elapsed time per iteration (s): 0.08 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 4.520761E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.848 | TFLOPs: 12.01 | 7: iteration 90380/ 173500 | consumed samples: 23137280 | consumed tokens: 47385149440 | elapsed time per iteration (s): 0.08 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 4.530636E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.369 | TFLOPs: 12.04 | 7: iteration 90390/ 173500 | consumed samples: 23139840 | consumed tokens: 47390392320 | elapsed time per iteration (s): 0.08 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 4.522989E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.407 | TFLOPs: 11.97 | 7: iteration 90400/ 173500 | consumed samples: 23142400 | consumed tokens: 47395635200 | elapsed time per iteration (s): 0.08 | learning rate: 1.054E-04 | global batch 
size: 256 | lm loss: 4.522886E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.634 | TFLOPs: 12.02 | 7: iteration 90410/ 173500 | consumed samples: 23144960 | consumed tokens: 47400878080 | elapsed time per iteration (s): 0.08 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 4.528231E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.752 | TFLOPs: 11.99 | 7: iteration 90420/ 173500 | consumed samples: 23147520 | consumed tokens: 47406120960 | elapsed time per iteration (s): 0.08 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 4.530866E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.412 | TFLOPs: 11.87 | 7: iteration 90430/ 173500 | consumed samples: 23150080 | consumed tokens: 47411363840 | elapsed time per iteration (s): 0.08 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 4.519637E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.688 | TFLOPs: 11.80 | 7: iteration 90440/ 173500 | consumed samples: 23152640 | consumed tokens: 47416606720 | elapsed time per iteration (s): 0.08 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 4.527151E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.538 | TFLOPs: 11.91 | 7: iteration 90450/ 173500 | consumed samples: 23155200 | consumed tokens: 47421849600 | elapsed time per iteration (s): 0.08 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 4.523752E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.736 | TFLOPs: 11.49 | 7: iteration 90460/ 173500 | consumed samples: 23157760 | 
consumed tokens: 47427092480 | elapsed time per iteration (s): 0.08 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 4.513080E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.764 | TFLOPs: 11.97 | 7: iteration 90470/ 173500 | consumed samples: 23160320 | consumed tokens: 47432335360 | elapsed time per iteration (s): 0.08 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 4.532568E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.290 | TFLOPs: 11.74 | 7: iteration 90480/ 173500 | consumed samples: 23162880 | consumed tokens: 47437578240 | elapsed time per iteration (s): 0.08 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 4.518780E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.841 | TFLOPs: 11.97 | 7: iteration 90490/ 173500 | consumed samples: 23165440 | consumed tokens: 47442821120 | elapsed time per iteration (s): 0.09 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 4.522988E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2918.617 | TFLOPs: 10.86 | 7: iteration 90500/ 173500 | consumed samples: 23168000 | consumed tokens: 47448064000 | elapsed time per iteration (s): 0.13 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 4.528868E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1994.109 | TFLOPs: 7.42 | 7: iteration 90510/ 173500 | consumed samples: 23170560 | consumed tokens: 47453306880 | elapsed time per iteration (s): 0.13 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 4.532489E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 1981.419 | TFLOPs: 7.37 | 7: iteration 90520/ 173500 | consumed samples: 23173120 | consumed tokens: 47458549760 | elapsed time per iteration (s): 0.13 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 4.516584E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.176 | TFLOPs: 7.24 | 7: iteration 90530/ 173500 | consumed samples: 23175680 | consumed tokens: 47463792640 | elapsed time per iteration (s): 0.13 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 4.519871E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1962.980 | TFLOPs: 7.30 | 7: iteration 90540/ 173500 | consumed samples: 23178240 | consumed tokens: 47469035520 | elapsed time per iteration (s): 0.13 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 4.523248E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.373 | TFLOPs: 7.37 | 7: iteration 90550/ 173500 | consumed samples: 23180800 | consumed tokens: 47474278400 | elapsed time per iteration (s): 0.13 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 4.526080E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1969.183 | TFLOPs: 7.32 | 7: iteration 90560/ 173500 | consumed samples: 23183360 | consumed tokens: 47479521280 | elapsed time per iteration (s): 0.13 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 4.529267E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.368 | TFLOPs: 7.30 | 7: iteration 90570/ 173500 | consumed samples: 23185920 | consumed tokens: 47484764160 | elapsed time per iteration (s): 0.10 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 4.515977E+00 | 
grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2693.301 | TFLOPs: 10.02 | 7: iteration 90580/ 173500 | consumed samples: 23188480 | consumed tokens: 47490007040 | elapsed time per iteration (s): 0.08 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 4.523368E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.338 | TFLOPs: 11.28 | 7: iteration 90590/ 173500 | consumed samples: 23191040 | consumed tokens: 47495249920 | elapsed time per iteration (s): 0.08 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 4.526682E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.969 | TFLOPs: 11.59 | 7: iteration 90600/ 173500 | consumed samples: 23193600 | consumed tokens: 47500492800 | elapsed time per iteration (s): 0.08 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 4.527649E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.554 | TFLOPs: 11.92 | 7: iteration 90610/ 173500 | consumed samples: 23196160 | consumed tokens: 47505735680 | elapsed time per iteration (s): 0.09 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 4.521785E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.933 | TFLOPs: 11.13 | 7: iteration 90620/ 173500 | consumed samples: 23198720 | consumed tokens: 47510978560 | elapsed time per iteration (s): 0.08 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 4.516769E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.369 | TFLOPs: 11.65 | 7: iteration 90630/ 173500 | consumed samples: 23201280 | consumed tokens: 47516221440 | elapsed 
time per iteration (s): 0.08 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 4.529309E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.102 | TFLOPs: 11.37 | 7: iteration 90640/ 173500 | consumed samples: 23203840 | consumed tokens: 47521464320 | elapsed time per iteration (s): 0.08 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 4.529711E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.037 | TFLOPs: 11.67 | 7: iteration 90650/ 173500 | consumed samples: 23206400 | consumed tokens: 47526707200 | elapsed time per iteration (s): 0.08 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 4.525327E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.570 | TFLOPs: 11.96 | 7: iteration 90660/ 173500 | consumed samples: 23208960 | consumed tokens: 47531950080 | elapsed time per iteration (s): 0.08 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 4.527963E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.935 | TFLOPs: 11.37 | 7: iteration 90670/ 173500 | consumed samples: 23211520 | consumed tokens: 47537192960 | elapsed time per iteration (s): 0.08 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 4.525732E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.763 | TFLOPs: 11.91 | 7: iteration 90680/ 173500 | consumed samples: 23214080 | consumed tokens: 47542435840 | elapsed time per iteration (s): 0.08 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 4.520612E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.231 | 
TFLOPs: 11.62 | 7: iteration 90690/ 173500 | consumed samples: 23216640 | consumed tokens: 47547678720 | elapsed time per iteration (s): 0.09 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 4.527039E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2724.839 | TFLOPs: 10.14 | 7: iteration 90700/ 173500 | consumed samples: 23219200 | consumed tokens: 47552921600 | elapsed time per iteration (s): 0.11 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 4.529169E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2279.330 | TFLOPs: 8.48 | 7: iteration 90710/ 173500 | consumed samples: 23221760 | consumed tokens: 47558164480 | elapsed time per iteration (s): 0.11 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 4.517302E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2300.014 | TFLOPs: 8.56 | 7: iteration 90720/ 173500 | consumed samples: 23224320 | consumed tokens: 47563407360 | elapsed time per iteration (s): 0.11 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 4.524024E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.434 | TFLOPs: 8.59 | 7: iteration 90730/ 173500 | consumed samples: 23226880 | consumed tokens: 47568650240 | elapsed time per iteration (s): 0.11 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 4.518974E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.688 | TFLOPs: 8.62 | 7: iteration 90740/ 173500 | consumed samples: 23229440 | consumed tokens: 47573893120 | elapsed time per iteration (s): 0.11 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 4.521359E+00 | grad norm: 0.346 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2286.031 | TFLOPs: 8.50 | 7: iteration 90750/ 173500 | consumed samples: 23232000 | consumed tokens: 47579136000 | elapsed time per iteration (s): 0.11 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 4.505041E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2286.151 | TFLOPs: 8.50 | 7: iteration 90760/ 173500 | consumed samples: 23234560 | consumed tokens: 47584378880 | elapsed time per iteration (s): 0.11 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 4.530674E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.352 | TFLOPs: 8.50 | 7: iteration 90770/ 173500 | consumed samples: 23237120 | consumed tokens: 47589621760 | elapsed time per iteration (s): 0.11 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 4.527880E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.600 | TFLOPs: 8.72 | 7: iteration 90780/ 173500 | consumed samples: 23239680 | consumed tokens: 47594864640 | elapsed time per iteration (s): 0.11 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 4.519761E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.580 | TFLOPs: 8.53 | 7: iteration 90790/ 173500 | consumed samples: 23242240 | consumed tokens: 47600107520 | elapsed time per iteration (s): 0.11 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 4.532366E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.996 | TFLOPs: 8.50 | 7: iteration 90800/ 173500 | consumed samples: 23244800 | consumed tokens: 47605350400 | elapsed time per iteration (s): 0.11 
| learning rate: 1.048E-04 | global batch size: 256 | lm loss: 4.525990E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.959 | TFLOPs: 8.50 | 7: iteration 90810/ 173500 | consumed samples: 23247360 | consumed tokens: 47610593280 | elapsed time per iteration (s): 0.12 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 4.511736E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2097.100 | TFLOPs: 7.80 | 7: iteration 90820/ 173500 | consumed samples: 23249920 | consumed tokens: 47615836160 | elapsed time per iteration (s): 0.08 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 4.529268E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.223 | TFLOPs: 11.93 | 7: iteration 90830/ 173500 | consumed samples: 23252480 | consumed tokens: 47621079040 | elapsed time per iteration (s): 0.09 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 4.520784E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.999 | TFLOPs: 10.38 | 7: iteration 90840/ 173500 | consumed samples: 23255040 | consumed tokens: 47626321920 | elapsed time per iteration (s): 0.10 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 4.515314E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.913 | TFLOPs: 9.48 | 7: iteration 90850/ 173500 | consumed samples: 23257600 | consumed tokens: 47631564800 | elapsed time per iteration (s): 0.08 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 4.523879E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.750 | TFLOPs: 11.93 | 7: iteration 90860/ 
173500 | consumed samples: 23260160 | consumed tokens: 47636807680 | elapsed time per iteration (s): 0.08 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 4.528344E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.242 | TFLOPs: 11.95 | 7: iteration 90870/ 173500 | consumed samples: 23262720 | consumed tokens: 47642050560 | elapsed time per iteration (s): 0.09 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 4.521418E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2782.387 | TFLOPs: 10.35 | 7: iteration 90880/ 173500 | consumed samples: 23265280 | consumed tokens: 47647293440 | elapsed time per iteration (s): 0.13 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 4.540720E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.649 | TFLOPs: 7.53 | 7: iteration 90890/ 173500 | consumed samples: 23267840 | consumed tokens: 47652536320 | elapsed time per iteration (s): 0.11 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 4.513257E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2237.864 | TFLOPs: 8.32 | 7: iteration 90900/ 173500 | consumed samples: 23270400 | consumed tokens: 47657779200 | elapsed time per iteration (s): 0.10 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 4.527023E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2611.943 | TFLOPs: 9.72 | 7: iteration 90910/ 173500 | consumed samples: 23272960 | consumed tokens: 47663022080 | elapsed time per iteration (s): 0.08 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 4.515545E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3225.182 | TFLOPs: 12.00 | 7: iteration 90920/ 173500 | consumed samples: 23275520 | consumed tokens: 47668264960 | elapsed time per iteration (s): 0.08 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 4.536224E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.165 | TFLOPs: 11.99 | 7: iteration 90930/ 173500 | consumed samples: 23278080 | consumed tokens: 47673507840 | elapsed time per iteration (s): 0.08 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 4.532775E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.499 | TFLOPs: 11.84 | 7: iteration 90940/ 173500 | consumed samples: 23280640 | consumed tokens: 47678750720 | elapsed time per iteration (s): 0.08 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 4.534384E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.264 | TFLOPs: 11.97 | 7: iteration 90950/ 173500 | consumed samples: 23283200 | consumed tokens: 47683993600 | elapsed time per iteration (s): 0.09 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 4.523653E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2895.381 | TFLOPs: 10.77 | 7: iteration 90960/ 173500 | consumed samples: 23285760 | consumed tokens: 47689236480 | elapsed time per iteration (s): 0.10 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 4.526994E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2638.970 | TFLOPs: 9.82 | 7: iteration 90970/ 173500 | consumed samples: 23288320 | consumed tokens: 47694479360 | elapsed time per iteration (s): 0.08 | learning rate: 1.045E-04 
| global batch size: 256 | lm loss: 4.535764E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.503 | TFLOPs: 11.63 | 7: iteration 90980/ 173500 | consumed samples: 23290880 | consumed tokens: 47699722240 | elapsed time per iteration (s): 0.08 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 4.516978E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.995 | TFLOPs: 12.00 | 7: iteration 90990/ 173500 | consumed samples: 23293440 | consumed tokens: 47704965120 | elapsed time per iteration (s): 0.08 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 4.519625E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.387 | TFLOPs: 11.97 | 7: iteration 91000/ 173500 | consumed samples: 23296000 | consumed tokens: 47710208000 | elapsed time per iteration (s): 0.08 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 4.517352E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.346 | TFLOPs: 11.95 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 91000 | lm loss value: 4.388981E+00 | lm loss PPL: 8.055828E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 91000 to checkpoints_14m91b100m 0: [2023-03-17 02:27:13,439] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step91000 is begin to save! 0: [2023-03-17 02:27:13,442] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 02:27:13,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:27:13,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:27:13,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:27:13,471] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:27:13,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:27:13,474] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:27:13,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:27:13,477] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:27:13,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:27:13,480] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:27:13,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:27:13,481] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step91000/mp_rank_00_model_states.pt 0: [2023-03-17 02:27:13,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:27:13,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:27:13,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:27:13,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2023-03-17 02:27:13,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:27:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 6: [2023-03-17 02:27:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:27:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:27:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2023-03-17 02:27:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
1: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:27:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2023-03-17 02:27:13,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2023-03-17 02:27:13,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
7: [2023-03-17 02:27:13,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2023-03-17 02:27:13,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:27:13,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:27:13,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2023-03-17 02:27:13,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:27:13,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2023-03-17 02:27:13,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:27:13,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2023-03-17 02:27:13,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:27:13,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:27:13,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2023-03-17 02:27:13,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:27:13,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2023-03-17 02:27:13,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:27:13,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:27:13,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2023-03-17 02:27:13,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
1: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:27:13,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 02:27:13,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2023-03-17 02:27:13,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:27:13,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:27:13,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:27:13,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2023-03-17 02:27:13,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:27:13,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2023-03-17 02:27:13,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:27:13,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:27:13,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 02:27:13,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:27:13,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
2: [2023-03-17 02:27:13,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:27:13,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2023-03-17 02:27:13,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:27:13,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2023-03-17 02:27:13,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:27:13,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:27:13,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:27:13,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:27:13,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:27:13,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
2: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
5: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:27:13,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:27:13,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2023-03-17 02:27:13,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:27:13,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step91000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:27:13,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
0: successfully saved checkpoint at iteration 91000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.89 7: iteration 91010/ 173500 | consumed samples: 23298560 | consumed tokens: 47715450880 | elapsed time per iteration (s): 0.09 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 4.520609E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2797.442 | TFLOPs: 10.41 | 7: iteration 91020/ 173500 | consumed samples: 23301120 | consumed tokens: 47720693760 | elapsed time per iteration (s): 0.08 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 4.523360E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.052 | TFLOPs: 11.85 | 7: iteration 91030/ 173500 | consumed samples: 23303680 | consumed tokens: 47725936640 | elapsed time per iteration (s): 0.08 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 4.512611E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.871 | TFLOPs: 11.84 | 7: iteration 91040/ 173500 | consumed samples: 23306240 | consumed tokens: 47731179520 | elapsed time per iteration (s): 0.08 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 4.532031E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.318 | TFLOPs: 11.76 | 7: iteration 91050/ 173500 | consumed samples: 23308800 | consumed tokens: 47736422400 | elapsed time per iteration (s): 0.08 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 4.524127E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.497 | TFLOPs: 11.99 | 7: iteration 91060/ 173500 | consumed samples: 23311360 | consumed tokens: 47741665280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.043E-04 | global batch size: 256 | lm loss: 4.527992E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.396 | TFLOPs: 11.99 | 7: iteration 91070/ 173500 | consumed samples: 23313920 | consumed tokens: 47746908160 | elapsed time per iteration (s): 0.08 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 4.512719E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.371 | TFLOPs: 11.96 | 7: iteration 91080/ 173500 | consumed samples: 23316480 | consumed tokens: 47752151040 | elapsed time per iteration (s): 0.08 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 4.522719E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.975 | TFLOPs: 12.00 | 7: iteration 91090/ 173500 | consumed samples: 23319040 | consumed tokens: 47757393920 | elapsed time per iteration (s): 0.08 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 4.522871E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.222 | TFLOPs: 12.02 | 7: iteration 91100/ 173500 | consumed samples: 23321600 | consumed tokens: 47762636800 | elapsed time per iteration (s): 0.08 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 4.524978E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.647 | TFLOPs: 12.00 | 7: iteration 91110/ 173500 | consumed samples: 23324160 | consumed tokens: 47767879680 | elapsed time per iteration (s): 0.08 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 4.524063E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.865 | TFLOPs: 11.88 | 7: iteration 91120/ 
173500 | consumed samples: 23326720 | consumed tokens: 47773122560 | elapsed time per iteration (s): 0.08 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 4.514005E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.761 | TFLOPs: 11.89 | 7: iteration 91130/ 173500 | consumed samples: 23329280 | consumed tokens: 47778365440 | elapsed time per iteration (s): 0.08 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 4.531682E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.470 | TFLOPs: 11.61 | 7: iteration 91140/ 173500 | consumed samples: 23331840 | consumed tokens: 47783608320 | elapsed time per iteration (s): 0.08 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 4.530543E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.697 | TFLOPs: 11.87 | 7: iteration 91150/ 173500 | consumed samples: 23334400 | consumed tokens: 47788851200 | elapsed time per iteration (s): 0.08 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 4.532843E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.060 | TFLOPs: 11.87 | 7: iteration 91160/ 173500 | consumed samples: 23336960 | consumed tokens: 47794094080 | elapsed time per iteration (s): 0.08 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 4.526244E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.932 | TFLOPs: 11.88 | 7: iteration 91170/ 173500 | consumed samples: 23339520 | consumed tokens: 47799336960 | elapsed time per iteration (s): 0.08 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 4.515444E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3189.256 | TFLOPs: 11.86 | 7: iteration 91180/ 173500 | consumed samples: 23342080 | consumed tokens: 47804579840 | elapsed time per iteration (s): 0.08 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 4.518478E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.699 | TFLOPs: 11.58 | 7: iteration 91190/ 173500 | consumed samples: 23344640 | consumed tokens: 47809822720 | elapsed time per iteration (s): 0.08 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 4.528578E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.122 | TFLOPs: 11.84 | 7: iteration 91200/ 173500 | consumed samples: 23347200 | consumed tokens: 47815065600 | elapsed time per iteration (s): 0.08 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 4.520645E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.147 | TFLOPs: 11.56 | 7: iteration 91210/ 173500 | consumed samples: 23349760 | consumed tokens: 47820308480 | elapsed time per iteration (s): 0.08 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 4.535960E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.054 | TFLOPs: 11.86 | 7: iteration 91220/ 173500 | consumed samples: 23352320 | consumed tokens: 47825551360 | elapsed time per iteration (s): 0.08 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 4.519769E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.238 | TFLOPs: 11.73 | 7: iteration 91230/ 173500 | consumed samples: 23354880 | consumed tokens: 47830794240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.041E-04 | global batch size: 256 | lm loss: 4.522028E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.201 | TFLOPs: 11.89 | 7: iteration 91240/ 173500 | consumed samples: 23357440 | consumed tokens: 47836037120 | elapsed time per iteration (s): 0.08 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 4.518265E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.830 | TFLOPs: 11.62 | 7: iteration 91250/ 173500 | consumed samples: 23360000 | consumed tokens: 47841280000 | elapsed time per iteration (s): 0.08 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 4.523150E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.369 | TFLOPs: 11.82 | 7: iteration 91260/ 173500 | consumed samples: 23362560 | consumed tokens: 47846522880 | elapsed time per iteration (s): 0.08 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 4.509175E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.461 | TFLOPs: 11.88 | 7: iteration 91270/ 173500 | consumed samples: 23365120 | consumed tokens: 47851765760 | elapsed time per iteration (s): 0.08 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 4.509251E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.149 | TFLOPs: 11.90 | 7: iteration 91280/ 173500 | consumed samples: 23367680 | consumed tokens: 47857008640 | elapsed time per iteration (s): 0.08 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 4.520506E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.497 | TFLOPs: 11.90 | 7: iteration 91290/ 173500 | 
consumed samples: 23370240 | consumed tokens: 47862251520 | elapsed time per iteration (s): 0.08 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 4.516961E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.667 | TFLOPs: 11.85 | 7: iteration 91300/ 173500 | consumed samples: 23372800 | consumed tokens: 47867494400 | elapsed time per iteration (s): 0.08 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 4.524539E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.545 | TFLOPs: 11.84 | 7: iteration 91310/ 173500 | consumed samples: 23375360 | consumed tokens: 47872737280 | elapsed time per iteration (s): 0.08 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 4.523775E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.595 | TFLOPs: 11.88 | 7: iteration 91320/ 173500 | consumed samples: 23377920 | consumed tokens: 47877980160 | elapsed time per iteration (s): 0.08 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 4.529987E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.607 | TFLOPs: 11.90 | 7: iteration 91330/ 173500 | consumed samples: 23380480 | consumed tokens: 47883223040 | elapsed time per iteration (s): 0.08 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 4.535928E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.145 | TFLOPs: 11.86 | 7: iteration 91340/ 173500 | consumed samples: 23383040 | consumed tokens: 47888465920 | elapsed time per iteration (s): 0.08 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 4.514993E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3161.444 | TFLOPs: 11.76 | 7: iteration 91350/ 173500 | consumed samples: 23385600 | consumed tokens: 47893708800 | elapsed time per iteration (s): 0.08 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 4.530453E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.356 | TFLOPs: 11.60 | 7: iteration 91360/ 173500 | consumed samples: 23388160 | consumed tokens: 47898951680 | elapsed time per iteration (s): 0.08 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 4.524496E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.485 | TFLOPs: 11.70 | 7: iteration 91370/ 173500 | consumed samples: 23390720 | consumed tokens: 47904194560 | elapsed time per iteration (s): 0.09 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 4.527433E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2907.802 | TFLOPs: 10.82 | 7: iteration 91380/ 173500 | consumed samples: 23393280 | consumed tokens: 47909437440 | elapsed time per iteration (s): 0.08 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 4.518550E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.350 | TFLOPs: 11.57 | 7: iteration 91390/ 173500 | consumed samples: 23395840 | consumed tokens: 47914680320 | elapsed time per iteration (s): 0.09 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 4.504393E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2969.512 | TFLOPs: 11.05 | 7: iteration 91400/ 173500 | consumed samples: 23398400 | consumed tokens: 47919923200 | elapsed time per iteration (s): 0.13 | learning rate: 1.038E-04 | global batch 
size: 256 | lm loss: 4.524360E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1970.079 | TFLOPs: 7.33 | 7: iteration 91410/ 173500 | consumed samples: 23400960 | consumed tokens: 47925166080 | elapsed time per iteration (s): 0.08 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 4.519539E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.243 | TFLOPs: 11.93 | 7: iteration 91420/ 173500 | consumed samples: 23403520 | consumed tokens: 47930408960 | elapsed time per iteration (s): 0.08 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 4.530034E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.683 | TFLOPs: 11.96 | 7: iteration 91430/ 173500 | consumed samples: 23406080 | consumed tokens: 47935651840 | elapsed time per iteration (s): 0.08 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 4.531841E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.753 | TFLOPs: 11.99 | 7: iteration 91440/ 173500 | consumed samples: 23408640 | consumed tokens: 47940894720 | elapsed time per iteration (s): 0.08 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 4.517476E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.907 | TFLOPs: 12.00 | 7: iteration 91450/ 173500 | consumed samples: 23411200 | consumed tokens: 47946137600 | elapsed time per iteration (s): 0.08 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 4.523460E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.066 | TFLOPs: 11.72 | 7: iteration 91460/ 173500 | consumed samples: 23413760 | 
consumed tokens: 47951380480 | elapsed time per iteration (s): 0.08 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 4.530269E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.123 | TFLOPs: 11.96 | 7: iteration 91470/ 173500 | consumed samples: 23416320 | consumed tokens: 47956623360 | elapsed time per iteration (s): 0.08 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 4.523411E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.501 | TFLOPs: 12.00 | 7: iteration 91480/ 173500 | consumed samples: 23418880 | consumed tokens: 47961866240 | elapsed time per iteration (s): 0.08 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 4.491784E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.383 | TFLOPs: 12.02 | 7: iteration 91490/ 173500 | consumed samples: 23421440 | consumed tokens: 47967109120 | elapsed time per iteration (s): 0.08 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 4.519530E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.396 | TFLOPs: 12.00 | 7: iteration 91500/ 173500 | consumed samples: 23424000 | consumed tokens: 47972352000 | elapsed time per iteration (s): 0.08 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 4.524516E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.575 | TFLOPs: 11.98 | 7: iteration 91510/ 173500 | consumed samples: 23426560 | consumed tokens: 47977594880 | elapsed time per iteration (s): 0.08 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 4.528573E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3210.890 | TFLOPs: 11.94 | 7: iteration 91520/ 173500 | consumed samples: 23429120 | consumed tokens: 47982837760 | elapsed time per iteration (s): 0.08 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 4.531734E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.777 | TFLOPs: 11.71 | 7: iteration 91530/ 173500 | consumed samples: 23431680 | consumed tokens: 47988080640 | elapsed time per iteration (s): 0.08 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 4.529589E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.353 | TFLOPs: 12.00 | 7: iteration 91540/ 173500 | consumed samples: 23434240 | consumed tokens: 47993323520 | elapsed time per iteration (s): 0.08 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 4.524445E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.947 | TFLOPs: 11.48 | 7: iteration 91550/ 173500 | consumed samples: 23436800 | consumed tokens: 47998566400 | elapsed time per iteration (s): 0.08 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 4.522765E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.660 | TFLOPs: 11.72 | 7: iteration 91560/ 173500 | consumed samples: 23439360 | consumed tokens: 48003809280 | elapsed time per iteration (s): 0.09 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 4.533835E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.186 | TFLOPs: 10.74 | 7: iteration 91570/ 173500 | consumed samples: 23441920 | consumed tokens: 48009052160 | elapsed time per iteration (s): 0.10 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 
4.525310E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.230 | TFLOPs: 9.64 | 7: iteration 91580/ 173500 | consumed samples: 23444480 | consumed tokens: 48014295040 | elapsed time per iteration (s): 0.10 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 4.529766E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.136 | TFLOPs: 9.19 | 7: iteration 91590/ 173500 | consumed samples: 23447040 | consumed tokens: 48019537920 | elapsed time per iteration (s): 0.09 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 4.519905E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2997.877 | TFLOPs: 11.15 | 7: iteration 91600/ 173500 | consumed samples: 23449600 | consumed tokens: 48024780800 | elapsed time per iteration (s): 0.08 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 4.525095E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.304 | TFLOPs: 11.98 | 7: iteration 91610/ 173500 | consumed samples: 23452160 | consumed tokens: 48030023680 | elapsed time per iteration (s): 0.08 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 4.513782E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.181 | TFLOPs: 12.04 | 7: iteration 91620/ 173500 | consumed samples: 23454720 | consumed tokens: 48035266560 | elapsed time per iteration (s): 0.08 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 4.517998E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.060 | TFLOPs: 12.05 | 7: iteration 91630/ 173500 | consumed samples: 23457280 | consumed tokens: 
48040509440 | elapsed time per iteration (s): 0.08 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 4.521557E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.347 | TFLOPs: 12.03 | 7: iteration 91640/ 173500 | consumed samples: 23459840 | consumed tokens: 48045752320 | elapsed time per iteration (s): 0.08 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 4.532879E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.098 | TFLOPs: 12.01 | 7: iteration 91650/ 173500 | consumed samples: 23462400 | consumed tokens: 48050995200 | elapsed time per iteration (s): 0.08 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 4.507901E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.695 | TFLOPs: 12.05 | 7: iteration 91660/ 173500 | consumed samples: 23464960 | consumed tokens: 48056238080 | elapsed time per iteration (s): 0.08 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 4.521008E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.944 | TFLOPs: 11.97 | 7: iteration 91670/ 173500 | consumed samples: 23467520 | consumed tokens: 48061480960 | elapsed time per iteration (s): 0.08 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 4.515802E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.997 | TFLOPs: 11.97 | 7: iteration 91680/ 173500 | consumed samples: 23470080 | consumed tokens: 48066723840 | elapsed time per iteration (s): 0.08 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 4.523172E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3230.771 | TFLOPs: 12.02 | 7: iteration 91690/ 173500 | consumed samples: 23472640 | consumed tokens: 48071966720 | elapsed time per iteration (s): 0.08 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 4.541184E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.616 | TFLOPs: 12.01 | 7: iteration 91700/ 173500 | consumed samples: 23475200 | consumed tokens: 48077209600 | elapsed time per iteration (s): 0.08 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 4.518536E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.908 | TFLOPs: 12.05 | 7: iteration 91710/ 173500 | consumed samples: 23477760 | consumed tokens: 48082452480 | elapsed time per iteration (s): 0.08 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 4.531025E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.396 | TFLOPs: 11.96 | 7: iteration 91720/ 173500 | consumed samples: 23480320 | consumed tokens: 48087695360 | elapsed time per iteration (s): 0.08 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 4.522792E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.789 | TFLOPs: 12.05 | 7: iteration 91730/ 173500 | consumed samples: 23482880 | consumed tokens: 48092938240 | elapsed time per iteration (s): 0.08 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 4.520538E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.948 | TFLOPs: 12.02 | 7: iteration 91740/ 173500 | consumed samples: 23485440 | consumed tokens: 48098181120 | elapsed time per iteration (s): 0.08 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 4.523275E+00 | grad 
norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.471 | TFLOPs: 12.02 | 7: iteration 91750/ 173500 | consumed samples: 23488000 | consumed tokens: 48103424000 | elapsed time per iteration (s): 0.08 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 4.522068E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.147 | TFLOPs: 11.91 | 7: iteration 91760/ 173500 | consumed samples: 23490560 | consumed tokens: 48108666880 | elapsed time per iteration (s): 0.08 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 4.515453E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.095 | TFLOPs: 11.93 | 7: iteration 91770/ 173500 | consumed samples: 23493120 | consumed tokens: 48113909760 | elapsed time per iteration (s): 0.08 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 4.520529E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.520 | TFLOPs: 12.03 | 7: iteration 91780/ 173500 | consumed samples: 23495680 | consumed tokens: 48119152640 | elapsed time per iteration (s): 0.08 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 4.520000E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.119 | TFLOPs: 12.00 | 7: iteration 91790/ 173500 | consumed samples: 23498240 | consumed tokens: 48124395520 | elapsed time per iteration (s): 0.08 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 4.523939E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.694 | TFLOPs: 12.04 | 7: iteration 91800/ 173500 | consumed samples: 23500800 | consumed tokens: 48129638400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 4.517010E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.020 | TFLOPs: 12.06 | 7: iteration 91810/ 173500 | consumed samples: 23503360 | consumed tokens: 48134881280 | elapsed time per iteration (s): 0.08 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 4.516875E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.459 | TFLOPs: 12.03 | 7: iteration 91820/ 173500 | consumed samples: 23505920 | consumed tokens: 48140124160 | elapsed time per iteration (s): 0.08 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 4.524437E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.033 | TFLOPs: 12.00 | 7: iteration 91830/ 173500 | consumed samples: 23508480 | consumed tokens: 48145367040 | elapsed time per iteration (s): 0.08 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 4.533919E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.676 | TFLOPs: 11.76 | 7: iteration 91840/ 173500 | consumed samples: 23511040 | consumed tokens: 48150609920 | elapsed time per iteration (s): 0.08 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 4.535701E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.119 | TFLOPs: 12.03 | 7: iteration 91850/ 173500 | consumed samples: 23513600 | consumed tokens: 48155852800 | elapsed time per iteration (s): 0.08 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 4.527077E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.789 | TFLOPs: 
11.76 | 7: iteration 91860/ 173500 | consumed samples: 23516160 | consumed tokens: 48161095680 | elapsed time per iteration (s): 0.08 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 4.537557E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.796 | TFLOPs: 11.99 | 7: iteration 91870/ 173500 | consumed samples: 23518720 | consumed tokens: 48166338560 | elapsed time per iteration (s): 0.08 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 4.526612E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.898 | TFLOPs: 12.02 | 7: iteration 91880/ 173500 | consumed samples: 23521280 | consumed tokens: 48171581440 | elapsed time per iteration (s): 0.08 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 4.537643E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.357 | TFLOPs: 11.78 | 7: iteration 91890/ 173500 | consumed samples: 23523840 | consumed tokens: 48176824320 | elapsed time per iteration (s): 0.08 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 4.526777E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.543 | TFLOPs: 12.02 | 7: iteration 91900/ 173500 | consumed samples: 23526400 | consumed tokens: 48182067200 | elapsed time per iteration (s): 0.08 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 4.535799E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.249 | TFLOPs: 12.03 | 7: iteration 91910/ 173500 | consumed samples: 23528960 | consumed tokens: 48187310080 | elapsed time per iteration (s): 0.08 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 4.525088E+00 | grad norm: 0.303 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.093 | TFLOPs: 11.97 | 7: iteration 91920/ 173500 | consumed samples: 23531520 | consumed tokens: 48192552960 | elapsed time per iteration (s): 0.08 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 4.527734E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.734 | TFLOPs: 11.62 | 7: iteration 91930/ 173500 | consumed samples: 23534080 | consumed tokens: 48197795840 | elapsed time per iteration (s): 0.09 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 4.518468E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2846.237 | TFLOPs: 10.59 | 7: iteration 91940/ 173500 | consumed samples: 23536640 | consumed tokens: 48203038720 | elapsed time per iteration (s): 0.09 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 4.536966E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.618 | TFLOPs: 10.13 | 7: iteration 91950/ 173500 | consumed samples: 23539200 | consumed tokens: 48208281600 | elapsed time per iteration (s): 0.08 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 4.517670E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.559 | TFLOPs: 11.87 | 7: iteration 91960/ 173500 | consumed samples: 23541760 | consumed tokens: 48213524480 | elapsed time per iteration (s): 0.08 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 4.520768E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.137 | TFLOPs: 11.77 | 7: iteration 91970/ 173500 | consumed samples: 23544320 | consumed tokens: 48218767360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.028E-04 | global batch size: 256 | lm loss: 4.528342E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.931 | TFLOPs: 11.84 | 7: iteration 91980/ 173500 | consumed samples: 23546880 | consumed tokens: 48224010240 | elapsed time per iteration (s): 0.08 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 4.527150E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.737 | TFLOPs: 11.89 | 7: iteration 91990/ 173500 | consumed samples: 23549440 | consumed tokens: 48229253120 | elapsed time per iteration (s): 0.08 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 4.510489E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.497 | TFLOPs: 11.80 | 0: [2023-03-17 02:28:35,024] [INFO] [logging.py:68:log_dist] [Rank 0] step=92000, skipped=0, lr=[0.0001027941492351335, 0.0001027941492351335, 0.0001027941492351335], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 92000/ 173500 | consumed samples: 23552000 | consumed tokens: 48234496000 | elapsed time per iteration (s): 0.08 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 4.525547E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.936 | TFLOPs: 11.87 | 0: steps: 92000 loss: 4.5318 iter time (s): 0.086 samples/sec: 2991.508 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 92000 | lm loss value: 4.391869E+00 | lm loss PPL: 8.079124E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 92000 to checkpoints_14m91b100m 0: [2023-03-17 02:28:35,082] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step92000 is begin to save! 0: [2023-03-17 02:28:35,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:28:35,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:28:35,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:28:35,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:28:35,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:28:35,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:28:35,118] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:28:35,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:28:35,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:28:35,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:28:35,124] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:28:35,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:28:35,125] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step92000/mp_rank_00_model_states.pt 0: [2023-03-17 02:28:35,125] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:28:35,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:28:35,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:28:35,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:28:35,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:28:35,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:28:35,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 3: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:28:35,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:28:35,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 6: [2023-03-17 02:28:35,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 5: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 6: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 7: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:28:35,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 02:28:35,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:28:35,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2023-03-17 02:28:35,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2023-03-17 02:28:35,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:28:35,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:28:35,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2023-03-17 02:28:35,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:28:35,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2023-03-17 02:28:35,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 2: [2023-03-17 02:28:35,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:28:35,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:28:35,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 6: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:28:35,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:28:35,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 02:28:35,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 3: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:28:35,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt
3: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
0: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:28:35,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
0: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
2: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:28:35,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
7: [2023-03-17 02:28:35,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
1: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:28:35,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2023-03-17 02:28:35,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
5: [2023-03-17 02:28:35,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:28:35,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt
5: [2023-03-17 02:28:35,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
4: [2023-03-17 02:28:35,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:28:35,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt
4: [2023-03-17 02:28:35,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
2: [2023-03-17 02:28:35,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:28:35,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt
2: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
3: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:28:35,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt
0: [2023-03-17 02:28:35,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt
3: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
0: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
7: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:28:35,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
5: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:28:35,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
5: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
1: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:28:35,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
1: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
6: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:28:35,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
6: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
4: [2023-03-17 02:28:35,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:28:35,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt
4: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
2: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:28:35,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt
2: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
7: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:28:35,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt
0: [2023-03-17 02:28:35,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
0: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
1: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:28:35,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
3: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:28:35,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt
3: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
1: [2023-03-17 02:28:35,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:28:35,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt
1: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
6: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:28:35,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt
6: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
5: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:28:35,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt
5: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
4: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:28:35,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt
4: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
2: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:28:35,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt
2: [2023-03-17 02:28:35,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
6: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:28:35,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
6: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
0: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:28:35,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
0: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
3: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:28:35,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt
3: [2023-03-17 02:28:35,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt
3: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
3: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
7: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:28:35,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
5: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:28:35,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt
5: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
4: [2023-03-17 02:28:35,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:28:35,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
4: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
2: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:28:35,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt
0: [2023-03-17 02:28:35,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
0: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
2: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
1: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:28:35,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
1: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
7: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:28:35,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt
6: [2023-03-17 02:28:35,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
6: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
1: [2023-03-17 02:28:35,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
0: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
0: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
3: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
7: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
2: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
3: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
4: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
6: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt
4: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
6: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt
5: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
1: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt
5: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
1: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
5: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
5: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
1: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
3: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt
3: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
4: [2023-03-17 02:28:35,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step92000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt
7: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
4: [2023-03-17 02:28:35,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now!
0: successfully saved checkpoint at iteration 92000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 78.51
7: iteration 92010/ 173500 | consumed samples: 23554560 | consumed tokens: 48239738880 | elapsed time per iteration (s): 0.09 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 4.540688E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2748.377 | TFLOPs: 10.22 |
7: iteration 92020/ 173500 | consumed samples: 23557120 | consumed tokens: 48244981760 | elapsed time per iteration (s): 0.08 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 4.519358E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.711 | TFLOPs: 11.90 |
7: iteration 92030/ 173500 | consumed samples: 23559680 | consumed tokens: 48250224640 | elapsed time per iteration (s): 0.08 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 4.531210E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.981 | TFLOPs: 11.90 |
7: iteration 92040/ 173500 | consumed samples: 23562240 | consumed tokens: 48255467520 | elapsed time per iteration (s): 0.08 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 4.512401E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.456 | TFLOPs: 11.76 |
7: iteration 92050/ 173500 | consumed samples: 23564800 | consumed tokens: 48260710400 | elapsed time per iteration (s): 0.08 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 4.534583E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.074 | TFLOPs: 11.87 |
7: iteration 92060/ 173500 | consumed samples: 23567360 | consumed tokens: 48265953280 | elapsed time per iteration (s): 0.08 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 4.513795E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.692 | TFLOPs: 11.88 |
7: iteration 92070/ 173500 | consumed samples: 23569920 | consumed tokens: 48271196160 | elapsed time per iteration (s): 0.08 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 4.517039E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.026 | TFLOPs: 11.87 |
7: iteration 92080/ 173500 | consumed samples: 23572480 | consumed tokens: 48276439040 | elapsed time per iteration (s): 0.08 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 4.521576E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.515 | TFLOPs: 11.86 |
7: iteration 92090/ 173500 | consumed samples: 23575040 | consumed tokens: 48281681920 | elapsed time per iteration (s): 0.08 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 4.519683E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.559 | TFLOPs: 11.83 |
7: iteration 92100/ 173500 | consumed samples: 23577600 | consumed tokens: 48286924800 | elapsed time per iteration (s): 0.08 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 4.526596E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.493 | TFLOPs: 11.84 |
7: iteration 92110/ 173500 | consumed samples: 23580160 | consumed tokens: 48292167680 | elapsed time per iteration (s): 0.08 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 4.517531E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.155 | TFLOPs: 11.84 |
7: iteration 92120/ 173500 | consumed samples: 23582720 | consumed tokens: 48297410560 | elapsed time per iteration (s): 0.08 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 4.508236E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.685 | TFLOPs: 11.87 |
7: iteration 92130/ 173500 | consumed samples: 23585280 | consumed tokens: 48302653440 | elapsed time per iteration (s): 0.08 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 4.525871E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.534 | TFLOPs: 11.86 |
7: iteration 92140/ 173500 | consumed samples: 23587840 | consumed tokens: 48307896320 | elapsed time per iteration (s): 0.08 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 4.515168E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.107 | TFLOPs: 11.87 |
7: iteration 92150/ 173500 | consumed samples: 23590400 | consumed tokens: 48313139200 | elapsed time per iteration (s): 0.08 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 4.523110E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.905 | TFLOPs: 11.87 |
7: iteration 92160/ 173500 | consumed samples: 23592960 | consumed tokens: 48318382080 | elapsed time per iteration (s): 0.08 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 4.527610E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.143 | TFLOPs: 11.81 |
7: iteration 92170/ 173500 | consumed samples: 23595520 | consumed tokens: 48323624960 | elapsed time per iteration (s): 0.08 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 4.532613E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.371 | TFLOPs: 11.58 |
7: iteration 92180/ 173500 | consumed samples: 23598080 | consumed tokens: 48328867840 | elapsed time per iteration (s): 0.08 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 4.538566E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.923 | TFLOPs: 11.90 |
7: iteration 92190/ 173500 | consumed samples: 23600640 | consumed tokens: 48334110720 | elapsed time per iteration (s): 0.08 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 4.519360E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.097 | TFLOPs: 11.88 |
7: iteration 92200/ 173500 | consumed samples: 23603200 | consumed tokens: 48339353600 | elapsed time per iteration (s): 0.08 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 4.522303E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.973 | TFLOPs: 11.85 |
7: iteration 92210/ 173500 | consumed samples: 23605760 | consumed tokens: 48344596480 | elapsed time per iteration (s): 0.08 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 4.511851E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.920 | TFLOPs: 11.80 |
7: iteration 92220/ 173500 | consumed samples: 23608320 | consumed tokens: 48349839360 | elapsed time per iteration (s): 0.08 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 4.516235E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.694 | TFLOPs: 11.86 |
7: iteration 92230/ 173500 | consumed samples: 23610880 | consumed tokens: 48355082240 | elapsed time per iteration (s): 0.08 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 4.540751E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.382 | TFLOPs: 11.89 |
7: iteration 92240/ 173500 | consumed samples: 23613440 | consumed tokens: 48360325120 | elapsed time per iteration (s): 0.08 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 4.511094E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.805 | TFLOPs: 11.88 |
7: iteration 92250/ 173500 | consumed samples: 23616000 | consumed tokens: 48365568000 | elapsed time per iteration (s): 0.08 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 4.522823E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.156 | TFLOPs: 11.87 |
7: iteration 92260/ 173500 | consumed samples: 23618560 | consumed tokens: 48370810880 | elapsed time per iteration (s): 0.08 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 4.516145E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.084 | TFLOPs: 11.87 |
7: iteration 92270/ 173500 | consumed samples: 23621120 | consumed tokens: 48376053760 | elapsed time per iteration (s): 0.08 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 4.526997E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.661 | TFLOPs: 11.84 |
7: iteration 92280/ 173500 | consumed samples: 23623680 | consumed tokens: 48381296640 | elapsed time per iteration (s): 0.08 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 4.521136E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.561 | TFLOPs: 11.86 |
7: iteration 92290/ 173500 | consumed samples: 23626240 | consumed tokens: 48386539520 | elapsed time per iteration (s): 0.08 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 4.506960E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.075 | TFLOPs: 11.89 |
7: iteration 92300/ 173500 | consumed samples: 23628800 | consumed tokens: 48391782400 | elapsed time per iteration (s): 0.08 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 4.519142E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.505 | TFLOPs: 11.87 |
7: iteration 92310/ 173500 | consumed samples: 23631360 | consumed tokens: 48397025280 | elapsed time per iteration (s): 0.08 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 4.522347E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.213 | TFLOPs: 11.86 |
7: iteration 92320/ 173500 | consumed samples: 23633920 | consumed tokens: 48402268160 | elapsed time per iteration (s): 0.08 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 4.531426E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.308 | TFLOPs: 11.86 |
7: iteration 92330/ 173500 | consumed samples: 23636480 | consumed tokens: 48407511040 | elapsed time per iteration (s): 0.08 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 4.524887E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.037 | TFLOPs: 11.61 |
7: iteration 92340/ 173500 | consumed samples: 23639040 | consumed tokens: 48412753920 | elapsed time per iteration (s): 0.08 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 4.525708E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.062 | TFLOPs: 11.89 |
7: iteration 92350/ 173500 | consumed samples: 23641600 | consumed tokens: 48417996800 | elapsed time per iteration (s): 0.08 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 4.518218E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.320 | TFLOPs: 11.87 |
7: iteration 92360/ 173500 | consumed samples: 23644160 | consumed tokens: 48423239680 | elapsed time per iteration (s): 0.08 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 4.527934E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.234 | TFLOPs: 11.89 |
7: iteration 92370/ 173500 | consumed samples: 23646720 | consumed tokens: 48428482560 | elapsed time per iteration (s): 0.08 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 4.516042E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.976 | TFLOPs: 11.83 |
7: iteration 92380/ 173500 | consumed samples: 23649280 | consumed tokens: 48433725440 | elapsed time per iteration (s): 0.08 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 4.528498E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.268 | TFLOPs: 11.76 |
7: iteration 92390/ 173500 | consumed samples: 23651840 | consumed tokens: 48438968320 | elapsed time per iteration (s): 0.08 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 4.518705E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.572 | TFLOPs: 11.85 |
7: iteration 92400/ 173500 | consumed samples: 23654400 | consumed tokens: 48444211200 | elapsed time per iteration (s): 0.08 | learning rate: 1.021E-04 | global batch
size: 256 | lm loss: 4.522736E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.084 | TFLOPs: 11.85 | 7: iteration 92410/ 173500 | consumed samples: 23656960 | consumed tokens: 48449454080 | elapsed time per iteration (s): 0.08 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 4.526171E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.847 | TFLOPs: 11.89 | 7: iteration 92420/ 173500 | consumed samples: 23659520 | consumed tokens: 48454696960 | elapsed time per iteration (s): 0.08 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 4.529185E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.825 | TFLOPs: 11.91 | 7: iteration 92430/ 173500 | consumed samples: 23662080 | consumed tokens: 48459939840 | elapsed time per iteration (s): 0.08 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 4.517638E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.979 | TFLOPs: 11.89 | 7: iteration 92440/ 173500 | consumed samples: 23664640 | consumed tokens: 48465182720 | elapsed time per iteration (s): 0.08 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 4.513462E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.865 | TFLOPs: 11.88 | 7: iteration 92450/ 173500 | consumed samples: 23667200 | consumed tokens: 48470425600 | elapsed time per iteration (s): 0.08 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 4.539243E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.794 | TFLOPs: 11.91 | 7: iteration 92460/ 173500 | consumed samples: 23669760 | 
consumed tokens: 48475668480 | elapsed time per iteration (s): 0.08 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 4.534049E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.488 | TFLOPs: 11.86 | 7: iteration 92470/ 173500 | consumed samples: 23672320 | consumed tokens: 48480911360 | elapsed time per iteration (s): 0.08 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 4.515563E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.380 | TFLOPs: 11.88 | 7: iteration 92480/ 173500 | consumed samples: 23674880 | consumed tokens: 48486154240 | elapsed time per iteration (s): 0.08 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 4.514530E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.254 | TFLOPs: 11.86 | 7: iteration 92490/ 173500 | consumed samples: 23677440 | consumed tokens: 48491397120 | elapsed time per iteration (s): 0.08 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 4.529062E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.403 | TFLOPs: 11.90 | 7: iteration 92500/ 173500 | consumed samples: 23680000 | consumed tokens: 48496640000 | elapsed time per iteration (s): 0.08 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 4.530940E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.447 | TFLOPs: 11.86 | 7: iteration 92510/ 173500 | consumed samples: 23682560 | consumed tokens: 48501882880 | elapsed time per iteration (s): 0.08 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 4.521169E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3199.031 | TFLOPs: 11.90 | 7: iteration 92520/ 173500 | consumed samples: 23685120 | consumed tokens: 48507125760 | elapsed time per iteration (s): 0.08 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 4.534886E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.111 | TFLOPs: 11.75 | 7: iteration 92530/ 173500 | consumed samples: 23687680 | consumed tokens: 48512368640 | elapsed time per iteration (s): 0.08 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 4.523140E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.968 | TFLOPs: 11.60 | 7: iteration 92540/ 173500 | consumed samples: 23690240 | consumed tokens: 48517611520 | elapsed time per iteration (s): 0.08 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 4.520414E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.440 | TFLOPs: 11.88 | 7: iteration 92550/ 173500 | consumed samples: 23692800 | consumed tokens: 48522854400 | elapsed time per iteration (s): 0.08 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 4.518665E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.244 | TFLOPs: 11.91 | 7: iteration 92560/ 173500 | consumed samples: 23695360 | consumed tokens: 48528097280 | elapsed time per iteration (s): 0.08 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 4.520775E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.806 | TFLOPs: 11.91 | 7: iteration 92570/ 173500 | consumed samples: 23697920 | consumed tokens: 48533340160 | elapsed time per iteration (s): 0.08 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 
4.514192E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.937 | TFLOPs: 11.89 | 7: iteration 92580/ 173500 | consumed samples: 23700480 | consumed tokens: 48538583040 | elapsed time per iteration (s): 0.08 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 4.527586E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.076 | TFLOPs: 11.89 | 7: iteration 92590/ 173500 | consumed samples: 23703040 | consumed tokens: 48543825920 | elapsed time per iteration (s): 0.08 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 4.528239E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.106 | TFLOPs: 11.59 | 7: iteration 92600/ 173500 | consumed samples: 23705600 | consumed tokens: 48549068800 | elapsed time per iteration (s): 0.08 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 4.525532E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.973 | TFLOPs: 11.63 | 7: iteration 92610/ 173500 | consumed samples: 23708160 | consumed tokens: 48554311680 | elapsed time per iteration (s): 0.08 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 4.513073E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.024 | TFLOPs: 11.88 | 7: iteration 92620/ 173500 | consumed samples: 23710720 | consumed tokens: 48559554560 | elapsed time per iteration (s): 0.08 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 4.520462E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.966 | TFLOPs: 11.40 | 7: iteration 92630/ 173500 | consumed samples: 23713280 | consumed tokens: 
48564797440 | elapsed time per iteration (s): 0.08 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 4.515089E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.317 | TFLOPs: 11.87 | 7: iteration 92640/ 173500 | consumed samples: 23715840 | consumed tokens: 48570040320 | elapsed time per iteration (s): 0.08 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 4.526477E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.763 | TFLOPs: 11.82 | 7: iteration 92650/ 173500 | consumed samples: 23718400 | consumed tokens: 48575283200 | elapsed time per iteration (s): 0.08 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 4.514988E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.955 | TFLOPs: 11.83 | 7: iteration 92660/ 173500 | consumed samples: 23720960 | consumed tokens: 48580526080 | elapsed time per iteration (s): 0.08 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 4.518805E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.166 | TFLOPs: 11.78 | 7: iteration 92670/ 173500 | consumed samples: 23723520 | consumed tokens: 48585768960 | elapsed time per iteration (s): 0.08 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 4.514820E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.024 | TFLOPs: 11.91 | 7: iteration 92680/ 173500 | consumed samples: 23726080 | consumed tokens: 48591011840 | elapsed time per iteration (s): 0.08 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 4.531093E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3193.440 | TFLOPs: 11.88 | 7: iteration 92690/ 173500 | consumed samples: 23728640 | consumed tokens: 48596254720 | elapsed time per iteration (s): 0.08 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 4.517005E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.175 | TFLOPs: 11.88 | 7: iteration 92700/ 173500 | consumed samples: 23731200 | consumed tokens: 48601497600 | elapsed time per iteration (s): 0.08 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 4.540964E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.473 | TFLOPs: 11.81 | 7: iteration 92710/ 173500 | consumed samples: 23733760 | consumed tokens: 48606740480 | elapsed time per iteration (s): 0.08 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 4.535049E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.373 | TFLOPs: 11.89 | 7: iteration 92720/ 173500 | consumed samples: 23736320 | consumed tokens: 48611983360 | elapsed time per iteration (s): 0.08 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 4.537252E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.636 | TFLOPs: 11.90 | 7: iteration 92730/ 173500 | consumed samples: 23738880 | consumed tokens: 48617226240 | elapsed time per iteration (s): 0.08 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 4.522028E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.615 | TFLOPs: 11.88 | 7: iteration 92740/ 173500 | consumed samples: 23741440 | consumed tokens: 48622469120 | elapsed time per iteration (s): 0.08 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 4.519482E+00 | grad 
norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.532 | TFLOPs: 11.60 | 7: iteration 92750/ 173500 | consumed samples: 23744000 | consumed tokens: 48627712000 | elapsed time per iteration (s): 0.08 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 4.516661E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.230 | TFLOPs: 11.92 | 7: iteration 92760/ 173500 | consumed samples: 23746560 | consumed tokens: 48632954880 | elapsed time per iteration (s): 0.08 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 4.527531E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.845 | TFLOPs: 11.90 | 7: iteration 92770/ 173500 | consumed samples: 23749120 | consumed tokens: 48638197760 | elapsed time per iteration (s): 0.08 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 4.518004E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.117 | TFLOPs: 11.80 | 7: iteration 92780/ 173500 | consumed samples: 23751680 | consumed tokens: 48643440640 | elapsed time per iteration (s): 0.08 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 4.529301E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.354 | TFLOPs: 11.86 | 7: iteration 92790/ 173500 | consumed samples: 23754240 | consumed tokens: 48648683520 | elapsed time per iteration (s): 0.08 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 4.519243E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.563 | TFLOPs: 11.83 | 7: iteration 92800/ 173500 | consumed samples: 23756800 | consumed tokens: 48653926400 | elapsed time 
per iteration (s): 0.08 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 4.523077E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.239 | TFLOPs: 11.90 | 7: iteration 92810/ 173500 | consumed samples: 23759360 | consumed tokens: 48659169280 | elapsed time per iteration (s): 0.08 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 4.534906E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.400 | TFLOPs: 11.87 | 7: iteration 92820/ 173500 | consumed samples: 23761920 | consumed tokens: 48664412160 | elapsed time per iteration (s): 0.08 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 4.519337E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.521 | TFLOPs: 11.86 | 7: iteration 92830/ 173500 | consumed samples: 23764480 | consumed tokens: 48669655040 | elapsed time per iteration (s): 0.08 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 4.531177E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.684 | TFLOPs: 11.89 | 7: iteration 92840/ 173500 | consumed samples: 23767040 | consumed tokens: 48674897920 | elapsed time per iteration (s): 0.08 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 4.529221E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.389 | TFLOPs: 11.92 | 7: iteration 92850/ 173500 | consumed samples: 23769600 | consumed tokens: 48680140800 | elapsed time per iteration (s): 0.08 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 4.524490E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.231 | TFLOPs: 
11.85 | 7: iteration 92860/ 173500 | consumed samples: 23772160 | consumed tokens: 48685383680 | elapsed time per iteration (s): 0.08 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 4.528018E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.420 | TFLOPs: 11.89 | 7: iteration 92870/ 173500 | consumed samples: 23774720 | consumed tokens: 48690626560 | elapsed time per iteration (s): 0.08 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 4.521006E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.914 | TFLOPs: 11.89 | 7: iteration 92880/ 173500 | consumed samples: 23777280 | consumed tokens: 48695869440 | elapsed time per iteration (s): 0.08 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 4.532973E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.837 | TFLOPs: 11.56 | 7: iteration 92890/ 173500 | consumed samples: 23779840 | consumed tokens: 48701112320 | elapsed time per iteration (s): 0.08 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 4.521168E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.220 | TFLOPs: 11.85 | 7: iteration 92900/ 173500 | consumed samples: 23782400 | consumed tokens: 48706355200 | elapsed time per iteration (s): 0.08 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 4.527492E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.021 | TFLOPs: 11.84 | 7: iteration 92910/ 173500 | consumed samples: 23784960 | consumed tokens: 48711598080 | elapsed time per iteration (s): 0.08 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 4.537771E+00 | grad norm: 0.339 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.140 | TFLOPs: 11.61 | 7: iteration 92920/ 173500 | consumed samples: 23787520 | consumed tokens: 48716840960 | elapsed time per iteration (s): 0.08 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 4.529103E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.044 | TFLOPs: 11.84 | 7: iteration 92930/ 173500 | consumed samples: 23790080 | consumed tokens: 48722083840 | elapsed time per iteration (s): 0.08 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 4.518144E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.012 | TFLOPs: 11.88 | 7: iteration 92940/ 173500 | consumed samples: 23792640 | consumed tokens: 48727326720 | elapsed time per iteration (s): 0.08 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 4.519841E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.682 | TFLOPs: 11.80 | 7: iteration 92950/ 173500 | consumed samples: 23795200 | consumed tokens: 48732569600 | elapsed time per iteration (s): 0.08 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 4.522117E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.362 | TFLOPs: 11.59 | 7: iteration 92960/ 173500 | consumed samples: 23797760 | consumed tokens: 48737812480 | elapsed time per iteration (s): 0.08 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 4.520395E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.542 | TFLOPs: 11.89 | 7: iteration 92970/ 173500 | consumed samples: 23800320 | consumed tokens: 48743055360 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.012E-04 | global batch size: 256 | lm loss: 4.523019E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.692 | TFLOPs: 11.88 |
7: iteration 92980/ 173500 | consumed samples: 23802880 | consumed tokens: 48748298240 | elapsed time per iteration (s): 0.08 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 4.522095E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.758 | TFLOPs: 11.88 |
7: iteration 92990/ 173500 | consumed samples: 23805440 | consumed tokens: 48753541120 | elapsed time per iteration (s): 0.08 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 4.527399E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.438 | TFLOPs: 11.63 |
7: iteration 93000/ 173500 | consumed samples: 23808000 | consumed tokens: 48758784000 | elapsed time per iteration (s): 0.08 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 4.517770E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.632 | TFLOPs: 11.58 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 93000 | lm loss value: 4.421353E+00 | lm loss PPL: 8.320882E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 93000 to checkpoints_14m91b100m
0: [2023-03-17 02:29:55,684] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step93000 is begin to save!
0: [2023-03-17 02:29:55,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/layer_01-model_00-model_states.pt...
0: [2023-03-17 02:29:55,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/layer_01-model_00-model_states.pt.
0: [2023-03-17 02:29:55,723] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/layer_03-model_00-model_states.pt...
0: [2023-03-17 02:29:55,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/layer_03-model_00-model_states.pt.
0: [2023-03-17 02:29:55,726] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/layer_04-model_00-model_states.pt...
0: [2023-03-17 02:29:55,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/layer_04-model_00-model_states.pt.
0: [2023-03-17 02:29:55,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/layer_05-model_00-model_states.pt...
0: [2023-03-17 02:29:55,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/layer_05-model_00-model_states.pt.
0: [2023-03-17 02:29:55,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/layer_06-model_00-model_states.pt...
0: [2023-03-17 02:29:55,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/layer_06-model_00-model_states.pt.
0: [2023-03-17 02:29:55,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/layer_08-model_00-model_states.pt...
0: [2023-03-17 02:29:55,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/layer_08-model_00-model_states.pt.
0: [2023-03-17 02:29:55,736] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step93000/mp_rank_00_model_states.pt
0: [2023-03-17 02:29:55,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/mp_rank_00_model_states.pt...
0: [2023-03-17 02:29:55,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/mp_rank_00_model_states.pt.
0: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
7: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
7: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
7: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
0: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
6: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt...
6: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt...
6: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt...
6: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt...
6: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt...
3: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt...
3: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt...
3: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt...
3: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt...
3: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt...
7: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt...
7: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt...
2: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
2: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
2: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt...
2: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt...
2: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
6: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt...
6: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt...
6: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt...
5: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt...
5: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt...
5: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt...
5: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt...
5: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
3: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt...
3: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt...
3: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt...
1: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt...
1: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt...
1: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt...
1: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt...
1: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt...
2: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
2: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
5: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt...
5: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt...
5: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt...
1: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt...
1: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt...
2: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt...
4: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt...
1: [2023-03-17 02:29:55,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:29:55,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:29:55,760] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,760] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,760] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,760] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2023-03-17 02:29:55,760] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,760] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
4: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:29:55,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2023-03-17 02:29:55,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:29:55,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2023-03-17 02:29:55,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2023-03-17 02:29:55,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:29:55,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2023-03-17 02:29:55,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:29:55,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:29:55,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2023-03-17 02:29:55,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:29:55,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:29:55,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:29:55,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:29:55,763] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:29:55,763] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
4: [2023-03-17 02:29:55,763] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,763] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:29:55,763] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,763] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
4: [2023-03-17 02:29:55,763] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2023-03-17 02:29:55,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:29:55,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:29:55,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:29:55,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 02:29:55,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2023-03-17 02:29:55,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:29:55,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:29:55,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2023-03-17 02:29:55,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:29:55,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
6: [2023-03-17 02:29:55,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:29:55,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:29:55,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 4: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:29:55,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:29:55,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:29:55,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2023-03-17 02:29:55,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2023-03-17 02:29:55,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:29:55,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:29:55,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:29:55,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 4: [2023-03-17 02:29:55,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:29:55,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:29:55,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:29:55,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 3: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
3: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2023-03-17 02:29:55,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:29:55,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:29:55,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 02:29:55,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2023-03-17 02:29:55,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2023-03-17 02:29:55,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
5: [2023-03-17 02:29:55,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:29:55,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 4: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:29:55,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step93000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:29:55,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
0: successfully saved checkpoint at iteration 93000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 89.87 7: iteration 93010/ 173500 | consumed samples: 23810560 | consumed tokens: 48764026880 | elapsed time per iteration (s): 0.09 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 4.520839E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.011 | TFLOPs: 10.39 | 7: iteration 93020/ 173500 | consumed samples: 23813120 | consumed tokens: 48769269760 | elapsed time per iteration (s): 0.08 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 4.518464E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.835 | TFLOPs: 12.04 | 7: iteration 93030/ 173500 | consumed samples: 23815680 | consumed tokens: 48774512640 | elapsed time per iteration (s): 0.08 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 4.516855E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.500 | TFLOPs: 12.03 | 7: iteration 93040/ 173500 | consumed samples: 23818240 | consumed tokens: 48779755520 | elapsed time per iteration (s): 0.08 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 4.528290E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.853 | TFLOPs: 11.99 | 7: iteration 93050/ 173500 | consumed samples: 23820800 | consumed tokens: 48784998400 | elapsed time per iteration (s): 0.08 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 4.523236E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.971 | TFLOPs: 12.03 | 7: iteration 93060/ 173500 | consumed samples: 23823360 | consumed tokens: 48790241280 | elapsed time per iteration (s): 0.08 | 
learning rate: 1.011E-04 | global batch size: 256 | lm loss: 4.517021E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.336 | TFLOPs: 12.03 | 7: iteration 93070/ 173500 | consumed samples: 23825920 | consumed tokens: 48795484160 | elapsed time per iteration (s): 0.08 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 4.527607E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.521 | TFLOPs: 11.99 | 7: iteration 93080/ 173500 | consumed samples: 23828480 | consumed tokens: 48800727040 | elapsed time per iteration (s): 0.08 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 4.514209E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.821 | TFLOPs: 11.94 | 7: iteration 93090/ 173500 | consumed samples: 23831040 | consumed tokens: 48805969920 | elapsed time per iteration (s): 0.08 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 4.515968E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.208 | TFLOPs: 12.01 | 7: iteration 93100/ 173500 | consumed samples: 23833600 | consumed tokens: 48811212800 | elapsed time per iteration (s): 0.08 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 4.527008E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.480 | TFLOPs: 11.74 | 7: iteration 93110/ 173500 | consumed samples: 23836160 | consumed tokens: 48816455680 | elapsed time per iteration (s): 0.08 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 4.519006E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.518 | TFLOPs: 11.69 | 7: iteration 93120/ 
173500 | consumed samples: 23838720 | consumed tokens: 48821698560 | elapsed time per iteration (s): 0.08 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 4.540144E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.315 | TFLOPs: 11.99 | 7: iteration 93130/ 173500 | consumed samples: 23841280 | consumed tokens: 48826941440 | elapsed time per iteration (s): 0.08 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 4.510567E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.885 | TFLOPs: 11.72 | 7: iteration 93140/ 173500 | consumed samples: 23843840 | consumed tokens: 48832184320 | elapsed time per iteration (s): 0.08 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 4.525877E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.730 | TFLOPs: 11.62 | 7: iteration 93150/ 173500 | consumed samples: 23846400 | consumed tokens: 48837427200 | elapsed time per iteration (s): 0.08 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 4.526206E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.099 | TFLOPs: 12.03 | 7: iteration 93160/ 173500 | consumed samples: 23848960 | consumed tokens: 48842670080 | elapsed time per iteration (s): 0.08 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 4.519464E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.938 | TFLOPs: 12.04 | 7: iteration 93170/ 173500 | consumed samples: 23851520 | consumed tokens: 48847912960 | elapsed time per iteration (s): 0.08 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 4.517677E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3233.448 | TFLOPs: 12.03 | 7: iteration 93180/ 173500 | consumed samples: 23854080 | consumed tokens: 48853155840 | elapsed time per iteration (s): 0.08 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 4.528158E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.494 | TFLOPs: 11.73 | 7: iteration 93190/ 173500 | consumed samples: 23856640 | consumed tokens: 48858398720 | elapsed time per iteration (s): 0.08 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 4.524360E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.794 | TFLOPs: 11.91 | 7: iteration 93200/ 173500 | consumed samples: 23859200 | consumed tokens: 48863641600 | elapsed time per iteration (s): 0.08 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 4.527667E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.891 | TFLOPs: 11.91 | 7: iteration 93210/ 173500 | consumed samples: 23861760 | consumed tokens: 48868884480 | elapsed time per iteration (s): 0.08 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 4.517260E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.769 | TFLOPs: 11.61 | 7: iteration 93220/ 173500 | consumed samples: 23864320 | consumed tokens: 48874127360 | elapsed time per iteration (s): 0.08 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 4.525278E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.392 | TFLOPs: 11.60 | 7: iteration 93230/ 173500 | consumed samples: 23866880 | consumed tokens: 48879370240 | elapsed time per iteration (s): 0.08 | learning rate: 
1.008E-04 | global batch size: 256 | lm loss: 4.522385E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.995 | TFLOPs: 11.93 | 7: iteration 93240/ 173500 | consumed samples: 23869440 | consumed tokens: 48884613120 | elapsed time per iteration (s): 0.10 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 4.530032E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2497.090 | TFLOPs: 9.29 | 7: iteration 93250/ 173500 | consumed samples: 23872000 | consumed tokens: 48889856000 | elapsed time per iteration (s): 0.11 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 4.527684E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2261.355 | TFLOPs: 8.41 | 7: iteration 93260/ 173500 | consumed samples: 23874560 | consumed tokens: 48895098880 | elapsed time per iteration (s): 0.09 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 4.526529E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2775.210 | TFLOPs: 10.32 | 7: iteration 93270/ 173500 | consumed samples: 23877120 | consumed tokens: 48900341760 | elapsed time per iteration (s): 0.08 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 4.522190E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.481 | TFLOPs: 11.51 | 7: iteration 93280/ 173500 | consumed samples: 23879680 | consumed tokens: 48905584640 | elapsed time per iteration (s): 0.08 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 4.538237E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.124 | TFLOPs: 11.93 | 7: iteration 93290/ 173500 | consumed 
samples: 23882240 | consumed tokens: 48910827520 | elapsed time per iteration (s): 0.08 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 4.529451E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.367 | TFLOPs: 11.93 | 7: iteration 93300/ 173500 | consumed samples: 23884800 | consumed tokens: 48916070400 | elapsed time per iteration (s): 0.08 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 4.519837E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.892 | TFLOPs: 11.63 | 7: iteration 93310/ 173500 | consumed samples: 23887360 | consumed tokens: 48921313280 | elapsed time per iteration (s): 0.08 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 4.522156E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.602 | TFLOPs: 11.67 | 7: iteration 93320/ 173500 | consumed samples: 23889920 | consumed tokens: 48926556160 | elapsed time per iteration (s): 0.08 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 4.518882E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.717 | TFLOPs: 11.89 | 7: iteration 93330/ 173500 | consumed samples: 23892480 | consumed tokens: 48931799040 | elapsed time per iteration (s): 0.08 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 4.529637E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.266 | TFLOPs: 11.84 | 7: iteration 93340/ 173500 | consumed samples: 23895040 | consumed tokens: 48937041920 | elapsed time per iteration (s): 0.08 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 4.522749E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 3209.244 | TFLOPs: 11.94 | 7: iteration 93350/ 173500 | consumed samples: 23897600 | consumed tokens: 48942284800 | elapsed time per iteration (s): 0.08 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 4.523655E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.021 | TFLOPs: 11.97 | 7: iteration 93360/ 173500 | consumed samples: 23900160 | consumed tokens: 48947527680 | elapsed time per iteration (s): 0.08 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 4.523822E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.863 | TFLOPs: 11.92 | 7: iteration 93370/ 173500 | consumed samples: 23902720 | consumed tokens: 48952770560 | elapsed time per iteration (s): 0.08 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 4.514161E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.732 | TFLOPs: 11.94 | 7: iteration 93380/ 173500 | consumed samples: 23905280 | consumed tokens: 48958013440 | elapsed time per iteration (s): 0.08 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 4.521334E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.326 | TFLOPs: 11.89 | 7: iteration 93390/ 173500 | consumed samples: 23907840 | consumed tokens: 48963256320 | elapsed time per iteration (s): 0.08 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 4.519721E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.862 | TFLOPs: 11.97 | 7: iteration 93400/ 173500 | consumed samples: 23910400 | consumed tokens: 48968499200 | elapsed time per iteration (s): 0.08 | learning rate: 1.005E-04 | global batch size: 
256 | lm loss: 4.527405E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.562 | TFLOPs: 11.97 | 7: iteration 93410/ 173500 | consumed samples: 23912960 | consumed tokens: 48973742080 | elapsed time per iteration (s): 0.08 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 4.524403E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.382 | TFLOPs: 11.94 | 7: iteration 93420/ 173500 | consumed samples: 23915520 | consumed tokens: 48978984960 | elapsed time per iteration (s): 0.08 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 4.532343E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.275 | TFLOPs: 11.95 | 7: iteration 93430/ 173500 | consumed samples: 23918080 | consumed tokens: 48984227840 | elapsed time per iteration (s): 0.08 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 4.518055E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.885 | TFLOPs: 11.77 | 7: iteration 93440/ 173500 | consumed samples: 23920640 | consumed tokens: 48989470720 | elapsed time per iteration (s): 0.08 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 4.530265E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.889 | TFLOPs: 11.97 | 7: iteration 93450/ 173500 | consumed samples: 23923200 | consumed tokens: 48994713600 | elapsed time per iteration (s): 0.08 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 4.519192E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.332 | TFLOPs: 11.67 | 7: iteration 93460/ 173500 | consumed samples: 23925760 | consumed 
tokens: 48999956480 | elapsed time per iteration (s): 0.08 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 4.522102E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.745 | TFLOPs: 11.96 | 7: iteration 93470/ 173500 | consumed samples: 23928320 | consumed tokens: 49005199360 | elapsed time per iteration (s): 0.08 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 4.507332E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.822 | TFLOPs: 11.69 | 7: iteration 93480/ 173500 | consumed samples: 23930880 | consumed tokens: 49010442240 | elapsed time per iteration (s): 0.08 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 4.534784E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.910 | TFLOPs: 11.96 | 7: iteration 93490/ 173500 | consumed samples: 23933440 | consumed tokens: 49015685120 | elapsed time per iteration (s): 0.08 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 4.524110E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.871 | TFLOPs: 11.94 | 7: iteration 93500/ 173500 | consumed samples: 23936000 | consumed tokens: 49020928000 | elapsed time per iteration (s): 0.08 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 4.512103E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.468 | TFLOPs: 11.93 | 7: iteration 93510/ 173500 | consumed samples: 23938560 | consumed tokens: 49026170880 | elapsed time per iteration (s): 0.08 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 4.520015E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3223.017 | TFLOPs: 11.99 | 7: iteration 93520/ 173500 | consumed samples: 23941120 | consumed tokens: 49031413760 | elapsed time per iteration (s): 0.08 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 4.529199E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.418 | TFLOPs: 11.95 | 7: iteration 93530/ 173500 | consumed samples: 23943680 | consumed tokens: 49036656640 | elapsed time per iteration (s): 0.08 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 4.521806E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.625 | TFLOPs: 11.96 | 7: iteration 93540/ 173500 | consumed samples: 23946240 | consumed tokens: 49041899520 | elapsed time per iteration (s): 0.08 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 4.522984E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.348 | TFLOPs: 12.01 | 7: iteration 93550/ 173500 | consumed samples: 23948800 | consumed tokens: 49047142400 | elapsed time per iteration (s): 0.08 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 4.525385E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.970 | TFLOPs: 12.05 | 7: iteration 93560/ 173500 | consumed samples: 23951360 | consumed tokens: 49052385280 | elapsed time per iteration (s): 0.08 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 4.518290E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.549 | TFLOPs: 12.04 | 7: iteration 93570/ 173500 | consumed samples: 23953920 | consumed tokens: 49057628160 | elapsed time per iteration (s): 0.08 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 4.515409E+00 | 
grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.203 | TFLOPs: 11.84 | 7: iteration 93580/ 173500 | consumed samples: 23956480 | consumed tokens: 49062871040 | elapsed time per iteration (s): 0.08 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 4.539071E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.468 | TFLOPs: 12.05 | 7: iteration 93590/ 173500 | consumed samples: 23959040 | consumed tokens: 49068113920 | elapsed time per iteration (s): 0.08 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 4.528550E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.567 | TFLOPs: 11.75 | 7: iteration 93600/ 173500 | consumed samples: 23961600 | consumed tokens: 49073356800 | elapsed time per iteration (s): 0.09 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 4.510312E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2800.676 | TFLOPs: 10.42 | 7: iteration 93610/ 173500 | consumed samples: 23964160 | consumed tokens: 49078599680 | elapsed time per iteration (s): 0.08 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 4.529321E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.020 | TFLOPs: 12.01 | 7: iteration 93620/ 173500 | consumed samples: 23966720 | consumed tokens: 49083842560 | elapsed time per iteration (s): 0.08 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 4.521193E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.333 | TFLOPs: 11.95 | 7: iteration 93630/ 173500 | consumed samples: 23969280 | consumed tokens: 49089085440 | elapsed 
time per iteration (s): 0.08 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 4.525122E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.503 | TFLOPs: 11.93 | 7: iteration 93640/ 173500 | consumed samples: 23971840 | consumed tokens: 49094328320 | elapsed time per iteration (s): 0.08 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 4.518060E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.642 | TFLOPs: 11.85 | 7: iteration 93650/ 173500 | consumed samples: 23974400 | consumed tokens: 49099571200 | elapsed time per iteration (s): 0.08 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 4.535237E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.745 | TFLOPs: 11.91 | 7: iteration 93660/ 173500 | consumed samples: 23976960 | consumed tokens: 49104814080 | elapsed time per iteration (s): 0.08 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 4.506425E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.420 | TFLOPs: 11.86 | 7: iteration 93670/ 173500 | consumed samples: 23979520 | consumed tokens: 49110056960 | elapsed time per iteration (s): 0.08 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 4.523775E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.644 | TFLOPs: 11.95 | 7: iteration 93680/ 173500 | consumed samples: 23982080 | consumed tokens: 49115299840 | elapsed time per iteration (s): 0.08 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 4.509512E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.109 | 
TFLOPs: 11.92 | 7: iteration 93690/ 173500 | consumed samples: 23984640 | consumed tokens: 49120542720 | elapsed time per iteration (s): 0.08 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 4.526746E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.095 | TFLOPs: 11.92 | 7: iteration 93700/ 173500 | consumed samples: 23987200 | consumed tokens: 49125785600 | elapsed time per iteration (s): 0.08 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 4.514512E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.834 | TFLOPs: 11.92 | 7: iteration 93710/ 173500 | consumed samples: 23989760 | consumed tokens: 49131028480 | elapsed time per iteration (s): 0.08 | learning rate: 9.999E-05 | global batch size: 256 | lm loss: 4.524309E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.805 | TFLOPs: 11.93 | 7: iteration 93720/ 173500 | consumed samples: 23992320 | consumed tokens: 49136271360 | elapsed time per iteration (s): 0.08 | learning rate: 9.998E-05 | global batch size: 256 | lm loss: 4.523448E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.672 | TFLOPs: 11.67 | 7: iteration 93730/ 173500 | consumed samples: 23994880 | consumed tokens: 49141514240 | elapsed time per iteration (s): 0.08 | learning rate: 9.996E-05 | global batch size: 256 | lm loss: 4.534463E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.488 | TFLOPs: 11.95 | 7: iteration 93740/ 173500 | consumed samples: 23997440 | consumed tokens: 49146757120 | elapsed time per iteration (s): 0.08 | learning rate: 9.994E-05 | global batch size: 256 | lm loss: 4.519841E+00 | grad norm: 0.329 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.914 | TFLOPs: 11.94 | 7: iteration 93750/ 173500 | consumed samples: 24000000 | consumed tokens: 49152000000 | elapsed time per iteration (s): 0.08 | learning rate: 9.993E-05 | global batch size: 256 | lm loss: 4.523027E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.815 | TFLOPs: 11.86 | 7: iteration 93760/ 173500 | consumed samples: 24002560 | consumed tokens: 49157242880 | elapsed time per iteration (s): 0.08 | learning rate: 9.991E-05 | global batch size: 256 | lm loss: 4.530274E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.888 | TFLOPs: 11.94 | 7: iteration 93770/ 173500 | consumed samples: 24005120 | consumed tokens: 49162485760 | elapsed time per iteration (s): 0.08 | learning rate: 9.989E-05 | global batch size: 256 | lm loss: 4.521323E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.250 | TFLOPs: 11.81 | 7: iteration 93780/ 173500 | consumed samples: 24007680 | consumed tokens: 49167728640 | elapsed time per iteration (s): 0.08 | learning rate: 9.988E-05 | global batch size: 256 | lm loss: 4.527475E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.152 | TFLOPs: 11.94 | 7: iteration 93790/ 173500 | consumed samples: 24010240 | consumed tokens: 49172971520 | elapsed time per iteration (s): 0.08 | learning rate: 9.986E-05 | global batch size: 256 | lm loss: 4.522264E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.455 | TFLOPs: 11.81 | 7: iteration 93800/ 173500 | consumed samples: 24012800 | consumed tokens: 49178214400 | elapsed time per iteration (s): 
0.08 | learning rate: 9.985E-05 | global batch size: 256 | lm loss: 4.521430E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.913 | TFLOPs: 11.88 | 7: iteration 93810/ 173500 | consumed samples: 24015360 | consumed tokens: 49183457280 | elapsed time per iteration (s): 0.08 | learning rate: 9.983E-05 | global batch size: 256 | lm loss: 4.515552E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.438 | TFLOPs: 11.63 | 7: iteration 93820/ 173500 | consumed samples: 24017920 | consumed tokens: 49188700160 | elapsed time per iteration (s): 0.08 | learning rate: 9.981E-05 | global batch size: 256 | lm loss: 4.535783E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.303 | TFLOPs: 11.93 | 7: iteration 93830/ 173500 | consumed samples: 24020480 | consumed tokens: 49193943040 | elapsed time per iteration (s): 0.08 | learning rate: 9.980E-05 | global batch size: 256 | lm loss: 4.524061E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.802 | TFLOPs: 11.87 | 7: iteration 93840/ 173500 | consumed samples: 24023040 | consumed tokens: 49199185920 | elapsed time per iteration (s): 0.08 | learning rate: 9.978E-05 | global batch size: 256 | lm loss: 4.531730E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.346 | TFLOPs: 11.95 | 7: iteration 93850/ 173500 | consumed samples: 24025600 | consumed tokens: 49204428800 | elapsed time per iteration (s): 0.08 | learning rate: 9.976E-05 | global batch size: 256 | lm loss: 4.516841E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.780 | TFLOPs: 11.93 | 7: iteration 
93860/ 173500 | consumed samples: 24028160 | consumed tokens: 49209671680 | elapsed time per iteration (s): 0.08 | learning rate: 9.975E-05 | global batch size: 256 | lm loss: 4.522032E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.148 | TFLOPs: 11.69 | 7: iteration 93870/ 173500 | consumed samples: 24030720 | consumed tokens: 49214914560 | elapsed time per iteration (s): 0.08 | learning rate: 9.973E-05 | global batch size: 256 | lm loss: 4.516122E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.731 | TFLOPs: 11.73 | 7: iteration 93880/ 173500 | consumed samples: 24033280 | consumed tokens: 49220157440 | elapsed time per iteration (s): 0.08 | learning rate: 9.971E-05 | global batch size: 256 | lm loss: 4.506390E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.600 | TFLOPs: 11.69 | 7: iteration 93890/ 173500 | consumed samples: 24035840 | consumed tokens: 49225400320 | elapsed time per iteration (s): 0.08 | learning rate: 9.970E-05 | global batch size: 256 | lm loss: 4.524376E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.835 | TFLOPs: 11.98 | 7: iteration 93900/ 173500 | consumed samples: 24038400 | consumed tokens: 49230643200 | elapsed time per iteration (s): 0.08 | learning rate: 9.968E-05 | global batch size: 256 | lm loss: 4.526499E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.386 | TFLOPs: 11.75 | 7: iteration 93910/ 173500 | consumed samples: 24040960 | consumed tokens: 49235886080 | elapsed time per iteration (s): 0.08 | learning rate: 9.967E-05 | global batch size: 256 | lm loss: 4.532899E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3156.728 | TFLOPs: 11.74 | 7: iteration 93920/ 173500 | consumed samples: 24043520 | consumed tokens: 49241128960 | elapsed time per iteration (s): 0.08 | learning rate: 9.965E-05 | global batch size: 256 | lm loss: 4.514365E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.946 | TFLOPs: 11.99 | 7: iteration 93930/ 173500 | consumed samples: 24046080 | consumed tokens: 49246371840 | elapsed time per iteration (s): 0.08 | learning rate: 9.963E-05 | global batch size: 256 | lm loss: 4.521573E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.374 | TFLOPs: 11.47 | 7: iteration 93940/ 173500 | consumed samples: 24048640 | consumed tokens: 49251614720 | elapsed time per iteration (s): 0.08 | learning rate: 9.962E-05 | global batch size: 256 | lm loss: 4.530120E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.885 | TFLOPs: 11.74 | 7: iteration 93950/ 173500 | consumed samples: 24051200 | consumed tokens: 49256857600 | elapsed time per iteration (s): 0.08 | learning rate: 9.960E-05 | global batch size: 256 | lm loss: 4.519507E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.278 | TFLOPs: 11.99 | 7: iteration 93960/ 173500 | consumed samples: 24053760 | consumed tokens: 49262100480 | elapsed time per iteration (s): 0.08 | learning rate: 9.958E-05 | global batch size: 256 | lm loss: 4.499844E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.343 | TFLOPs: 12.00 | 7: iteration 93970/ 173500 | consumed samples: 24056320 | consumed tokens: 49267343360 | elapsed time per iteration (s): 0.08 | learning rate: 
9.957E-05 | global batch size: 256 | lm loss: 4.517903E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.274 | TFLOPs: 11.42 | 7: iteration 93980/ 173500 | consumed samples: 24058880 | consumed tokens: 49272586240 | elapsed time per iteration (s): 0.08 | learning rate: 9.955E-05 | global batch size: 256 | lm loss: 4.521073E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3069.429 | TFLOPs: 11.42 | 7: iteration 93990/ 173500 | consumed samples: 24061440 | consumed tokens: 49277829120 | elapsed time per iteration (s): 0.09 | learning rate: 9.953E-05 | global batch size: 256 | lm loss: 4.524459E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.405 | TFLOPs: 11.19 | 0: [2023-03-17 02:31:16,960] [INFO] [logging.py:68:log_dist] [Rank 0] step=94000, skipped=0, lr=[9.951807001525316e-05, 9.951807001525316e-05, 9.951807001525316e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 94000/ 173500 | consumed samples: 24064000 | consumed tokens: 49283072000 | elapsed time per iteration (s): 0.09 | learning rate: 9.952E-05 | global batch size: 256 | lm loss: 4.525628E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.835 | TFLOPs: 10.03 | 0: steps: 94000 loss: 4.5155 iter time (s): 0.080 samples/sec: 3190.520 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 94000 | lm loss value: 4.398824E+00 | lm loss PPL: 8.135516E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 94000 to checkpoints_14m91b100m 0: [2023-03-17 02:31:17,033] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint 
global_step94000 is begin to save! 0: [2023-03-17 02:31:17,036] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:31:17,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:31:17,063] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:31:17,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:31:17,066] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:31:17,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:31:17,069] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:31:17,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:31:17,072] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:31:17,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:31:17,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:31:17,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:31:17,076] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step94000/mp_rank_00_model_states.pt 0: [2023-03-17 02:31:17,076] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:31:17,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:31:17,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:31:17,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:31:17,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:31:17,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:31:17,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2023-03-17 02:31:17,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:31:17,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2023-03-17 02:31:17,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
6: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:31:17,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:31:17,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:31:17,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2023-03-17 02:31:17,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
4: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:31:17,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2023-03-17 02:31:17,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2023-03-17 02:31:17,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:31:17,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:31:17,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:31:17,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:31:17,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:31:17,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2023-03-17 02:31:17,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2023-03-17 02:31:17,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:31:17,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:31:17,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:31:17,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:31:17,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:31:17,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:31:17,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2023-03-17 02:31:17,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:31:17,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:31:17,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:31:17,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 02:31:17,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
3: [2023-03-17 02:31:17,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2023-03-17 02:31:17,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:31:17,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
6: [2023-03-17 02:31:17,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 02:31:17,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:31:17,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:31:17,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 02:31:17,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2023-03-17 02:31:17,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:31:17,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:31:17,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:31:17,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:31:17,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:31:17,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
4: [2023-03-17 02:31:17,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:31:17,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:31:17,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:31:17,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:31:17,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:31:17,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2023-03-17 02:31:17,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:31:17,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2023-03-17 02:31:17,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
6: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
7: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2023-03-17 02:31:17,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:31:17,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
3: [2023-03-17 02:31:17,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:31:17,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2023-03-17 02:31:17,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:31:17,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step94000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:31:17,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: successfully saved checkpoint at iteration 94000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.66 7: iteration 94010/ 173500 | consumed samples: 24066560 | consumed tokens: 49288314880 | elapsed time per iteration (s): 0.11 | learning rate: 9.950E-05 | global batch size: 256 | lm loss: 4.532327E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2352.618 | TFLOPs: 8.75 | 7: iteration 94020/ 173500 | consumed samples: 24069120 | consumed tokens: 49293557760 | elapsed time per iteration (s): 0.10 | learning rate: 9.949E-05 | global batch size: 256 | lm loss: 4.527175E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2642.854 | TFLOPs: 9.83 | 7: iteration 94030/ 173500 | consumed samples: 24071680 | consumed tokens: 49298800640 | elapsed time per iteration (s): 0.11 | learning rate: 9.947E-05 | global batch size: 256 | lm loss: 4.523690E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2312.355 | TFLOPs: 8.60 | 7: iteration 94040/ 173500 
| consumed samples: 24074240 | consumed tokens: 49304043520 | elapsed time per iteration (s): 0.09 | learning rate: 9.945E-05 | global batch size: 256 | lm loss: 4.519537E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.344 | TFLOPs: 10.09 | 7: iteration 94050/ 173500 | consumed samples: 24076800 | consumed tokens: 49309286400 | elapsed time per iteration (s): 0.09 | learning rate: 9.944E-05 | global batch size: 256 | lm loss: 4.526017E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.862 | TFLOPs: 10.09 | 7: iteration 94060/ 173500 | consumed samples: 24079360 | consumed tokens: 49314529280 | elapsed time per iteration (s): 0.09 | learning rate: 9.942E-05 | global batch size: 256 | lm loss: 4.526181E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.253 | TFLOPs: 10.04 | 7: iteration 94070/ 173500 | consumed samples: 24081920 | consumed tokens: 49319772160 | elapsed time per iteration (s): 0.09 | learning rate: 9.940E-05 | global batch size: 256 | lm loss: 4.524971E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2726.454 | TFLOPs: 10.14 | 7: iteration 94080/ 173500 | consumed samples: 24084480 | consumed tokens: 49325015040 | elapsed time per iteration (s): 0.10 | learning rate: 9.939E-05 | global batch size: 256 | lm loss: 4.523499E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2674.138 | TFLOPs: 9.95 | 7: iteration 94090/ 173500 | consumed samples: 24087040 | consumed tokens: 49330257920 | elapsed time per iteration (s): 0.09 | learning rate: 9.937E-05 | global batch size: 256 | lm loss: 4.514848E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 2700.468 | TFLOPs: 10.04 | 7: iteration 94100/ 173500 | consumed samples: 24089600 | consumed tokens: 49335500800 | elapsed time per iteration (s): 0.10 | learning rate: 9.935E-05 | global batch size: 256 | lm loss: 4.524242E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.983 | TFLOPs: 9.92 | 7: iteration 94110/ 173500 | consumed samples: 24092160 | consumed tokens: 49340743680 | elapsed time per iteration (s): 0.10 | learning rate: 9.934E-05 | global batch size: 256 | lm loss: 4.530344E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.500 | TFLOPs: 9.80 | 7: iteration 94120/ 173500 | consumed samples: 24094720 | consumed tokens: 49345986560 | elapsed time per iteration (s): 0.09 | learning rate: 9.932E-05 | global batch size: 256 | lm loss: 4.519438E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.809 | TFLOPs: 10.05 | 7: iteration 94130/ 173500 | consumed samples: 24097280 | consumed tokens: 49351229440 | elapsed time per iteration (s): 0.10 | learning rate: 9.931E-05 | global batch size: 256 | lm loss: 4.505374E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2688.664 | TFLOPs: 10.00 | 7: iteration 94140/ 173500 | consumed samples: 24099840 | consumed tokens: 49356472320 | elapsed time per iteration (s): 0.10 | learning rate: 9.929E-05 | global batch size: 256 | lm loss: 4.530616E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.450 | TFLOPs: 10.00 | 7: iteration 94150/ 173500 | consumed samples: 24102400 | consumed tokens: 49361715200 | elapsed time per iteration (s): 0.10 | learning rate: 9.927E-05 | global batch 
size: 256 | lm loss: 4.532148E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.860 | TFLOPs: 9.96 | 7: iteration 94160/ 173500 | consumed samples: 24104960 | consumed tokens: 49366958080 | elapsed time per iteration (s): 0.09 | learning rate: 9.926E-05 | global batch size: 256 | lm loss: 4.534848E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.178 | TFLOPs: 10.04 | 7: iteration 94170/ 173500 | consumed samples: 24107520 | consumed tokens: 49372200960 | elapsed time per iteration (s): 0.09 | learning rate: 9.924E-05 | global batch size: 256 | lm loss: 4.523921E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2712.142 | TFLOPs: 10.09 | 7: iteration 94180/ 173500 | consumed samples: 24110080 | consumed tokens: 49377443840 | elapsed time per iteration (s): 0.10 | learning rate: 9.922E-05 | global batch size: 256 | lm loss: 4.523076E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.584 | TFLOPs: 9.96 | 7: iteration 94190/ 173500 | consumed samples: 24112640 | consumed tokens: 49382686720 | elapsed time per iteration (s): 0.12 | learning rate: 9.921E-05 | global batch size: 256 | lm loss: 4.527834E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2214.885 | TFLOPs: 8.24 | 7: iteration 94200/ 173500 | consumed samples: 24115200 | consumed tokens: 49387929600 | elapsed time per iteration (s): 0.10 | learning rate: 9.919E-05 | global batch size: 256 | lm loss: 4.526230E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.616 | TFLOPs: 9.92 | 7: iteration 94210/ 173500 | consumed samples: 24117760 | 
consumed tokens: 49393172480 | elapsed time per iteration (s): 0.09 | learning rate: 9.917E-05 | global batch size: 256 | lm loss: 4.529675E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.845 | TFLOPs: 10.05 | 7: iteration 94220/ 173500 | consumed samples: 24120320 | consumed tokens: 49398415360 | elapsed time per iteration (s): 0.09 | learning rate: 9.916E-05 | global batch size: 256 | lm loss: 4.530423E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.226 | TFLOPs: 10.04 | 7: iteration 94230/ 173500 | consumed samples: 24122880 | consumed tokens: 49403658240 | elapsed time per iteration (s): 0.09 | learning rate: 9.914E-05 | global batch size: 256 | lm loss: 4.510823E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.765 | TFLOPs: 10.17 | 7: iteration 94240/ 173500 | consumed samples: 24125440 | consumed tokens: 49408901120 | elapsed time per iteration (s): 0.09 | learning rate: 9.913E-05 | global batch size: 256 | lm loss: 4.525423E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.588 | TFLOPs: 10.05 | 7: iteration 94250/ 173500 | consumed samples: 24128000 | consumed tokens: 49414144000 | elapsed time per iteration (s): 0.10 | learning rate: 9.911E-05 | global batch size: 256 | lm loss: 4.524330E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2655.864 | TFLOPs: 9.88 | 7: iteration 94260/ 173500 | consumed samples: 24130560 | consumed tokens: 49419386880 | elapsed time per iteration (s): 0.09 | learning rate: 9.909E-05 | global batch size: 256 | lm loss: 4.516777E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2770.440 | TFLOPs: 10.30 | 7: iteration 94270/ 173500 | consumed samples: 24133120 | consumed tokens: 49424629760 | elapsed time per iteration (s): 0.10 | learning rate: 9.908E-05 | global batch size: 256 | lm loss: 4.526962E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2678.816 | TFLOPs: 9.96 | 7: iteration 94280/ 173500 | consumed samples: 24135680 | consumed tokens: 49429872640 | elapsed time per iteration (s): 0.09 | learning rate: 9.906E-05 | global batch size: 256 | lm loss: 4.524009E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2798.231 | TFLOPs: 10.41 | 7: iteration 94290/ 173500 | consumed samples: 24138240 | consumed tokens: 49435115520 | elapsed time per iteration (s): 0.12 | learning rate: 9.904E-05 | global batch size: 256 | lm loss: 4.525378E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2197.149 | TFLOPs: 8.17 | 7: iteration 94300/ 173500 | consumed samples: 24140800 | consumed tokens: 49440358400 | elapsed time per iteration (s): 0.13 | learning rate: 9.903E-05 | global batch size: 256 | lm loss: 4.516191E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2019.484 | TFLOPs: 7.51 | 7: iteration 94310/ 173500 | consumed samples: 24143360 | consumed tokens: 49445601280 | elapsed time per iteration (s): 0.13 | learning rate: 9.901E-05 | global batch size: 256 | lm loss: 4.515063E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.630 | TFLOPs: 7.42 | 7: iteration 94320/ 173500 | consumed samples: 24145920 | consumed tokens: 49450844160 | elapsed time per iteration (s): 0.12 | learning rate: 9.900E-05 | global batch size: 256 | lm loss: 4.520362E+00 
| grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2140.609 | TFLOPs: 7.96 | 7: iteration 94330/ 173500 | consumed samples: 24148480 | consumed tokens: 49456087040 | elapsed time per iteration (s): 0.12 | learning rate: 9.898E-05 | global batch size: 256 | lm loss: 4.514636E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2188.556 | TFLOPs: 8.14 | 7: iteration 94340/ 173500 | consumed samples: 24151040 | consumed tokens: 49461329920 | elapsed time per iteration (s): 0.08 | learning rate: 9.896E-05 | global batch size: 256 | lm loss: 4.520441E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.419 | TFLOPs: 11.79 | 7: iteration 94350/ 173500 | consumed samples: 24153600 | consumed tokens: 49466572800 | elapsed time per iteration (s): 0.08 | learning rate: 9.895E-05 | global batch size: 256 | lm loss: 4.519000E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.164 | TFLOPs: 11.83 | 7: iteration 94360/ 173500 | consumed samples: 24156160 | consumed tokens: 49471815680 | elapsed time per iteration (s): 0.08 | learning rate: 9.893E-05 | global batch size: 256 | lm loss: 4.511205E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.924 | TFLOPs: 11.88 | 7: iteration 94370/ 173500 | consumed samples: 24158720 | consumed tokens: 49477058560 | elapsed time per iteration (s): 0.08 | learning rate: 9.891E-05 | global batch size: 256 | lm loss: 4.518469E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.952 | TFLOPs: 11.90 | 7: iteration 94380/ 173500 | consumed samples: 24161280 | consumed tokens: 49482301440 | elapsed 
time per iteration (s): 0.09 | learning rate: 9.890E-05 | global batch size: 256 | lm loss: 4.531667E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.167 | TFLOPs: 10.08 | 7: iteration 94390/ 173500 | consumed samples: 24163840 | consumed tokens: 49487544320 | elapsed time per iteration (s): 0.09 | learning rate: 9.888E-05 | global batch size: 256 | lm loss: 4.522211E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.489 | TFLOPs: 10.13 | 7: iteration 94400/ 173500 | consumed samples: 24166400 | consumed tokens: 49492787200 | elapsed time per iteration (s): 0.09 | learning rate: 9.886E-05 | global batch size: 256 | lm loss: 4.519168E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2952.536 | TFLOPs: 10.98 | 7: iteration 94410/ 173500 | consumed samples: 24168960 | consumed tokens: 49498030080 | elapsed time per iteration (s): 0.08 | learning rate: 9.885E-05 | global batch size: 256 | lm loss: 4.531368E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.330 | TFLOPs: 11.88 | 7: iteration 94420/ 173500 | consumed samples: 24171520 | consumed tokens: 49503272960 | elapsed time per iteration (s): 0.08 | learning rate: 9.883E-05 | global batch size: 256 | lm loss: 4.528362E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.916 | TFLOPs: 11.88 | 7: iteration 94430/ 173500 | consumed samples: 24174080 | consumed tokens: 49508515840 | elapsed time per iteration (s): 0.08 | learning rate: 9.882E-05 | global batch size: 256 | lm loss: 4.520736E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.688 | 
TFLOPs: 11.80 | 7: iteration 94440/ 173500 | consumed samples: 24176640 | consumed tokens: 49513758720 | elapsed time per iteration (s): 0.08 | learning rate: 9.880E-05 | global batch size: 256 | lm loss: 4.531018E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.139 | TFLOPs: 11.86 | 7: iteration 94450/ 173500 | consumed samples: 24179200 | consumed tokens: 49519001600 | elapsed time per iteration (s): 0.08 | learning rate: 9.878E-05 | global batch size: 256 | lm loss: 4.529897E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.893 | TFLOPs: 11.89 | 7: iteration 94460/ 173500 | consumed samples: 24181760 | consumed tokens: 49524244480 | elapsed time per iteration (s): 0.08 | learning rate: 9.877E-05 | global batch size: 256 | lm loss: 4.518204E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.025 | TFLOPs: 11.96 | 7: iteration 94470/ 173500 | consumed samples: 24184320 | consumed tokens: 49529487360 | elapsed time per iteration (s): 0.08 | learning rate: 9.875E-05 | global batch size: 256 | lm loss: 4.518452E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.075 | TFLOPs: 11.91 | 7: iteration 94480/ 173500 | consumed samples: 24186880 | consumed tokens: 49534730240 | elapsed time per iteration (s): 0.08 | learning rate: 9.873E-05 | global batch size: 256 | lm loss: 4.537511E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.900 | TFLOPs: 11.93 | 7: iteration 94490/ 173500 | consumed samples: 24189440 | consumed tokens: 49539973120 | elapsed time per iteration (s): 0.08 | learning rate: 9.872E-05 | global batch size: 256 | lm loss: 4.536294E+00 | grad norm: 0.348 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.061 | TFLOPs: 11.93 | 7: iteration 94500/ 173500 | consumed samples: 24192000 | consumed tokens: 49545216000 | elapsed time per iteration (s): 0.08 | learning rate: 9.870E-05 | global batch size: 256 | lm loss: 4.529601E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.545 | TFLOPs: 11.94 | 7: iteration 94510/ 173500 | consumed samples: 24194560 | consumed tokens: 49550458880 | elapsed time per iteration (s): 0.08 | learning rate: 9.868E-05 | global batch size: 256 | lm loss: 4.510870E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.947 | TFLOPs: 11.91 | 7: iteration 94520/ 173500 | consumed samples: 24197120 | consumed tokens: 49555701760 | elapsed time per iteration (s): 0.08 | learning rate: 9.867E-05 | global batch size: 256 | lm loss: 4.518884E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.981 | TFLOPs: 11.91 | 7: iteration 94530/ 173500 | consumed samples: 24199680 | consumed tokens: 49560944640 | elapsed time per iteration (s): 0.08 | learning rate: 9.865E-05 | global batch size: 256 | lm loss: 4.518235E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.189 | TFLOPs: 11.93 | 7: iteration 94540/ 173500 | consumed samples: 24202240 | consumed tokens: 49566187520 | elapsed time per iteration (s): 0.08 | learning rate: 9.864E-05 | global batch size: 256 | lm loss: 4.512909E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.188 | TFLOPs: 11.91 | 7: iteration 94550/ 173500 | consumed samples: 24204800 | consumed tokens: 49571430400 | elapsed time per iteration (s): 
0.08 | learning rate: 9.862E-05 | global batch size: 256 | lm loss: 4.529166E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.388 | TFLOPs: 11.91 | 7: iteration 94560/ 173500 | consumed samples: 24207360 | consumed tokens: 49576673280 | elapsed time per iteration (s): 0.08 | learning rate: 9.860E-05 | global batch size: 256 | lm loss: 4.531918E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.907 | TFLOPs: 11.90 | 7: iteration 94570/ 173500 | consumed samples: 24209920 | consumed tokens: 49581916160 | elapsed time per iteration (s): 0.08 | learning rate: 9.859E-05 | global batch size: 256 | lm loss: 4.515068E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.456 | TFLOPs: 11.91 | 7: iteration 94580/ 173500 | consumed samples: 24212480 | consumed tokens: 49587159040 | elapsed time per iteration (s): 0.08 | learning rate: 9.857E-05 | global batch size: 256 | lm loss: 4.535102E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.394 | TFLOPs: 11.92 | 7: iteration 94590/ 173500 | consumed samples: 24215040 | consumed tokens: 49592401920 | elapsed time per iteration (s): 0.08 | learning rate: 9.855E-05 | global batch size: 256 | lm loss: 4.518085E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.944 | TFLOPs: 11.95 | 7: iteration 94600/ 173500 | consumed samples: 24217600 | consumed tokens: 49597644800 | elapsed time per iteration (s): 0.08 | learning rate: 9.854E-05 | global batch size: 256 | lm loss: 4.523409E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.112 | TFLOPs: 11.98 | 7: iteration 
94610/ 173500 | consumed samples: 24220160 | consumed tokens: 49602887680 | elapsed time per iteration (s): 0.08 | learning rate: 9.852E-05 | global batch size: 256 | lm loss: 4.510688E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.131 | TFLOPs: 11.98 | 7: iteration 94620/ 173500 | consumed samples: 24222720 | consumed tokens: 49608130560 | elapsed time per iteration (s): 0.08 | learning rate: 9.851E-05 | global batch size: 256 | lm loss: 4.517382E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.663 | TFLOPs: 12.01 | 7: iteration 94630/ 173500 | consumed samples: 24225280 | consumed tokens: 49613373440 | elapsed time per iteration (s): 0.08 | learning rate: 9.849E-05 | global batch size: 256 | lm loss: 4.535901E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.235 | TFLOPs: 11.96 | 7: iteration 94640/ 173500 | consumed samples: 24227840 | consumed tokens: 49618616320 | elapsed time per iteration (s): 0.08 | learning rate: 9.847E-05 | global batch size: 256 | lm loss: 4.527599E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.565 | TFLOPs: 12.01 | 7: iteration 94650/ 173500 | consumed samples: 24230400 | consumed tokens: 49623859200 | elapsed time per iteration (s): 0.08 | learning rate: 9.846E-05 | global batch size: 256 | lm loss: 4.519649E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.527 | TFLOPs: 12.00 | 7: iteration 94660/ 173500 | consumed samples: 24232960 | consumed tokens: 49629102080 | elapsed time per iteration (s): 0.08 | learning rate: 9.844E-05 | global batch size: 256 | lm loss: 4.518927E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3075.631 | TFLOPs: 11.44 | 7: iteration 94670/ 173500 | consumed samples: 24235520 | consumed tokens: 49634344960 | elapsed time per iteration (s): 0.08 | learning rate: 9.842E-05 | global batch size: 256 | lm loss: 4.526162E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.507 | TFLOPs: 11.71 | 7: iteration 94680/ 173500 | consumed samples: 24238080 | consumed tokens: 49639587840 | elapsed time per iteration (s): 0.08 | learning rate: 9.841E-05 | global batch size: 256 | lm loss: 4.511912E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.401 | TFLOPs: 12.01 | 7: iteration 94690/ 173500 | consumed samples: 24240640 | consumed tokens: 49644830720 | elapsed time per iteration (s): 0.08 | learning rate: 9.839E-05 | global batch size: 256 | lm loss: 4.514569E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.232 | TFLOPs: 11.97 | 7: iteration 94700/ 173500 | consumed samples: 24243200 | consumed tokens: 49650073600 | elapsed time per iteration (s): 0.08 | learning rate: 9.837E-05 | global batch size: 256 | lm loss: 4.532823E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.266 | TFLOPs: 11.99 | 7: iteration 94710/ 173500 | consumed samples: 24245760 | consumed tokens: 49655316480 | elapsed time per iteration (s): 0.08 | learning rate: 9.836E-05 | global batch size: 256 | lm loss: 4.513608E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.429 | TFLOPs: 12.00 | 7: iteration 94720/ 173500 | consumed samples: 24248320 | consumed tokens: 49660559360 | elapsed time per iteration (s): 0.08 | learning rate: 
9.834E-05 | global batch size: 256 | lm loss: 4.523149E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.211 | TFLOPs: 11.72 | 7: iteration 94730/ 173500 | consumed samples: 24250880 | consumed tokens: 49665802240 | elapsed time per iteration (s): 0.08 | learning rate: 9.833E-05 | global batch size: 256 | lm loss: 4.527916E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.292 | TFLOPs: 11.99 | 7: iteration 94740/ 173500 | consumed samples: 24253440 | consumed tokens: 49671045120 | elapsed time per iteration (s): 0.08 | learning rate: 9.831E-05 | global batch size: 256 | lm loss: 4.511061E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.871 | TFLOPs: 11.85 | 7: iteration 94750/ 173500 | consumed samples: 24256000 | consumed tokens: 49676288000 | elapsed time per iteration (s): 0.08 | learning rate: 9.829E-05 | global batch size: 256 | lm loss: 4.524705E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.599 | TFLOPs: 11.74 | 7: iteration 94760/ 173500 | consumed samples: 24258560 | consumed tokens: 49681530880 | elapsed time per iteration (s): 0.08 | learning rate: 9.828E-05 | global batch size: 256 | lm loss: 4.527936E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.005 | TFLOPs: 12.01 | 7: iteration 94770/ 173500 | consumed samples: 24261120 | consumed tokens: 49686773760 | elapsed time per iteration (s): 0.08 | learning rate: 9.826E-05 | global batch size: 256 | lm loss: 4.518075E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.828 | TFLOPs: 11.99 | 7: iteration 94780/ 173500 | 
consumed samples: 24263680 | consumed tokens: 49692016640 | elapsed time per iteration (s): 0.08 | learning rate: 9.824E-05 | global batch size: 256 | lm loss: 4.532612E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.488 | TFLOPs: 11.98 | 7: iteration 94790/ 173500 | consumed samples: 24266240 | consumed tokens: 49697259520 | elapsed time per iteration (s): 0.08 | learning rate: 9.823E-05 | global batch size: 256 | lm loss: 4.530859E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.070 | TFLOPs: 12.02 | 7: iteration 94800/ 173500 | consumed samples: 24268800 | consumed tokens: 49702502400 | elapsed time per iteration (s): 0.08 | learning rate: 9.821E-05 | global batch size: 256 | lm loss: 4.508993E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.383 | TFLOPs: 11.72 | 7: iteration 94810/ 173500 | consumed samples: 24271360 | consumed tokens: 49707745280 | elapsed time per iteration (s): 0.08 | learning rate: 9.820E-05 | global batch size: 256 | lm loss: 4.509456E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.804 | TFLOPs: 11.98 | 7: iteration 94820/ 173500 | consumed samples: 24273920 | consumed tokens: 49712988160 | elapsed time per iteration (s): 0.08 | learning rate: 9.818E-05 | global batch size: 256 | lm loss: 4.522058E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.809 | TFLOPs: 11.72 | 7: iteration 94830/ 173500 | consumed samples: 24276480 | consumed tokens: 49718231040 | elapsed time per iteration (s): 0.08 | learning rate: 9.816E-05 | global batch size: 256 | lm loss: 4.522717E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3224.769 | TFLOPs: 11.99 | 7: iteration 94840/ 173500 | consumed samples: 24279040 | consumed tokens: 49723473920 | elapsed time per iteration (s): 0.08 | learning rate: 9.815E-05 | global batch size: 256 | lm loss: 4.526542E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.923 | TFLOPs: 12.00 | 7: iteration 94850/ 173500 | consumed samples: 24281600 | consumed tokens: 49728716800 | elapsed time per iteration (s): 0.08 | learning rate: 9.813E-05 | global batch size: 256 | lm loss: 4.529367E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.685 | TFLOPs: 11.96 | 7: iteration 94860/ 173500 | consumed samples: 24284160 | consumed tokens: 49733959680 | elapsed time per iteration (s): 0.08 | learning rate: 9.811E-05 | global batch size: 256 | lm loss: 4.525960E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.261 | TFLOPs: 11.83 | 7: iteration 94870/ 173500 | consumed samples: 24286720 | consumed tokens: 49739202560 | elapsed time per iteration (s): 0.08 | learning rate: 9.810E-05 | global batch size: 256 | lm loss: 4.518100E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.978 | TFLOPs: 11.97 | 7: iteration 94880/ 173500 | consumed samples: 24289280 | consumed tokens: 49744445440 | elapsed time per iteration (s): 0.08 | learning rate: 9.808E-05 | global batch size: 256 | lm loss: 4.522625E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.302 | TFLOPs: 11.93 | 7: iteration 94890/ 173500 | consumed samples: 24291840 | consumed tokens: 49749688320 | elapsed time per iteration (s): 0.08 | learning rate: 9.806E-05 | global batch 
size: 256 | lm loss: 4.519139E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.048 | TFLOPs: 12.00 | 7: iteration 94900/ 173500 | consumed samples: 24294400 | consumed tokens: 49754931200 | elapsed time per iteration (s): 0.08 | learning rate: 9.805E-05 | global batch size: 256 | lm loss: 4.506753E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.142 | TFLOPs: 11.51 | 7: iteration 94910/ 173500 | consumed samples: 24296960 | consumed tokens: 49760174080 | elapsed time per iteration (s): 0.08 | learning rate: 9.803E-05 | global batch size: 256 | lm loss: 4.531835E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.567 | TFLOPs: 11.60 | 7: iteration 94920/ 173500 | consumed samples: 24299520 | consumed tokens: 49765416960 | elapsed time per iteration (s): 0.11 | learning rate: 9.802E-05 | global batch size: 256 | lm loss: 4.513634E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.797 | TFLOPs: 9.02 | 7: iteration 94930/ 173500 | consumed samples: 24302080 | consumed tokens: 49770659840 | elapsed time per iteration (s): 0.11 | learning rate: 9.800E-05 | global batch size: 256 | lm loss: 4.528601E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.180 | TFLOPs: 8.88 | 7: iteration 94940/ 173500 | consumed samples: 24304640 | consumed tokens: 49775902720 | elapsed time per iteration (s): 0.10 | learning rate: 9.798E-05 | global batch size: 256 | lm loss: 4.525083E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2597.890 | TFLOPs: 9.66 | 7: iteration 94950/ 173500 | consumed samples: 24307200 | 
consumed tokens: 49781145600 | elapsed time per iteration (s): 0.08 | learning rate: 9.797E-05 | global batch size: 256 | lm loss: 4.526647E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.749 | TFLOPs: 12.01 | 7: iteration 94960/ 173500 | consumed samples: 24309760 | consumed tokens: 49786388480 | elapsed time per iteration (s): 0.08 | learning rate: 9.795E-05 | global batch size: 256 | lm loss: 4.523239E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.869 | TFLOPs: 11.96 | 7: iteration 94970/ 173500 | consumed samples: 24312320 | consumed tokens: 49791631360 | elapsed time per iteration (s): 0.08 | learning rate: 9.793E-05 | global batch size: 256 | lm loss: 4.527251E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.463 | TFLOPs: 11.84 | 7: iteration 94980/ 173500 | consumed samples: 24314880 | consumed tokens: 49796874240 | elapsed time per iteration (s): 0.08 | learning rate: 9.792E-05 | global batch size: 256 | lm loss: 4.528308E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.461 | TFLOPs: 11.95 | 7: iteration 94990/ 173500 | consumed samples: 24317440 | consumed tokens: 49802117120 | elapsed time per iteration (s): 0.08 | learning rate: 9.790E-05 | global batch size: 256 | lm loss: 4.521962E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.362 | TFLOPs: 11.97 | 7: iteration 95000/ 173500 | consumed samples: 24320000 | consumed tokens: 49807360000 | elapsed time per iteration (s): 0.08 | learning rate: 9.789E-05 | global batch size: 256 | lm loss: 4.526984E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3220.444 | TFLOPs: 11.98 |
7: ------------------------------------------------------------------------------------------------
7: validation loss at iteration 95000 | lm loss value: 4.395666E+00 | lm loss PPL: 8.109860E+01 |
7: ------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 95000 to checkpoints_14m91b100m
0: [2023-03-17 02:32:44,863] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step95000 is begin to save!
0: [2023-03-17 02:32:44,867] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/layer_01-model_00-model_states.pt...
0: [2023-03-17 02:32:44,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/layer_01-model_00-model_states.pt.
0: [2023-03-17 02:32:44,893] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/layer_03-model_00-model_states.pt...
0: [2023-03-17 02:32:44,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/layer_03-model_00-model_states.pt.
0: [2023-03-17 02:32:44,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/layer_04-model_00-model_states.pt...
0: [2023-03-17 02:32:44,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/layer_04-model_00-model_states.pt.
0: [2023-03-17 02:32:44,899] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/layer_05-model_00-model_states.pt...
0: [2023-03-17 02:32:44,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/layer_05-model_00-model_states.pt.
0: [2023-03-17 02:32:44,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:32:44,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:32:44,904] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:32:44,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:32:44,906] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step95000/mp_rank_00_model_states.pt 0: [2023-03-17 02:32:44,906] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:32:44,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:32:44,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:32:44,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:32:44,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:32:44,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:32:44,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2023-03-17 02:32:44,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:32:44,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:32:44,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2023-03-17 02:32:44,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:32:44,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:32:44,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 02:32:44,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2023-03-17 02:32:44,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2023-03-17 02:32:44,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:32:44,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2023-03-17 02:32:44,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:32:44,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:32:44,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2023-03-17 02:32:44,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2023-03-17 02:32:44,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:32:44,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 02:32:44,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:32:44,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:32:44,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:32:44,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2023-03-17 02:32:44,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:32:44,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2023-03-17 02:32:44,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:32:44,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:32:44,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:32:44,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2023-03-17 02:32:44,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:32:44,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:32:44,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2023-03-17 02:32:44,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:32:44,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2023-03-17 02:32:44,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:32:44,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
6: [2023-03-17 02:32:44,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:32:44,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2023-03-17 02:32:44,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:32:44,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:32:44,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2023-03-17 02:32:44,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:32:44,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:32:44,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2023-03-17 02:32:44,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:32:44,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2023-03-17 02:32:44,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:32:44,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:32:44,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2023-03-17 02:32:44,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:32:44,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 02:32:44,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 0: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:32:44,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:32:44,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:32:44,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:32:44,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:32:44,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:32:44,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:32:44,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:32:44,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:32:44,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:32:44,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2023-03-17 02:32:44,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:32:44,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 02:32:44,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:32:44,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
1: [2023-03-17 02:32:44,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:32:44,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2023-03-17 02:32:44,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2023-03-17 02:32:44,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:32:44,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 1: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:32:44,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2023-03-17 02:32:44,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2023-03-17 02:32:44,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:32:44,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:32:44,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2023-03-17 02:32:44,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:32:44,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:32:44,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2023-03-17 02:32:44,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:32:44,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:32:44,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:32:44,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2023-03-17 02:32:44,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:32:44,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:32:44,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:32:44,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2023-03-17 02:32:44,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2023-03-17 02:32:44,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:32:44,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:32:44,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:32:44,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:32:44,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:32:44,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2023-03-17 02:32:44,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step95000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:32:44,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2023-03-17 02:32:44,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
0: successfully saved checkpoint at iteration 95000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 94.05 7: iteration 95010/ 173500 | consumed samples: 24322560 | consumed tokens: 49812602880 | elapsed time per iteration (s): 0.09 | learning rate: 9.787E-05 | global batch size: 256 | lm loss: 4.527866E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2770.488 | TFLOPs: 10.30 | 7: iteration 95020/ 173500 | consumed samples: 24325120 | consumed tokens: 49817845760 | elapsed time per iteration (s): 0.08 | learning rate: 9.785E-05 | global batch size: 256 | lm loss: 4.512042E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.397 | TFLOPs: 11.78 | 7: iteration 95030/ 173500 | consumed samples: 24327680 | consumed tokens: 49823088640 | elapsed time per iteration (s): 0.08 | learning rate: 9.784E-05 | global batch size: 256 | lm loss: 4.521833E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.544 | TFLOPs: 12.00 | 7: iteration 95040/ 173500 | consumed samples: 24330240 | consumed tokens: 49828331520 | elapsed time per iteration (s): 0.08 | learning rate: 9.782E-05 | global batch size: 256 | lm loss: 4.522999E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.693 | TFLOPs: 12.02 | 7: iteration 95050/ 173500 | consumed samples: 24332800 | consumed tokens: 49833574400 | elapsed time per iteration (s): 0.08 | learning rate: 9.780E-05 | global batch size: 256 | lm loss: 4.520724E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.515 | TFLOPs: 12.00 | 7: iteration 95060/ 173500 | consumed samples: 24335360 | consumed tokens: 49838817280 | elapsed time per iteration (s): 0.08 | 
learning rate: 9.779E-05 | global batch size: 256 | lm loss: 4.526964E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.385 | TFLOPs: 12.00 | 7: iteration 95070/ 173500 | consumed samples: 24337920 | consumed tokens: 49844060160 | elapsed time per iteration (s): 0.08 | learning rate: 9.777E-05 | global batch size: 256 | lm loss: 4.519972E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.207 | TFLOPs: 12.02 | 7: iteration 95080/ 173500 | consumed samples: 24340480 | consumed tokens: 49849303040 | elapsed time per iteration (s): 0.08 | learning rate: 9.775E-05 | global batch size: 256 | lm loss: 4.510908E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.250 | TFLOPs: 12.04 | 7: iteration 95090/ 173500 | consumed samples: 24343040 | consumed tokens: 49854545920 | elapsed time per iteration (s): 0.08 | learning rate: 9.774E-05 | global batch size: 256 | lm loss: 4.512307E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.887 | TFLOPs: 12.03 | 7: iteration 95100/ 173500 | consumed samples: 24345600 | consumed tokens: 49859788800 | elapsed time per iteration (s): 0.08 | learning rate: 9.772E-05 | global batch size: 256 | lm loss: 4.529236E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.237 | TFLOPs: 12.03 | 7: iteration 95110/ 173500 | consumed samples: 24348160 | consumed tokens: 49865031680 | elapsed time per iteration (s): 0.08 | learning rate: 9.771E-05 | global batch size: 256 | lm loss: 4.538351E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.449 | TFLOPs: 12.00 | 7: iteration 95120/ 
173500 | consumed samples: 24350720 | consumed tokens: 49870274560 | elapsed time per iteration (s): 0.08 | learning rate: 9.769E-05 | global batch size: 256 | lm loss: 4.533314E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.831 | TFLOPs: 11.31 | 7: iteration 95130/ 173500 | consumed samples: 24353280 | consumed tokens: 49875517440 | elapsed time per iteration (s): 0.08 | learning rate: 9.767E-05 | global batch size: 256 | lm loss: 4.512529E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.996 | TFLOPs: 11.90 | 7: iteration 95140/ 173500 | consumed samples: 24355840 | consumed tokens: 49880760320 | elapsed time per iteration (s): 0.08 | learning rate: 9.766E-05 | global batch size: 256 | lm loss: 4.510921E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.627 | TFLOPs: 11.90 | 7: iteration 95150/ 173500 | consumed samples: 24358400 | consumed tokens: 49886003200 | elapsed time per iteration (s): 0.08 | learning rate: 9.764E-05 | global batch size: 256 | lm loss: 4.521175E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.958 | TFLOPs: 11.91 | 7: iteration 95160/ 173500 | consumed samples: 24360960 | consumed tokens: 49891246080 | elapsed time per iteration (s): 0.08 | learning rate: 9.762E-05 | global batch size: 256 | lm loss: 4.518009E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.157 | TFLOPs: 11.84 | 7: iteration 95170/ 173500 | consumed samples: 24363520 | consumed tokens: 49896488960 | elapsed time per iteration (s): 0.08 | learning rate: 9.761E-05 | global batch size: 256 | lm loss: 4.515784E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3199.932 | TFLOPs: 11.90 | 7: iteration 95180/ 173500 | consumed samples: 24366080 | consumed tokens: 49901731840 | elapsed time per iteration (s): 0.08 | learning rate: 9.759E-05 | global batch size: 256 | lm loss: 4.520693E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.499 | TFLOPs: 11.65 | 7: iteration 95190/ 173500 | consumed samples: 24368640 | consumed tokens: 49906974720 | elapsed time per iteration (s): 0.08 | learning rate: 9.758E-05 | global batch size: 256 | lm loss: 4.524481E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.982 | TFLOPs: 11.88 | 7: iteration 95200/ 173500 | consumed samples: 24371200 | consumed tokens: 49912217600 | elapsed time per iteration (s): 0.08 | learning rate: 9.756E-05 | global batch size: 256 | lm loss: 4.508324E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.871 | TFLOPs: 11.89 | 7: iteration 95210/ 173500 | consumed samples: 24373760 | consumed tokens: 49917460480 | elapsed time per iteration (s): 0.08 | learning rate: 9.754E-05 | global batch size: 256 | lm loss: 4.535600E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.679 | TFLOPs: 11.83 | 7: iteration 95220/ 173500 | consumed samples: 24376320 | consumed tokens: 49922703360 | elapsed time per iteration (s): 0.08 | learning rate: 9.753E-05 | global batch size: 256 | lm loss: 4.516536E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.754 | TFLOPs: 11.83 | 7: iteration 95230/ 173500 | consumed samples: 24378880 | consumed tokens: 49927946240 | elapsed time per iteration (s): 0.08 | learning rate: 
9.751E-05 | global batch size: 256 | lm loss: 4.521190E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.203 | TFLOPs: 11.78 | 7: iteration 95240/ 173500 | consumed samples: 24381440 | consumed tokens: 49933189120 | elapsed time per iteration (s): 0.08 | learning rate: 9.749E-05 | global batch size: 256 | lm loss: 4.510581E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.925 | TFLOPs: 11.87 | 7: iteration 95250/ 173500 | consumed samples: 24384000 | consumed tokens: 49938432000 | elapsed time per iteration (s): 0.08 | learning rate: 9.748E-05 | global batch size: 256 | lm loss: 4.521235E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.834 | TFLOPs: 11.77 | 7: iteration 95260/ 173500 | consumed samples: 24386560 | consumed tokens: 49943674880 | elapsed time per iteration (s): 0.08 | learning rate: 9.746E-05 | global batch size: 256 | lm loss: 4.516525E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.512 | TFLOPs: 11.82 | 7: iteration 95270/ 173500 | consumed samples: 24389120 | consumed tokens: 49948917760 | elapsed time per iteration (s): 0.08 | learning rate: 9.744E-05 | global batch size: 256 | lm loss: 4.515083E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.014 | TFLOPs: 11.88 | 7: iteration 95280/ 173500 | consumed samples: 24391680 | consumed tokens: 49954160640 | elapsed time per iteration (s): 0.08 | learning rate: 9.743E-05 | global batch size: 256 | lm loss: 4.523016E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.709 | TFLOPs: 11.84 | 7: iteration 95290/ 173500 | 
consumed samples: 24394240 | consumed tokens: 49959403520 | elapsed time per iteration (s): 0.08 | learning rate: 9.741E-05 | global batch size: 256 | lm loss: 4.523741E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.869 | TFLOPs: 11.89 | 7: iteration 95300/ 173500 | consumed samples: 24396800 | consumed tokens: 49964646400 | elapsed time per iteration (s): 0.08 | learning rate: 9.740E-05 | global batch size: 256 | lm loss: 4.509959E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.506 | TFLOPs: 11.89 | 7: iteration 95310/ 173500 | consumed samples: 24399360 | consumed tokens: 49969889280 | elapsed time per iteration (s): 0.08 | learning rate: 9.738E-05 | global batch size: 256 | lm loss: 4.517675E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.741 | TFLOPs: 11.89 | 7: iteration 95320/ 173500 | consumed samples: 24401920 | consumed tokens: 49975132160 | elapsed time per iteration (s): 0.08 | learning rate: 9.736E-05 | global batch size: 256 | lm loss: 4.517370E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.730 | TFLOPs: 11.89 | 7: iteration 95330/ 173500 | consumed samples: 24404480 | consumed tokens: 49980375040 | elapsed time per iteration (s): 0.08 | learning rate: 9.735E-05 | global batch size: 256 | lm loss: 4.522300E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.706 | TFLOPs: 11.86 | 7: iteration 95340/ 173500 | consumed samples: 24407040 | consumed tokens: 49985617920 | elapsed time per iteration (s): 0.08 | learning rate: 9.733E-05 | global batch size: 256 | lm loss: 4.515976E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3201.308 | TFLOPs: 11.91 | 7: iteration 95350/ 173500 | consumed samples: 24409600 | consumed tokens: 49990860800 | elapsed time per iteration (s): 0.08 | learning rate: 9.731E-05 | global batch size: 256 | lm loss: 4.512469E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.584 | TFLOPs: 11.85 | 7: iteration 95360/ 173500 | consumed samples: 24412160 | consumed tokens: 49996103680 | elapsed time per iteration (s): 0.08 | learning rate: 9.730E-05 | global batch size: 256 | lm loss: 4.519887E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.532 | TFLOPs: 11.90 | 7: iteration 95370/ 173500 | consumed samples: 24414720 | consumed tokens: 50001346560 | elapsed time per iteration (s): 0.08 | learning rate: 9.728E-05 | global batch size: 256 | lm loss: 4.522812E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.086 | TFLOPs: 11.91 | 7: iteration 95380/ 173500 | consumed samples: 24417280 | consumed tokens: 50006589440 | elapsed time per iteration (s): 0.08 | learning rate: 9.727E-05 | global batch size: 256 | lm loss: 4.512429E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.277 | TFLOPs: 11.89 | 7: iteration 95390/ 173500 | consumed samples: 24419840 | consumed tokens: 50011832320 | elapsed time per iteration (s): 0.08 | learning rate: 9.725E-05 | global batch size: 256 | lm loss: 4.513094E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.995 | TFLOPs: 11.87 | 7: iteration 95400/ 173500 | consumed samples: 24422400 | consumed tokens: 50017075200 | elapsed time per iteration (s): 0.08 | learning rate: 9.723E-05 | global batch 
size: 256 | lm loss: 4.510645E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.788 | TFLOPs: 11.88 | 7: iteration 95410/ 173500 | consumed samples: 24424960 | consumed tokens: 50022318080 | elapsed time per iteration (s): 0.08 | learning rate: 9.722E-05 | global batch size: 256 | lm loss: 4.527072E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.036 | TFLOPs: 11.91 | 7: iteration 95420/ 173500 | consumed samples: 24427520 | consumed tokens: 50027560960 | elapsed time per iteration (s): 0.08 | learning rate: 9.720E-05 | global batch size: 256 | lm loss: 4.499305E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.616 | TFLOPs: 11.85 | 7: iteration 95430/ 173500 | consumed samples: 24430080 | consumed tokens: 50032803840 | elapsed time per iteration (s): 0.08 | learning rate: 9.718E-05 | global batch size: 256 | lm loss: 4.526830E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.229 | TFLOPs: 11.88 | 7: iteration 95440/ 173500 | consumed samples: 24432640 | consumed tokens: 50038046720 | elapsed time per iteration (s): 0.08 | learning rate: 9.717E-05 | global batch size: 256 | lm loss: 4.519472E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.634 | TFLOPs: 11.85 | 7: iteration 95450/ 173500 | consumed samples: 24435200 | consumed tokens: 50043289600 | elapsed time per iteration (s): 0.08 | learning rate: 9.715E-05 | global batch size: 256 | lm loss: 4.519675E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.770 | TFLOPs: 11.86 | 7: iteration 95460/ 173500 | consumed samples: 24437760 | 
consumed tokens: 50048532480 | elapsed time per iteration (s): 0.08 | learning rate: 9.714E-05 | global batch size: 256 | lm loss: 4.527681E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.708 | TFLOPs: 11.85 | 7: iteration 95470/ 173500 | consumed samples: 24440320 | consumed tokens: 50053775360 | elapsed time per iteration (s): 0.08 | learning rate: 9.712E-05 | global batch size: 256 | lm loss: 4.515760E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.025 | TFLOPs: 11.76 | 7: iteration 95480/ 173500 | consumed samples: 24442880 | consumed tokens: 50059018240 | elapsed time per iteration (s): 0.08 | learning rate: 9.710E-05 | global batch size: 256 | lm loss: 4.512412E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.714 | TFLOPs: 11.82 | 7: iteration 95490/ 173500 | consumed samples: 24445440 | consumed tokens: 50064261120 | elapsed time per iteration (s): 0.08 | learning rate: 9.709E-05 | global batch size: 256 | lm loss: 4.527800E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.560 | TFLOPs: 11.85 | 7: iteration 95500/ 173500 | consumed samples: 24448000 | consumed tokens: 50069504000 | elapsed time per iteration (s): 0.08 | learning rate: 9.707E-05 | global batch size: 256 | lm loss: 4.513668E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.724 | TFLOPs: 11.86 | 7: iteration 95510/ 173500 | consumed samples: 24450560 | consumed tokens: 50074746880 | elapsed time per iteration (s): 0.08 | learning rate: 9.705E-05 | global batch size: 256 | lm loss: 4.521358E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3195.548 | TFLOPs: 11.89 | 7: iteration 95520/ 173500 | consumed samples: 24453120 | consumed tokens: 50079989760 | elapsed time per iteration (s): 0.08 | learning rate: 9.704E-05 | global batch size: 256 | lm loss: 4.523377E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.622 | TFLOPs: 11.73 | 7: iteration 95530/ 173500 | consumed samples: 24455680 | consumed tokens: 50085232640 | elapsed time per iteration (s): 0.08 | learning rate: 9.702E-05 | global batch size: 256 | lm loss: 4.522021E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.000 | TFLOPs: 11.88 | 7: iteration 95540/ 173500 | consumed samples: 24458240 | consumed tokens: 50090475520 | elapsed time per iteration (s): 0.08 | learning rate: 9.700E-05 | global batch size: 256 | lm loss: 4.519276E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.313 | TFLOPs: 11.85 | 7: iteration 95550/ 173500 | consumed samples: 24460800 | consumed tokens: 50095718400 | elapsed time per iteration (s): 0.08 | learning rate: 9.699E-05 | global batch size: 256 | lm loss: 4.516512E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.038 | TFLOPs: 11.84 | 7: iteration 95560/ 173500 | consumed samples: 24463360 | consumed tokens: 50100961280 | elapsed time per iteration (s): 0.08 | learning rate: 9.697E-05 | global batch size: 256 | lm loss: 4.513419E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.553 | TFLOPs: 11.87 | 7: iteration 95570/ 173500 | consumed samples: 24465920 | consumed tokens: 50106204160 | elapsed time per iteration (s): 0.08 | learning rate: 9.696E-05 | global batch size: 256 | lm loss: 
4.520376E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.253 | TFLOPs: 11.81 | 7: iteration 95580/ 173500 | consumed samples: 24468480 | consumed tokens: 50111447040 | elapsed time per iteration (s): 0.08 | learning rate: 9.694E-05 | global batch size: 256 | lm loss: 4.529729E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.196 | TFLOPs: 11.83 | 7: iteration 95590/ 173500 | consumed samples: 24471040 | consumed tokens: 50116689920 | elapsed time per iteration (s): 0.08 | learning rate: 9.692E-05 | global batch size: 256 | lm loss: 4.532346E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.232 | TFLOPs: 11.87 | 7: iteration 95600/ 173500 | consumed samples: 24473600 | consumed tokens: 50121932800 | elapsed time per iteration (s): 0.08 | learning rate: 9.691E-05 | global batch size: 256 | lm loss: 4.512685E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.672 | TFLOPs: 11.85 | 7: iteration 95610/ 173500 | consumed samples: 24476160 | consumed tokens: 50127175680 | elapsed time per iteration (s): 0.08 | learning rate: 9.689E-05 | global batch size: 256 | lm loss: 4.515340E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.028 | TFLOPs: 11.85 | 7: iteration 95620/ 173500 | consumed samples: 24478720 | consumed tokens: 50132418560 | elapsed time per iteration (s): 0.09 | learning rate: 9.687E-05 | global batch size: 256 | lm loss: 4.531640E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.421 | TFLOPs: 11.13 | 7: iteration 95630/ 173500 | consumed samples: 24481280 | consumed tokens: 
50137661440 | elapsed time per iteration (s): 0.09 | learning rate: 9.686E-05 | global batch size: 256 | lm loss: 4.516829E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.134 | TFLOPs: 11.16 | 7: iteration 95640/ 173500 | consumed samples: 24483840 | consumed tokens: 50142904320 | elapsed time per iteration (s): 0.08 | learning rate: 9.684E-05 | global batch size: 256 | lm loss: 4.509942E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.417 | TFLOPs: 11.83 | 7: iteration 95650/ 173500 | consumed samples: 24486400 | consumed tokens: 50148147200 | elapsed time per iteration (s): 0.08 | learning rate: 9.683E-05 | global batch size: 256 | lm loss: 4.532150E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.833 | TFLOPs: 11.86 | 7: iteration 95660/ 173500 | consumed samples: 24488960 | consumed tokens: 50153390080 | elapsed time per iteration (s): 0.08 | learning rate: 9.681E-05 | global batch size: 256 | lm loss: 4.511708E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.153 | TFLOPs: 11.85 | 7: iteration 95670/ 173500 | consumed samples: 24491520 | consumed tokens: 50158632960 | elapsed time per iteration (s): 0.08 | learning rate: 9.679E-05 | global batch size: 256 | lm loss: 4.520492E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.759 | TFLOPs: 11.85 | 7: iteration 95680/ 173500 | consumed samples: 24494080 | consumed tokens: 50163875840 | elapsed time per iteration (s): 0.08 | learning rate: 9.678E-05 | global batch size: 256 | lm loss: 4.519304E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3185.194 | TFLOPs: 11.85 | 7: iteration 95690/ 173500 | consumed samples: 24496640 | consumed tokens: 50169118720 | elapsed time per iteration (s): 0.08 | learning rate: 9.676E-05 | global batch size: 256 | lm loss: 4.511897E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.293 | TFLOPs: 11.88 | 7: iteration 95700/ 173500 | consumed samples: 24499200 | consumed tokens: 50174361600 | elapsed time per iteration (s): 0.08 | learning rate: 9.674E-05 | global batch size: 256 | lm loss: 4.522598E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.172 | TFLOPs: 11.73 | 7: iteration 95710/ 173500 | consumed samples: 24501760 | consumed tokens: 50179604480 | elapsed time per iteration (s): 0.08 | learning rate: 9.673E-05 | global batch size: 256 | lm loss: 4.517355E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.338 | TFLOPs: 11.86 | 7: iteration 95720/ 173500 | consumed samples: 24504320 | consumed tokens: 50184847360 | elapsed time per iteration (s): 0.08 | learning rate: 9.671E-05 | global batch size: 256 | lm loss: 4.516335E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.723 | TFLOPs: 11.87 | 7: iteration 95730/ 173500 | consumed samples: 24506880 | consumed tokens: 50190090240 | elapsed time per iteration (s): 0.08 | learning rate: 9.670E-05 | global batch size: 256 | lm loss: 4.526302E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.809 | TFLOPs: 11.90 | 7: iteration 95740/ 173500 | consumed samples: 24509440 | consumed tokens: 50195333120 | elapsed time per iteration (s): 0.08 | learning rate: 9.668E-05 | global batch size: 256 | lm loss: 4.531541E+00 | grad 
norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.194 | TFLOPs: 11.88 | 7: iteration 95750/ 173500 | consumed samples: 24512000 | consumed tokens: 50200576000 | elapsed time per iteration (s): 0.08 | learning rate: 9.666E-05 | global batch size: 256 | lm loss: 4.511724E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.260 | TFLOPs: 11.48 | 7: iteration 95760/ 173500 | consumed samples: 24514560 | consumed tokens: 50205818880 | elapsed time per iteration (s): 0.08 | learning rate: 9.665E-05 | global batch size: 256 | lm loss: 4.519014E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.214 | TFLOPs: 11.44 | 7: iteration 95770/ 173500 | consumed samples: 24517120 | consumed tokens: 50211061760 | elapsed time per iteration (s): 0.08 | learning rate: 9.663E-05 | global batch size: 256 | lm loss: 4.524632E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.094 | TFLOPs: 11.85 | 7: iteration 95780/ 173500 | consumed samples: 24519680 | consumed tokens: 50216304640 | elapsed time per iteration (s): 0.08 | learning rate: 9.661E-05 | global batch size: 256 | lm loss: 4.521354E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.503 | TFLOPs: 11.67 | 7: iteration 95790/ 173500 | consumed samples: 24522240 | consumed tokens: 50221547520 | elapsed time per iteration (s): 0.08 | learning rate: 9.660E-05 | global batch size: 256 | lm loss: 4.525227E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.530 | TFLOPs: 11.84 | 7: iteration 95800/ 173500 | consumed samples: 24524800 | consumed tokens: 50226790400 | elapsed time 
per iteration (s): 0.08 | learning rate: 9.658E-05 | global batch size: 256 | lm loss: 4.509651E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.674 | TFLOPs: 11.87 | 7: iteration 95810/ 173500 | consumed samples: 24527360 | consumed tokens: 50232033280 | elapsed time per iteration (s): 0.08 | learning rate: 9.657E-05 | global batch size: 256 | lm loss: 4.527082E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.478 | TFLOPs: 11.89 | 7: iteration 95820/ 173500 | consumed samples: 24529920 | consumed tokens: 50237276160 | elapsed time per iteration (s): 0.09 | learning rate: 9.655E-05 | global batch size: 256 | lm loss: 4.524832E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2921.958 | TFLOPs: 10.87 | 7: iteration 95830/ 173500 | consumed samples: 24532480 | consumed tokens: 50242519040 | elapsed time per iteration (s): 0.08 | learning rate: 9.653E-05 | global batch size: 256 | lm loss: 4.517942E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.228 | TFLOPs: 11.89 | 7: iteration 95840/ 173500 | consumed samples: 24535040 | consumed tokens: 50247761920 | elapsed time per iteration (s): 0.08 | learning rate: 9.652E-05 | global batch size: 256 | lm loss: 4.515898E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.369 | TFLOPs: 11.89 | 7: iteration 95850/ 173500 | consumed samples: 24537600 | consumed tokens: 50253004800 | elapsed time per iteration (s): 0.08 | learning rate: 9.650E-05 | global batch size: 256 | lm loss: 4.508045E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.021 | TFLOPs: 
11.88 | 7: iteration 95860/ 173500 | consumed samples: 24540160 | consumed tokens: 50258247680 | elapsed time per iteration (s): 0.08 | learning rate: 9.648E-05 | global batch size: 256 | lm loss: 4.524829E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.897 | TFLOPs: 11.87 | 7: iteration 95870/ 173500 | consumed samples: 24542720 | consumed tokens: 50263490560 | elapsed time per iteration (s): 0.08 | learning rate: 9.647E-05 | global batch size: 256 | lm loss: 4.525518E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.829 | TFLOPs: 11.89 | 7: iteration 95880/ 173500 | consumed samples: 24545280 | consumed tokens: 50268733440 | elapsed time per iteration (s): 0.08 | learning rate: 9.645E-05 | global batch size: 256 | lm loss: 4.515180E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.559 | TFLOPs: 11.89 | 7: iteration 95890/ 173500 | consumed samples: 24547840 | consumed tokens: 50273976320 | elapsed time per iteration (s): 0.08 | learning rate: 9.643E-05 | global batch size: 256 | lm loss: 4.529194E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.820 | TFLOPs: 11.63 | 7: iteration 95900/ 173500 | consumed samples: 24550400 | consumed tokens: 50279219200 | elapsed time per iteration (s): 0.08 | learning rate: 9.642E-05 | global batch size: 256 | lm loss: 4.526199E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.142 | TFLOPs: 11.89 | 7: iteration 95910/ 173500 | consumed samples: 24552960 | consumed tokens: 50284462080 | elapsed time per iteration (s): 0.08 | learning rate: 9.640E-05 | global batch size: 256 | lm loss: 4.529082E+00 | grad norm: 0.325 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.771 | TFLOPs: 11.64 | 7: iteration 95920/ 173500 | consumed samples: 24555520 | consumed tokens: 50289704960 | elapsed time per iteration (s): 0.08 | learning rate: 9.639E-05 | global batch size: 256 | lm loss: 4.532099E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.915 | TFLOPs: 11.79 | 7: iteration 95930/ 173500 | consumed samples: 24558080 | consumed tokens: 50294947840 | elapsed time per iteration (s): 0.08 | learning rate: 9.637E-05 | global batch size: 256 | lm loss: 4.520291E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.763 | TFLOPs: 11.91 | 7: iteration 95940/ 173500 | consumed samples: 24560640 | consumed tokens: 50300190720 | elapsed time per iteration (s): 0.08 | learning rate: 9.635E-05 | global batch size: 256 | lm loss: 4.523512E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.547 | TFLOPs: 11.64 | 7: iteration 95950/ 173500 | consumed samples: 24563200 | consumed tokens: 50305433600 | elapsed time per iteration (s): 0.08 | learning rate: 9.634E-05 | global batch size: 256 | lm loss: 4.519972E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.502 | TFLOPs: 11.89 | 7: iteration 95960/ 173500 | consumed samples: 24565760 | consumed tokens: 50310676480 | elapsed time per iteration (s): 0.08 | learning rate: 9.632E-05 | global batch size: 256 | lm loss: 4.523902E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.226 | TFLOPs: 11.94 | 7: iteration 95970/ 173500 | consumed samples: 24568320 | consumed tokens: 50315919360 | elapsed time per iteration (s): 0.08 | 
learning rate: 9.630E-05 | global batch size: 256 | lm loss: 4.529229E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.332 | TFLOPs: 11.90 | 7: iteration 95980/ 173500 | consumed samples: 24570880 | consumed tokens: 50321162240 | elapsed time per iteration (s): 0.08 | learning rate: 9.629E-05 | global batch size: 256 | lm loss: 4.516988E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.319 | TFLOPs: 11.75 | 7: iteration 95990/ 173500 | consumed samples: 24573440 | consumed tokens: 50326405120 | elapsed time per iteration (s): 0.08 | learning rate: 9.627E-05 | global batch size: 256 | lm loss: 4.533110E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.215 | TFLOPs: 11.92 | 0: [2023-03-17 02:34:05,480] [INFO] [logging.py:68:log_dist] [Rank 0] step=96000, skipped=0, lr=[9.625601507010446e-05, 9.625601507010446e-05, 9.625601507010446e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 96000/ 173500 | consumed samples: 24576000 | consumed tokens: 50331648000 | elapsed time per iteration (s): 0.08 | learning rate: 9.626E-05 | global batch size: 256 | lm loss: 4.526178E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.667 | TFLOPs: 11.90 | 0: steps: 96000 loss: 4.5476 iter time (s): 0.084 samples/sec: 3065.471 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 96000 | lm loss value: 4.388937E+00 | lm loss PPL: 8.055478E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 96000 to checkpoints_14m91b100m 0: [2023-03-17 02:34:05,537] [INFO] [logging.py:68:log_dist] [Rank 0] 
[Torch] Checkpoint global_step96000 is begin to save! 0: [2023-03-17 02:34:05,541] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:34:05,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:34:05,564] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:34:05,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:34:05,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:34:05,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:34:05,573] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:34:05,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:34:05,576] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:34:05,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:34:05,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:34:05,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:34:05,580] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step96000/mp_rank_00_model_states.pt 0: [2023-03-17 02:34:05,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:34:05,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:34:05,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:34:05,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2023-03-17 02:34:05,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2023-03-17 02:34:05,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
3: [2023-03-17 02:34:05,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:34:05,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:34:05,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:34:05,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2023-03-17 02:34:05,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:34:05,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:34:05,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2023-03-17 02:34:05,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:34:05,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:34:05,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:34:05,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:34:05,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:34:05,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2023-03-17 02:34:05,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
1: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:34:05,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:34:05,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:34:05,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 02:34:05,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
6: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:34:05,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:34:05,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:34:05,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:34:05,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
5: [2023-03-17 02:34:05,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:34:05,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:34:05,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
4: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 2: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 02:34:05,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:34:05,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
6: [2023-03-17 02:34:05,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:34:05,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2023-03-17 02:34:05,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:34:05,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:34:05,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:34:05,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2023-03-17 02:34:05,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:34:05,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:34:05,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:34:05,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2023-03-17 02:34:05,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:34:05,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:34:05,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2023-03-17 02:34:05,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2023-03-17 02:34:05,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:34:05,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:34:05,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:34:05,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:34:05,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2023-03-17 02:34:05,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:34:05,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
5: [2023-03-17 02:34:05,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:34:05,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 02:34:05,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2023-03-17 02:34:05,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step96000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:34:05,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
0: successfully saved checkpoint at iteration 96000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.58 7: iteration 96010/ 173500 | consumed samples: 24578560 | consumed tokens: 50336890880 | elapsed time per iteration (s): 0.09 | learning rate: 9.624E-05 | global batch size: 256 | lm loss: 4.523133E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2754.523 | TFLOPs: 10.25 | 7: iteration 96020/ 173500 | consumed samples: 24581120 | consumed tokens: 50342133760 | elapsed time per iteration (s): 0.08 | learning rate: 9.622E-05 | global batch size: 256 | lm loss: 4.518098E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.366 | TFLOPs: 11.93 | 7: iteration 96030/ 173500 | consumed samples: 24583680 | consumed tokens: 50347376640 | elapsed time per iteration (s): 0.08 | learning rate: 9.621E-05 | global batch size: 256 | lm loss: 4.517450E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.861 | TFLOPs: 11.91 | 7: iteration 96040/ 173500 | consumed samples: 24586240 | consumed tokens: 50352619520 | elapsed time per iteration (s): 0.08 | learning rate: 9.619E-05 | global batch size: 256 | lm loss: 4.522941E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.895 | TFLOPs: 11.56 | 7: iteration 96050/ 173500 | consumed samples: 24588800 | consumed tokens: 50357862400 | elapsed time per iteration (s): 0.08 | learning rate: 9.617E-05 | global batch size: 256 | lm loss: 4.518002E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.354 | TFLOPs: 11.84 | 7: iteration 96060/ 173500 | consumed samples: 24591360 | consumed tokens: 50363105280 | elapsed time per iteration (s): 0.08 | 
learning rate: 9.616E-05 | global batch size: 256 | lm loss: 4.517728E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.801 | TFLOPs: 11.89 | 7: iteration 96070/ 173500 | consumed samples: 24593920 | consumed tokens: 50368348160 | elapsed time per iteration (s): 0.08 | learning rate: 9.614E-05 | global batch size: 256 | lm loss: 4.505954E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.937 | TFLOPs: 11.87 | 7: iteration 96080/ 173500 | consumed samples: 24596480 | consumed tokens: 50373591040 | elapsed time per iteration (s): 0.08 | learning rate: 9.613E-05 | global batch size: 256 | lm loss: 4.515961E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.600 | TFLOPs: 11.89 | 7: iteration 96090/ 173500 | consumed samples: 24599040 | consumed tokens: 50378833920 | elapsed time per iteration (s): 0.08 | learning rate: 9.611E-05 | global batch size: 256 | lm loss: 4.517281E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.270 | TFLOPs: 11.90 | 7: iteration 96100/ 173500 | consumed samples: 24601600 | consumed tokens: 50384076800 | elapsed time per iteration (s): 0.08 | learning rate: 9.609E-05 | global batch size: 256 | lm loss: 4.512745E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.639 | TFLOPs: 11.85 | 7: iteration 96110/ 173500 | consumed samples: 24604160 | consumed tokens: 50389319680 | elapsed time per iteration (s): 0.08 | learning rate: 9.608E-05 | global batch size: 256 | lm loss: 4.532548E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.963 | TFLOPs: 11.82 | 7: iteration 96120/ 
173500 | consumed samples: 24606720 | consumed tokens: 50394562560 | elapsed time per iteration (s): 0.08 | learning rate: 9.606E-05 | global batch size: 256 | lm loss: 4.508770E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.233 | TFLOPs: 11.94 | 7: iteration 96130/ 173500 | consumed samples: 24609280 | consumed tokens: 50399805440 | elapsed time per iteration (s): 0.08 | learning rate: 9.604E-05 | global batch size: 256 | lm loss: 4.511662E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.010 | TFLOPs: 11.92 | 7: iteration 96140/ 173500 | consumed samples: 24611840 | consumed tokens: 50405048320 | elapsed time per iteration (s): 0.08 | learning rate: 9.603E-05 | global batch size: 256 | lm loss: 4.512104E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.868 | TFLOPs: 11.52 | 7: iteration 96150/ 173500 | consumed samples: 24614400 | consumed tokens: 50410291200 | elapsed time per iteration (s): 0.08 | learning rate: 9.601E-05 | global batch size: 256 | lm loss: 4.494560E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.087 | TFLOPs: 11.88 | 7: iteration 96160/ 173500 | consumed samples: 24616960 | consumed tokens: 50415534080 | elapsed time per iteration (s): 0.08 | learning rate: 9.600E-05 | global batch size: 256 | lm loss: 4.525596E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.826 | TFLOPs: 11.95 | 7: iteration 96170/ 173500 | consumed samples: 24619520 | consumed tokens: 50420776960 | elapsed time per iteration (s): 0.08 | learning rate: 9.598E-05 | global batch size: 256 | lm loss: 4.537012E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3209.790 | TFLOPs: 11.94 | 7: iteration 96180/ 173500 | consumed samples: 24622080 | consumed tokens: 50426019840 | elapsed time per iteration (s): 0.08 | learning rate: 9.596E-05 | global batch size: 256 | lm loss: 4.520294E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.585 | TFLOPs: 11.54 | 7: iteration 96190/ 173500 | consumed samples: 24624640 | consumed tokens: 50431262720 | elapsed time per iteration (s): 0.08 | learning rate: 9.595E-05 | global batch size: 256 | lm loss: 4.528833E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.738 | TFLOPs: 11.85 | 7: iteration 96200/ 173500 | consumed samples: 24627200 | consumed tokens: 50436505600 | elapsed time per iteration (s): 0.08 | learning rate: 9.593E-05 | global batch size: 256 | lm loss: 4.508125E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.835 | TFLOPs: 11.81 | 7: iteration 96210/ 173500 | consumed samples: 24629760 | consumed tokens: 50441748480 | elapsed time per iteration (s): 0.08 | learning rate: 9.591E-05 | global batch size: 256 | lm loss: 4.525888E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.415 | TFLOPs: 11.59 | 7: iteration 96220/ 173500 | consumed samples: 24632320 | consumed tokens: 50446991360 | elapsed time per iteration (s): 0.08 | learning rate: 9.590E-05 | global batch size: 256 | lm loss: 4.526852E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.065 | TFLOPs: 11.61 | 7: iteration 96230/ 173500 | consumed samples: 24634880 | consumed tokens: 50452234240 | elapsed time per iteration (s): 0.08 | learning rate: 
9.588E-05 | global batch size: 256 | lm loss: 4.516014E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.268 | TFLOPs: 11.59 | 7: iteration 96240/ 173500 | consumed samples: 24637440 | consumed tokens: 50457477120 | elapsed time per iteration (s): 0.09 | learning rate: 9.587E-05 | global batch size: 256 | lm loss: 4.516841E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2951.756 | TFLOPs: 10.98 | 7: iteration 96250/ 173500 | consumed samples: 24640000 | consumed tokens: 50462720000 | elapsed time per iteration (s): 0.08 | learning rate: 9.585E-05 | global batch size: 256 | lm loss: 4.536279E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.402 | TFLOPs: 11.58 | 7: iteration 96260/ 173500 | consumed samples: 24642560 | consumed tokens: 50467962880 | elapsed time per iteration (s): 0.08 | learning rate: 9.583E-05 | global batch size: 256 | lm loss: 4.529473E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.544 | TFLOPs: 11.81 | 7: iteration 96270/ 173500 | consumed samples: 24645120 | consumed tokens: 50473205760 | elapsed time per iteration (s): 0.08 | learning rate: 9.582E-05 | global batch size: 256 | lm loss: 4.531252E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.251 | TFLOPs: 11.86 | 7: iteration 96280/ 173500 | consumed samples: 24647680 | consumed tokens: 50478448640 | elapsed time per iteration (s): 0.08 | learning rate: 9.580E-05 | global batch size: 256 | lm loss: 4.518386E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.748 | TFLOPs: 11.86 | 7: iteration 96290/ 173500 | 
consumed samples: 24650240 | consumed tokens: 50483691520 | elapsed time per iteration (s): 0.08 | learning rate: 9.578E-05 | global batch size: 256 | lm loss: 4.526226E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.042 | TFLOPs: 11.73 | 7: iteration 96300/ 173500 | consumed samples: 24652800 | consumed tokens: 50488934400 | elapsed time per iteration (s): 0.08 | learning rate: 9.577E-05 | global batch size: 256 | lm loss: 4.534489E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.123 | TFLOPs: 11.83 | 7: iteration 96310/ 173500 | consumed samples: 24655360 | consumed tokens: 50494177280 | elapsed time per iteration (s): 0.08 | learning rate: 9.575E-05 | global batch size: 256 | lm loss: 4.528622E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.487 | TFLOPs: 11.85 | 7: iteration 96320/ 173500 | consumed samples: 24657920 | consumed tokens: 50499420160 | elapsed time per iteration (s): 0.08 | learning rate: 9.574E-05 | global batch size: 256 | lm loss: 4.516254E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.876 | TFLOPs: 11.87 | 7: iteration 96330/ 173500 | consumed samples: 24660480 | consumed tokens: 50504663040 | elapsed time per iteration (s): 0.08 | learning rate: 9.572E-05 | global batch size: 256 | lm loss: 4.534971E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.064 | TFLOPs: 11.67 | 7: iteration 96340/ 173500 | consumed samples: 24663040 | consumed tokens: 50509905920 | elapsed time per iteration (s): 0.08 | learning rate: 9.570E-05 | global batch size: 256 | lm loss: 4.510211E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3098.902 | TFLOPs: 11.53 | 7: iteration 96350/ 173500 | consumed samples: 24665600 | consumed tokens: 50515148800 | elapsed time per iteration (s): 0.08 | learning rate: 9.569E-05 | global batch size: 256 | lm loss: 4.515703E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.428 | TFLOPs: 11.88 | 7: iteration 96360/ 173500 | consumed samples: 24668160 | consumed tokens: 50520391680 | elapsed time per iteration (s): 0.08 | learning rate: 9.567E-05 | global batch size: 256 | lm loss: 4.526694E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.821 | TFLOPs: 11.60 | 7: iteration 96370/ 173500 | consumed samples: 24670720 | consumed tokens: 50525634560 | elapsed time per iteration (s): 0.08 | learning rate: 9.565E-05 | global batch size: 256 | lm loss: 4.508004E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.319 | TFLOPs: 11.81 | 7: iteration 96380/ 173500 | consumed samples: 24673280 | consumed tokens: 50530877440 | elapsed time per iteration (s): 0.08 | learning rate: 9.564E-05 | global batch size: 256 | lm loss: 4.527782E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.419 | TFLOPs: 11.79 | 7: iteration 96390/ 173500 | consumed samples: 24675840 | consumed tokens: 50536120320 | elapsed time per iteration (s): 0.08 | learning rate: 9.562E-05 | global batch size: 256 | lm loss: 4.524866E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.001 | TFLOPs: 11.78 | 7: iteration 96400/ 173500 | consumed samples: 24678400 | consumed tokens: 50541363200 | elapsed time per iteration (s): 0.10 | learning rate: 9.561E-05 | global batch 
size: 256 | lm loss: 4.529339E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2590.200 | TFLOPs: 9.63 | 7: iteration 96410/ 173500 | consumed samples: 24680960 | consumed tokens: 50546606080 | elapsed time per iteration (s): 0.09 | learning rate: 9.559E-05 | global batch size: 256 | lm loss: 4.517575E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.487 | TFLOPs: 10.04 | 7: iteration 96420/ 173500 | consumed samples: 24683520 | consumed tokens: 50551848960 | elapsed time per iteration (s): 0.08 | learning rate: 9.557E-05 | global batch size: 256 | lm loss: 4.518784E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.630 | TFLOPs: 11.86 | 7: iteration 96430/ 173500 | consumed samples: 24686080 | consumed tokens: 50557091840 | elapsed time per iteration (s): 0.09 | learning rate: 9.556E-05 | global batch size: 256 | lm loss: 4.529574E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.850 | TFLOPs: 11.08 | 7: iteration 96440/ 173500 | consumed samples: 24688640 | consumed tokens: 50562334720 | elapsed time per iteration (s): 0.08 | learning rate: 9.554E-05 | global batch size: 256 | lm loss: 4.509822E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.736 | TFLOPs: 11.60 | 7: iteration 96450/ 173500 | consumed samples: 24691200 | consumed tokens: 50567577600 | elapsed time per iteration (s): 0.08 | learning rate: 9.552E-05 | global batch size: 256 | lm loss: 4.524842E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.417 | TFLOPs: 11.85 | 7: iteration 96460/ 173500 | consumed samples: 24693760 | 
consumed tokens: 50572820480 | elapsed time per iteration (s): 0.09 | learning rate: 9.551E-05 | global batch size: 256 | lm loss: 4.521579E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2905.374 | TFLOPs: 10.81 | 7: iteration 96470/ 173500 | consumed samples: 24696320 | consumed tokens: 50578063360 | elapsed time per iteration (s): 0.09 | learning rate: 9.549E-05 | global batch size: 256 | lm loss: 4.515387E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2742.455 | TFLOPs: 10.20 | 7: iteration 96480/ 173500 | consumed samples: 24698880 | consumed tokens: 50583306240 | elapsed time per iteration (s): 0.08 | learning rate: 9.548E-05 | global batch size: 256 | lm loss: 4.523699E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.830 | TFLOPs: 11.34 | 7: iteration 96490/ 173500 | consumed samples: 24701440 | consumed tokens: 50588549120 | elapsed time per iteration (s): 0.08 | learning rate: 9.546E-05 | global batch size: 256 | lm loss: 4.527171E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.709 | TFLOPs: 11.47 | 7: iteration 96500/ 173500 | consumed samples: 24704000 | consumed tokens: 50593792000 | elapsed time per iteration (s): 0.08 | learning rate: 9.544E-05 | global batch size: 256 | lm loss: 4.516582E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.800 | TFLOPs: 11.86 | 7: iteration 96510/ 173500 | consumed samples: 24706560 | consumed tokens: 50599034880 | elapsed time per iteration (s): 0.08 | learning rate: 9.543E-05 | global batch size: 256 | lm loss: 4.522869E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3112.639 | TFLOPs: 11.58 | 7: iteration 96520/ 173500 | consumed samples: 24709120 | consumed tokens: 50604277760 | elapsed time per iteration (s): 0.08 | learning rate: 9.541E-05 | global batch size: 256 | lm loss: 4.513148E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.276 | TFLOPs: 11.82 | 7: iteration 96530/ 173500 | consumed samples: 24711680 | consumed tokens: 50609520640 | elapsed time per iteration (s): 0.08 | learning rate: 9.539E-05 | global batch size: 256 | lm loss: 4.515303E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.754 | TFLOPs: 11.80 | 7: iteration 96540/ 173500 | consumed samples: 24714240 | consumed tokens: 50614763520 | elapsed time per iteration (s): 0.08 | learning rate: 9.538E-05 | global batch size: 256 | lm loss: 4.507615E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.543 | TFLOPs: 11.83 | 7: iteration 96550/ 173500 | consumed samples: 24716800 | consumed tokens: 50620006400 | elapsed time per iteration (s): 0.08 | learning rate: 9.536E-05 | global batch size: 256 | lm loss: 4.518915E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.545 | TFLOPs: 11.84 | 7: iteration 96560/ 173500 | consumed samples: 24719360 | consumed tokens: 50625249280 | elapsed time per iteration (s): 0.08 | learning rate: 9.535E-05 | global batch size: 256 | lm loss: 4.520457E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.909 | TFLOPs: 11.78 | 7: iteration 96570/ 173500 | consumed samples: 24721920 | consumed tokens: 50630492160 | elapsed time per iteration (s): 0.08 | learning rate: 9.533E-05 | global batch size: 256 | lm loss: 
4.535460E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.377 | TFLOPs: 11.84 | 7: iteration 96580/ 173500 | consumed samples: 24724480 | consumed tokens: 50635735040 | elapsed time per iteration (s): 0.08 | learning rate: 9.531E-05 | global batch size: 256 | lm loss: 4.527956E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.328 | TFLOPs: 11.83 | 7: iteration 96590/ 173500 | consumed samples: 24727040 | consumed tokens: 50640977920 | elapsed time per iteration (s): 0.08 | learning rate: 9.530E-05 | global batch size: 256 | lm loss: 4.517862E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.391 | TFLOPs: 11.82 | 7: iteration 96600/ 173500 | consumed samples: 24729600 | consumed tokens: 50646220800 | elapsed time per iteration (s): 0.08 | learning rate: 9.528E-05 | global batch size: 256 | lm loss: 4.518223E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.195 | TFLOPs: 11.83 | 7: iteration 96610/ 173500 | consumed samples: 24732160 | consumed tokens: 50651463680 | elapsed time per iteration (s): 0.08 | learning rate: 9.526E-05 | global batch size: 256 | lm loss: 4.533144E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.417 | TFLOPs: 11.82 | 7: iteration 96620/ 173500 | consumed samples: 24734720 | consumed tokens: 50656706560 | elapsed time per iteration (s): 0.08 | learning rate: 9.525E-05 | global batch size: 256 | lm loss: 4.530210E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.141 | TFLOPs: 11.79 | 7: iteration 96630/ 173500 | consumed samples: 24737280 | consumed tokens: 
50661949440 | elapsed time per iteration (s): 0.08 | learning rate: 9.523E-05 | global batch size: 256 | lm loss: 4.513504E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.883 | TFLOPs: 11.53 | 7: iteration 96640/ 173500 | consumed samples: 24739840 | consumed tokens: 50667192320 | elapsed time per iteration (s): 0.08 | learning rate: 9.522E-05 | global batch size: 256 | lm loss: 4.525130E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.094 | TFLOPs: 11.82 | 7: iteration 96650/ 173500 | consumed samples: 24742400 | consumed tokens: 50672435200 | elapsed time per iteration (s): 0.08 | learning rate: 9.520E-05 | global batch size: 256 | lm loss: 4.514925E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.962 | TFLOPs: 11.89 | 7: iteration 96660/ 173500 | consumed samples: 24744960 | consumed tokens: 50677678080 | elapsed time per iteration (s): 0.08 | learning rate: 9.518E-05 | global batch size: 256 | lm loss: 4.527261E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.937 | TFLOPs: 11.73 | 7: iteration 96670/ 173500 | consumed samples: 24747520 | consumed tokens: 50682920960 | elapsed time per iteration (s): 0.09 | learning rate: 9.517E-05 | global batch size: 256 | lm loss: 4.519673E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2721.851 | TFLOPs: 10.12 | 7: iteration 96680/ 173500 | consumed samples: 24750080 | consumed tokens: 50688163840 | elapsed time per iteration (s): 0.10 | learning rate: 9.515E-05 | global batch size: 256 | lm loss: 4.523954E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2657.322 | TFLOPs: 9.88 | 7: iteration 96690/ 173500 | consumed samples: 24752640 | consumed tokens: 50693406720 | elapsed time per iteration (s): 0.09 | learning rate: 9.513E-05 | global batch size: 256 | lm loss: 4.537202E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2989.422 | TFLOPs: 11.12 | 7: iteration 96700/ 173500 | consumed samples: 24755200 | consumed tokens: 50698649600 | elapsed time per iteration (s): 0.12 | learning rate: 9.512E-05 | global batch size: 256 | lm loss: 4.534643E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2063.339 | TFLOPs: 7.67 | 7: iteration 96710/ 173500 | consumed samples: 24757760 | consumed tokens: 50703892480 | elapsed time per iteration (s): 0.08 | learning rate: 9.510E-05 | global batch size: 256 | lm loss: 4.519399E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.593 | TFLOPs: 11.94 | 7: iteration 96720/ 173500 | consumed samples: 24760320 | consumed tokens: 50709135360 | elapsed time per iteration (s): 0.10 | learning rate: 9.509E-05 | global batch size: 256 | lm loss: 4.515435E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2583.904 | TFLOPs: 9.61 | 7: iteration 96730/ 173500 | consumed samples: 24762880 | consumed tokens: 50714378240 | elapsed time per iteration (s): 0.09 | learning rate: 9.507E-05 | global batch size: 256 | lm loss: 4.542874E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2860.004 | TFLOPs: 10.64 | 7: iteration 96740/ 173500 | consumed samples: 24765440 | consumed tokens: 50719621120 | elapsed time per iteration (s): 0.08 | learning rate: 9.505E-05 | global batch size: 256 | lm loss: 4.513548E+00 | grad 
norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.731 | TFLOPs: 11.90 | 7: iteration 96750/ 173500 | consumed samples: 24768000 | consumed tokens: 50724864000 | elapsed time per iteration (s): 0.09 | learning rate: 9.504E-05 | global batch size: 256 | lm loss: 4.527262E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2952.957 | TFLOPs: 10.98 | 7: iteration 96760/ 173500 | consumed samples: 24770560 | consumed tokens: 50730106880 | elapsed time per iteration (s): 0.10 | learning rate: 9.502E-05 | global batch size: 256 | lm loss: 4.512738E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2692.428 | TFLOPs: 10.01 | 7: iteration 96770/ 173500 | consumed samples: 24773120 | consumed tokens: 50735349760 | elapsed time per iteration (s): 0.08 | learning rate: 9.500E-05 | global batch size: 256 | lm loss: 4.509021E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.185 | TFLOPs: 11.91 | 7: iteration 96780/ 173500 | consumed samples: 24775680 | consumed tokens: 50740592640 | elapsed time per iteration (s): 0.08 | learning rate: 9.499E-05 | global batch size: 256 | lm loss: 4.526410E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.009 | TFLOPs: 11.30 | 7: iteration 96790/ 173500 | consumed samples: 24778240 | consumed tokens: 50745835520 | elapsed time per iteration (s): 0.10 | learning rate: 9.497E-05 | global batch size: 256 | lm loss: 4.510509E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2681.834 | TFLOPs: 9.98 | 7: iteration 96800/ 173500 | consumed samples: 24780800 | consumed tokens: 50751078400 | elapsed time 
per iteration (s): 0.08 | learning rate: 9.496E-05 | global batch size: 256 | lm loss: 4.519264E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.457 | TFLOPs: 11.96 | 7: iteration 96810/ 173500 | consumed samples: 24783360 | consumed tokens: 50756321280 | elapsed time per iteration (s): 0.08 | learning rate: 9.494E-05 | global batch size: 256 | lm loss: 4.518161E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.911 | TFLOPs: 11.96 | 7: iteration 96820/ 173500 | consumed samples: 24785920 | consumed tokens: 50761564160 | elapsed time per iteration (s): 0.09 | learning rate: 9.492E-05 | global batch size: 256 | lm loss: 4.531514E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2920.017 | TFLOPs: 10.86 | 7: iteration 96830/ 173500 | consumed samples: 24788480 | consumed tokens: 50766807040 | elapsed time per iteration (s): 0.09 | learning rate: 9.491E-05 | global batch size: 256 | lm loss: 4.522245E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.855 | TFLOPs: 11.14 | 7: iteration 96840/ 173500 | consumed samples: 24791040 | consumed tokens: 50772049920 | elapsed time per iteration (s): 0.08 | learning rate: 9.489E-05 | global batch size: 256 | lm loss: 4.526709E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.424 | TFLOPs: 11.88 | 7: iteration 96850/ 173500 | consumed samples: 24793600 | consumed tokens: 50777292800 | elapsed time per iteration (s): 0.08 | learning rate: 9.487E-05 | global batch size: 256 | lm loss: 4.519580E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.213 | TFLOPs: 
11.94 | 7: iteration 96860/ 173500 | consumed samples: 24796160 | consumed tokens: 50782535680 | elapsed time per iteration (s): 0.10 | learning rate: 9.486E-05 | global batch size: 256 | lm loss: 4.503284E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2601.496 | TFLOPs: 9.68 | 7: iteration 96870/ 173500 | consumed samples: 24798720 | consumed tokens: 50787778560 | elapsed time per iteration (s): 0.08 | learning rate: 9.484E-05 | global batch size: 256 | lm loss: 4.526694E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.005 | TFLOPs: 11.62 | 7: iteration 96880/ 173500 | consumed samples: 24801280 | consumed tokens: 50793021440 | elapsed time per iteration (s): 0.09 | learning rate: 9.483E-05 | global batch size: 256 | lm loss: 4.521239E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2941.822 | TFLOPs: 10.94 | 7: iteration 96890/ 173500 | consumed samples: 24803840 | consumed tokens: 50798264320 | elapsed time per iteration (s): 0.08 | learning rate: 9.481E-05 | global batch size: 256 | lm loss: 4.529847E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.501 | TFLOPs: 11.73 | 7: iteration 96900/ 173500 | consumed samples: 24806400 | consumed tokens: 50803507200 | elapsed time per iteration (s): 0.08 | learning rate: 9.479E-05 | global batch size: 256 | lm loss: 4.522381E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.678 | TFLOPs: 11.97 | 7: iteration 96910/ 173500 | consumed samples: 24808960 | consumed tokens: 50808750080 | elapsed time per iteration (s): 0.09 | learning rate: 9.478E-05 | global batch size: 256 | lm loss: 4.525565E+00 | grad norm: 0.323 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2949.911 | TFLOPs: 10.97 | 7: iteration 96920/ 173500 | consumed samples: 24811520 | consumed tokens: 50813992960 | elapsed time per iteration (s): 0.08 | learning rate: 9.476E-05 | global batch size: 256 | lm loss: 4.527042E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.677 | TFLOPs: 12.01 | 7: iteration 96930/ 173500 | consumed samples: 24814080 | consumed tokens: 50819235840 | elapsed time per iteration (s): 0.08 | learning rate: 9.475E-05 | global batch size: 256 | lm loss: 4.526447E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.175 | TFLOPs: 11.91 | 7: iteration 96940/ 173500 | consumed samples: 24816640 | consumed tokens: 50824478720 | elapsed time per iteration (s): 0.08 | learning rate: 9.473E-05 | global batch size: 256 | lm loss: 4.515357E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.812 | TFLOPs: 11.75 | 7: iteration 96950/ 173500 | consumed samples: 24819200 | consumed tokens: 50829721600 | elapsed time per iteration (s): 0.08 | learning rate: 9.471E-05 | global batch size: 256 | lm loss: 4.523009E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.950 | TFLOPs: 11.84 | 7: iteration 96960/ 173500 | consumed samples: 24821760 | consumed tokens: 50834964480 | elapsed time per iteration (s): 0.08 | learning rate: 9.470E-05 | global batch size: 256 | lm loss: 4.519810E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.258 | TFLOPs: 11.87 | 7: iteration 96970/ 173500 | consumed samples: 24824320 | consumed tokens: 50840207360 | elapsed time per iteration (s): 0.08 | 
learning rate: 9.468E-05 | global batch size: 256 | lm loss: 4.518039E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.851 | TFLOPs: 11.68 | 7: iteration 96980/ 173500 | consumed samples: 24826880 | consumed tokens: 50845450240 | elapsed time per iteration (s): 0.09 | learning rate: 9.466E-05 | global batch size: 256 | lm loss: 4.517574E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2958.755 | TFLOPs: 11.01 | 7: iteration 96990/ 173500 | consumed samples: 24829440 | consumed tokens: 50850693120 | elapsed time per iteration (s): 0.08 | learning rate: 9.465E-05 | global batch size: 256 | lm loss: 4.524630E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.624 | TFLOPs: 11.83 | 7: iteration 97000/ 173500 | consumed samples: 24832000 | consumed tokens: 50855936000 | elapsed time per iteration (s): 0.10 | learning rate: 9.463E-05 | global batch size: 256 | lm loss: 4.519894E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2583.897 | TFLOPs: 9.61 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 97000 | lm loss value: 4.389789E+00 | lm loss PPL: 8.062341E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 97000 to checkpoints_14m91b100m 0: [2023-03-17 02:35:29,112] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step97000 is begin to save! 0: [2023-03-17 02:35:29,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 02:35:29,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:35:29,142] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:35:29,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:35:29,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:35:29,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:35:29,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:35:29,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:35:29,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:35:29,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:35:29,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:35:29,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:35:29,156] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step97000/mp_rank_00_model_states.pt 0: [2023-03-17 02:35:29,156] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:35:29,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:35:29,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:35:29,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:35:29,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:35:29,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 02:35:29,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2023-03-17 02:35:29,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:35:29,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:35:29,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2023-03-17 02:35:29,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:35:29,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:35:29,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2023-03-17 02:35:29,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:35:29,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:35:29,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:35:29,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2023-03-17 02:35:29,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
0: [2023-03-17 02:35:29,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:35:29,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:35:29,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:35:29,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2023-03-17 02:35:29,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:35:29,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:35:29,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:35:29,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:35:29,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:35:29,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2023-03-17 02:35:29,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:35:29,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:35:29,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:35:29,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:35:29,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:35:29,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:35:29,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
4: [2023-03-17 02:35:29,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2023-03-17 02:35:29,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:35:29,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:35:29,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:35:29,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
3: [2023-03-17 02:35:29,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:35:29,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:35:29,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:35:29,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:35:29,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:35:29,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:35:29,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 02:35:29,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:35:29,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:35:29,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2023-03-17 02:35:29,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:35:29,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:35:29,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:35:29,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:35:29,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2023-03-17 02:35:29,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:35:29,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
1: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 3: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2023-03-17 02:35:29,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:35:29,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2023-03-17 02:35:29,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:35:29,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step97000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:35:29,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
0: successfully saved checkpoint at iteration 97000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 81.55
7: iteration 97010/ 173500 | consumed samples: 24834560 | consumed tokens: 50861178880 | elapsed time per iteration (s): 0.11 | learning rate: 9.462E-05 | global batch size: 256 | lm loss: 4.512485E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2430.502 | TFLOPs: 9.04 |
7: iteration 97020/ 173500 | consumed samples: 24837120 | consumed tokens: 50866421760 | elapsed time per iteration (s): 0.08 | learning rate: 9.460E-05 | global batch size: 256 | lm loss: 4.527974E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.195 | TFLOPs: 11.68 |
7: iteration 97030/ 173500 | consumed samples: 24839680 | consumed tokens: 50871664640 | elapsed time per iteration (s): 0.08 | learning rate: 9.458E-05 | global batch size: 256 | lm loss: 4.519742E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.949 | TFLOPs: 11.94 |
7: iteration 97040/ 173500 | consumed samples: 24842240 | consumed tokens: 50876907520 | elapsed time per iteration (s): 0.08 | learning rate: 9.457E-05 | global batch size: 256 | lm loss: 4.521858E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.689 | TFLOPs: 11.88 |
7: iteration 97050/ 173500 | consumed samples: 24844800 | consumed tokens: 50882150400 | elapsed time per iteration (s): 0.11 | learning rate: 9.455E-05 | global batch size: 256 | lm loss: 4.511172E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.509 | TFLOPs: 9.02 |
7: iteration 97060/ 173500 | consumed samples: 24847360 | consumed tokens: 50887393280 | elapsed time per iteration (s): 0.13 | learning rate: 9.453E-05 | global batch size: 256 | lm loss: 4.522242E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.545 | TFLOPs: 7.58 |
7: iteration 97070/ 173500 | consumed samples: 24849920 | consumed tokens: 50892636160 | elapsed time per iteration (s): 0.09 | learning rate: 9.452E-05 | global batch size: 256 | lm loss: 4.525861E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2758.636 | TFLOPs: 10.26 |
7: iteration 97080/ 173500 | consumed samples: 24852480 | consumed tokens: 50897879040 | elapsed time per iteration (s): 0.09 | learning rate: 9.450E-05 | global batch size: 256 | lm loss: 4.530647E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2762.144 | TFLOPs: 10.27 |
7: iteration 97090/ 173500 | consumed samples: 24855040 | consumed tokens: 50903121920 | elapsed time per iteration (s): 0.08 | learning rate: 9.449E-05 | global batch size: 256 | lm loss: 4.517194E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.804 | TFLOPs: 11.98 |
7: iteration 97100/ 173500 | consumed samples: 24857600 | consumed tokens: 50908364800 | elapsed time per iteration (s): 0.08 | learning rate: 9.447E-05 | global batch size: 256 | lm loss: 4.521514E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.507 | TFLOPs: 11.80 |
7: iteration 97110/ 173500 | consumed samples: 24860160 | consumed tokens: 50913607680 | elapsed time per iteration (s): 0.08 | learning rate: 9.445E-05 | global batch size: 256 | lm loss: 4.516433E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.259 | TFLOPs: 11.96 |
7: iteration 97120/ 173500 | consumed samples: 24862720 | consumed tokens: 50918850560 | elapsed time per iteration (s): 0.08 | learning rate: 9.444E-05 | global batch size: 256 | lm loss: 4.512671E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.245 | TFLOPs: 11.97 |
7: iteration 97130/ 173500 | consumed samples: 24865280 | consumed tokens: 50924093440 | elapsed time per iteration (s): 0.08 | learning rate: 9.442E-05 | global batch size: 256 | lm loss: 4.534423E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.856 | TFLOPs: 11.68 |
7: iteration 97140/ 173500 | consumed samples: 24867840 | consumed tokens: 50929336320 | elapsed time per iteration (s): 0.08 | learning rate: 9.440E-05 | global batch size: 256 | lm loss: 4.520655E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.358 | TFLOPs: 11.77 |
7: iteration 97150/ 173500 | consumed samples: 24870400 | consumed tokens: 50934579200 | elapsed time per iteration (s): 0.08 | learning rate: 9.439E-05 | global batch size: 256 | lm loss: 4.522049E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.938 | TFLOPs: 11.96 |
7: iteration 97160/ 173500 | consumed samples: 24872960 | consumed tokens: 50939822080 | elapsed time per iteration (s): 0.08 | learning rate: 9.437E-05 | global batch size: 256 | lm loss: 4.532082E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.475 | TFLOPs: 12.03 |
7: iteration 97170/ 173500 | consumed samples: 24875520 | consumed tokens: 50945064960 | elapsed time per iteration (s): 0.08 | learning rate: 9.436E-05 | global batch size: 256 | lm loss: 4.534195E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.255 | TFLOPs: 11.87 |
7: iteration 97180/ 173500 | consumed samples: 24878080 | consumed tokens: 50950307840 | elapsed time per iteration (s): 0.09 | learning rate: 9.434E-05 | global batch size: 256 | lm loss: 4.521302E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.905 | TFLOPs: 10.70 |
7: iteration 97190/ 173500 | consumed samples: 24880640 | consumed tokens: 50955550720 | elapsed time per iteration (s): 0.08 | learning rate: 9.432E-05 | global batch size: 256 | lm loss: 4.515711E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.673 | TFLOPs: 11.94 |
7: iteration 97200/ 173500 | consumed samples: 24883200 | consumed tokens: 50960793600 | elapsed time per iteration (s): 0.09 | learning rate: 9.431E-05 | global batch size: 256 | lm loss: 4.522519E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2914.177 | TFLOPs: 10.84 |
7: iteration 97210/ 173500 | consumed samples: 24885760 | consumed tokens: 50966036480 | elapsed time per iteration (s): 0.08 | learning rate: 9.429E-05 | global batch size: 256 | lm loss: 4.532664E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.543 | TFLOPs: 11.98 |
7: iteration 97220/ 173500 | consumed samples: 24888320 | consumed tokens: 50971279360 | elapsed time per iteration (s): 0.09 | learning rate: 9.427E-05 | global batch size: 256 | lm loss: 4.517497E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.190 | TFLOPs: 10.92 |
7: iteration 97230/ 173500 | consumed samples: 24890880 | consumed tokens: 50976522240 | elapsed time per iteration (s): 0.10 | learning rate:
9.426E-05 | global batch size: 256 | lm loss: 4.524550E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.607 | TFLOPs: 10.00 | 7: iteration 97240/ 173500 | consumed samples: 24893440 | consumed tokens: 50981765120 | elapsed time per iteration (s): 0.08 | learning rate: 9.424E-05 | global batch size: 256 | lm loss: 4.523515E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.634 | TFLOPs: 11.76 | 7: iteration 97250/ 173500 | consumed samples: 24896000 | consumed tokens: 50987008000 | elapsed time per iteration (s): 0.09 | learning rate: 9.423E-05 | global batch size: 256 | lm loss: 4.520914E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2902.268 | TFLOPs: 10.80 | 7: iteration 97260/ 173500 | consumed samples: 24898560 | consumed tokens: 50992250880 | elapsed time per iteration (s): 0.08 | learning rate: 9.421E-05 | global batch size: 256 | lm loss: 4.512028E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.336 | TFLOPs: 11.29 | 7: iteration 97270/ 173500 | consumed samples: 24901120 | consumed tokens: 50997493760 | elapsed time per iteration (s): 0.08 | learning rate: 9.419E-05 | global batch size: 256 | lm loss: 4.518153E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.585 | TFLOPs: 11.85 | 7: iteration 97280/ 173500 | consumed samples: 24903680 | consumed tokens: 51002736640 | elapsed time per iteration (s): 0.10 | learning rate: 9.418E-05 | global batch size: 256 | lm loss: 4.531657E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2497.895 | TFLOPs: 9.29 | 7: iteration 97290/ 173500 | consumed 
samples: 24906240 | consumed tokens: 51007979520 | elapsed time per iteration (s): 0.08 | learning rate: 9.416E-05 | global batch size: 256 | lm loss: 4.523893E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.018 | TFLOPs: 11.92 | 7: iteration 97300/ 173500 | consumed samples: 24908800 | consumed tokens: 51013222400 | elapsed time per iteration (s): 0.08 | learning rate: 9.415E-05 | global batch size: 256 | lm loss: 4.528744E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.433 | TFLOPs: 11.88 | 7: iteration 97310/ 173500 | consumed samples: 24911360 | consumed tokens: 51018465280 | elapsed time per iteration (s): 0.09 | learning rate: 9.413E-05 | global batch size: 256 | lm loss: 4.516660E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2990.106 | TFLOPs: 11.12 | 7: iteration 97320/ 173500 | consumed samples: 24913920 | consumed tokens: 51023708160 | elapsed time per iteration (s): 0.09 | learning rate: 9.411E-05 | global batch size: 256 | lm loss: 4.512878E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2704.232 | TFLOPs: 10.06 | 7: iteration 97330/ 173500 | consumed samples: 24916480 | consumed tokens: 51028951040 | elapsed time per iteration (s): 0.08 | learning rate: 9.410E-05 | global batch size: 256 | lm loss: 4.517098E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.602 | TFLOPs: 11.82 | 7: iteration 97340/ 173500 | consumed samples: 24919040 | consumed tokens: 51034193920 | elapsed time per iteration (s): 0.08 | learning rate: 9.408E-05 | global batch size: 256 | lm loss: 4.514367E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 3097.790 | TFLOPs: 11.52 | 7: iteration 97350/ 173500 | consumed samples: 24921600 | consumed tokens: 51039436800 | elapsed time per iteration (s): 0.08 | learning rate: 9.406E-05 | global batch size: 256 | lm loss: 4.526672E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.563 | TFLOPs: 11.79 | 7: iteration 97360/ 173500 | consumed samples: 24924160 | consumed tokens: 51044679680 | elapsed time per iteration (s): 0.08 | learning rate: 9.405E-05 | global batch size: 256 | lm loss: 4.502199E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.389 | TFLOPs: 11.70 | 7: iteration 97370/ 173500 | consumed samples: 24926720 | consumed tokens: 51049922560 | elapsed time per iteration (s): 0.08 | learning rate: 9.403E-05 | global batch size: 256 | lm loss: 4.532491E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.766 | TFLOPs: 11.79 | 7: iteration 97380/ 173500 | consumed samples: 24929280 | consumed tokens: 51055165440 | elapsed time per iteration (s): 0.08 | learning rate: 9.402E-05 | global batch size: 256 | lm loss: 4.524281E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.867 | TFLOPs: 11.82 | 7: iteration 97390/ 173500 | consumed samples: 24931840 | consumed tokens: 51060408320 | elapsed time per iteration (s): 0.08 | learning rate: 9.400E-05 | global batch size: 256 | lm loss: 4.518977E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.607 | TFLOPs: 11.82 | 7: iteration 97400/ 173500 | consumed samples: 24934400 | consumed tokens: 51065651200 | elapsed time per iteration (s): 0.09 | learning rate: 9.398E-05 | global batch size: 
256 | lm loss: 4.519418E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.573 | TFLOPs: 10.18 | 7: iteration 97410/ 173500 | consumed samples: 24936960 | consumed tokens: 51070894080 | elapsed time per iteration (s): 0.10 | learning rate: 9.397E-05 | global batch size: 256 | lm loss: 4.517455E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2546.286 | TFLOPs: 9.47 | 7: iteration 97420/ 173500 | consumed samples: 24939520 | consumed tokens: 51076136960 | elapsed time per iteration (s): 0.08 | learning rate: 9.395E-05 | global batch size: 256 | lm loss: 4.524399E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.567 | TFLOPs: 11.98 | 7: iteration 97430/ 173500 | consumed samples: 24942080 | consumed tokens: 51081379840 | elapsed time per iteration (s): 0.08 | learning rate: 9.393E-05 | global batch size: 256 | lm loss: 4.517371E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.467 | TFLOPs: 11.32 | 7: iteration 97440/ 173500 | consumed samples: 24944640 | consumed tokens: 51086622720 | elapsed time per iteration (s): 0.10 | learning rate: 9.392E-05 | global batch size: 256 | lm loss: 4.517934E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2534.481 | TFLOPs: 9.43 | 7: iteration 97450/ 173500 | consumed samples: 24947200 | consumed tokens: 51091865600 | elapsed time per iteration (s): 0.10 | learning rate: 9.390E-05 | global batch size: 256 | lm loss: 4.530324E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2480.523 | TFLOPs: 9.23 | 7: iteration 97460/ 173500 | consumed samples: 24949760 | consumed 
tokens: 51097108480 | elapsed time per iteration (s): 0.08 | learning rate: 9.389E-05 | global batch size: 256 | lm loss: 4.530299E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.350 | TFLOPs: 12.02 | 7: iteration 97470/ 173500 | consumed samples: 24952320 | consumed tokens: 51102351360 | elapsed time per iteration (s): 0.08 | learning rate: 9.387E-05 | global batch size: 256 | lm loss: 4.520795E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.660 | TFLOPs: 12.01 | 7: iteration 97480/ 173500 | consumed samples: 24954880 | consumed tokens: 51107594240 | elapsed time per iteration (s): 0.08 | learning rate: 9.385E-05 | global batch size: 256 | lm loss: 4.518685E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.479 | TFLOPs: 11.75 | 7: iteration 97490/ 173500 | consumed samples: 24957440 | consumed tokens: 51112837120 | elapsed time per iteration (s): 0.08 | learning rate: 9.384E-05 | global batch size: 256 | lm loss: 4.523373E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.498 | TFLOPs: 12.03 | 7: iteration 97500/ 173500 | consumed samples: 24960000 | consumed tokens: 51118080000 | elapsed time per iteration (s): 0.08 | learning rate: 9.382E-05 | global batch size: 256 | lm loss: 4.513016E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.130 | TFLOPs: 11.63 | 7: iteration 97510/ 173500 | consumed samples: 24962560 | consumed tokens: 51123322880 | elapsed time per iteration (s): 0.08 | learning rate: 9.381E-05 | global batch size: 256 | lm loss: 4.521354E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3218.349 | TFLOPs: 11.97 | 7: iteration 97520/ 173500 | consumed samples: 24965120 | consumed tokens: 51128565760 | elapsed time per iteration (s): 0.08 | learning rate: 9.379E-05 | global batch size: 256 | lm loss: 4.509359E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.903 | TFLOPs: 11.60 | 7: iteration 97530/ 173500 | consumed samples: 24967680 | consumed tokens: 51133808640 | elapsed time per iteration (s): 0.08 | learning rate: 9.377E-05 | global batch size: 256 | lm loss: 4.523890E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.691 | TFLOPs: 11.99 | 7: iteration 97540/ 173500 | consumed samples: 24970240 | consumed tokens: 51139051520 | elapsed time per iteration (s): 0.09 | learning rate: 9.376E-05 | global batch size: 256 | lm loss: 4.517276E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.401 | TFLOPs: 10.72 | 7: iteration 97550/ 173500 | consumed samples: 24972800 | consumed tokens: 51144294400 | elapsed time per iteration (s): 0.09 | learning rate: 9.374E-05 | global batch size: 256 | lm loss: 4.510313E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2764.299 | TFLOPs: 10.28 | 7: iteration 97560/ 173500 | consumed samples: 24975360 | consumed tokens: 51149537280 | elapsed time per iteration (s): 0.09 | learning rate: 9.372E-05 | global batch size: 256 | lm loss: 4.514011E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.536 | TFLOPs: 10.61 | 7: iteration 97570/ 173500 | consumed samples: 24977920 | consumed tokens: 51154780160 | elapsed time per iteration (s): 0.08 | learning rate: 9.371E-05 | global batch size: 256 | lm loss: 4.521293E+00 | 
grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.195 | TFLOPs: 11.85 | 7: iteration 97580/ 173500 | consumed samples: 24980480 | consumed tokens: 51160023040 | elapsed time per iteration (s): 0.08 | learning rate: 9.369E-05 | global batch size: 256 | lm loss: 4.521970E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.652 | TFLOPs: 12.04 | 7: iteration 97590/ 173500 | consumed samples: 24983040 | consumed tokens: 51165265920 | elapsed time per iteration (s): 0.08 | learning rate: 9.368E-05 | global batch size: 256 | lm loss: 4.510585E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.834 | TFLOPs: 12.03 | 7: iteration 97600/ 173500 | consumed samples: 24985600 | consumed tokens: 51170508800 | elapsed time per iteration (s): 0.08 | learning rate: 9.366E-05 | global batch size: 256 | lm loss: 4.526289E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.454 | TFLOPs: 11.92 | 7: iteration 97610/ 173500 | consumed samples: 24988160 | consumed tokens: 51175751680 | elapsed time per iteration (s): 0.08 | learning rate: 9.364E-05 | global batch size: 256 | lm loss: 4.531513E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3022.878 | TFLOPs: 11.24 | 7: iteration 97620/ 173500 | consumed samples: 24990720 | consumed tokens: 51180994560 | elapsed time per iteration (s): 0.10 | learning rate: 9.363E-05 | global batch size: 256 | lm loss: 4.522580E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2567.750 | TFLOPs: 9.55 | 7: iteration 97630/ 173500 | consumed samples: 24993280 | consumed tokens: 51186237440 | elapsed 
time per iteration (s): 0.08 | learning rate: 9.361E-05 | global batch size: 256 | lm loss: 4.523915E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.213 | TFLOPs: 11.27 | 7: iteration 97640/ 173500 | consumed samples: 24995840 | consumed tokens: 51191480320 | elapsed time per iteration (s): 0.09 | learning rate: 9.359E-05 | global batch size: 256 | lm loss: 4.525052E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2783.857 | TFLOPs: 10.35 | 7: iteration 97650/ 173500 | consumed samples: 24998400 | consumed tokens: 51196723200 | elapsed time per iteration (s): 0.08 | learning rate: 9.358E-05 | global batch size: 256 | lm loss: 4.526366E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.940 | TFLOPs: 12.03 | 7: iteration 97660/ 173500 | consumed samples: 25000960 | consumed tokens: 51201966080 | elapsed time per iteration (s): 0.09 | learning rate: 9.356E-05 | global batch size: 256 | lm loss: 4.522084E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.288 | TFLOPs: 11.11 | 7: iteration 97670/ 173500 | consumed samples: 25003520 | consumed tokens: 51207208960 | elapsed time per iteration (s): 0.08 | learning rate: 9.355E-05 | global batch size: 256 | lm loss: 4.519627E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.228 | TFLOPs: 11.99 | 7: iteration 97680/ 173500 | consumed samples: 25006080 | consumed tokens: 51212451840 | elapsed time per iteration (s): 0.08 | learning rate: 9.353E-05 | global batch size: 256 | lm loss: 4.511988E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.110 | 
TFLOPs: 11.91 | 7: iteration 97690/ 173500 | consumed samples: 25008640 | consumed tokens: 51217694720 | elapsed time per iteration (s): 0.08 | learning rate: 9.351E-05 | global batch size: 256 | lm loss: 4.537506E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.990 | TFLOPs: 11.90 | 7: iteration 97700/ 173500 | consumed samples: 25011200 | consumed tokens: 51222937600 | elapsed time per iteration (s): 0.08 | learning rate: 9.350E-05 | global batch size: 256 | lm loss: 4.520172E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.183 | TFLOPs: 11.67 | 7: iteration 97710/ 173500 | consumed samples: 25013760 | consumed tokens: 51228180480 | elapsed time per iteration (s): 0.08 | learning rate: 9.348E-05 | global batch size: 256 | lm loss: 4.523428E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.153 | TFLOPs: 11.84 | 7: iteration 97720/ 173500 | consumed samples: 25016320 | consumed tokens: 51233423360 | elapsed time per iteration (s): 0.08 | learning rate: 9.347E-05 | global batch size: 256 | lm loss: 4.528511E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.428 | TFLOPs: 11.90 | 7: iteration 97730/ 173500 | consumed samples: 25018880 | consumed tokens: 51238666240 | elapsed time per iteration (s): 0.09 | learning rate: 9.345E-05 | global batch size: 256 | lm loss: 4.530772E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2823.524 | TFLOPs: 10.50 | 7: iteration 97740/ 173500 | consumed samples: 25021440 | consumed tokens: 51243909120 | elapsed time per iteration (s): 0.08 | learning rate: 9.343E-05 | global batch size: 256 | lm loss: 4.531634E+00 | grad norm: 0.363 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.677 | TFLOPs: 11.30 | 7: iteration 97750/ 173500 | consumed samples: 25024000 | consumed tokens: 51249152000 | elapsed time per iteration (s): 0.08 | learning rate: 9.342E-05 | global batch size: 256 | lm loss: 4.522311E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.239 | TFLOPs: 12.04 | 7: iteration 97760/ 173500 | consumed samples: 25026560 | consumed tokens: 51254394880 | elapsed time per iteration (s): 0.08 | learning rate: 9.340E-05 | global batch size: 256 | lm loss: 4.516341E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.001 | TFLOPs: 11.98 | 7: iteration 97770/ 173500 | consumed samples: 25029120 | consumed tokens: 51259637760 | elapsed time per iteration (s): 0.08 | learning rate: 9.338E-05 | global batch size: 256 | lm loss: 4.521317E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.798 | TFLOPs: 11.91 | 7: iteration 97780/ 173500 | consumed samples: 25031680 | consumed tokens: 51264880640 | elapsed time per iteration (s): 0.09 | learning rate: 9.337E-05 | global batch size: 256 | lm loss: 4.527399E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.844 | TFLOPs: 10.40 | 7: iteration 97790/ 173500 | consumed samples: 25034240 | consumed tokens: 51270123520 | elapsed time per iteration (s): 0.09 | learning rate: 9.335E-05 | global batch size: 256 | lm loss: 4.518581E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2954.856 | TFLOPs: 10.99 | 7: iteration 97800/ 173500 | consumed samples: 25036800 | consumed tokens: 51275366400 | elapsed time per iteration (s): 
0.08 | learning rate: 9.334E-05 | global batch size: 256 | lm loss: 4.518876E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.214 | TFLOPs: 11.43 | 7: iteration 97810/ 173500 | consumed samples: 25039360 | consumed tokens: 51280609280 | elapsed time per iteration (s): 0.08 | learning rate: 9.332E-05 | global batch size: 256 | lm loss: 4.520496E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.237 | TFLOPs: 11.91 | 7: iteration 97820/ 173500 | consumed samples: 25041920 | consumed tokens: 51285852160 | elapsed time per iteration (s): 0.08 | learning rate: 9.330E-05 | global batch size: 256 | lm loss: 4.507854E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.848 | TFLOPs: 11.91 | 7: iteration 97830/ 173500 | consumed samples: 25044480 | consumed tokens: 51291095040 | elapsed time per iteration (s): 0.08 | learning rate: 9.329E-05 | global batch size: 256 | lm loss: 4.513431E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.535 | TFLOPs: 11.88 | 7: iteration 97840/ 173500 | consumed samples: 25047040 | consumed tokens: 51296337920 | elapsed time per iteration (s): 0.09 | learning rate: 9.327E-05 | global batch size: 256 | lm loss: 4.523328E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.040 | TFLOPs: 11.10 | 7: iteration 97850/ 173500 | consumed samples: 25049600 | consumed tokens: 51301580800 | elapsed time per iteration (s): 0.08 | learning rate: 9.325E-05 | global batch size: 256 | lm loss: 4.527757E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.728 | TFLOPs: 11.85 | 7: iteration 
97860/ 173500 | consumed samples: 25052160 | consumed tokens: 51306823680 | elapsed time per iteration (s): 0.08 | learning rate: 9.324E-05 | global batch size: 256 | lm loss: 4.519503E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.645 | TFLOPs: 11.93 | 7: iteration 97870/ 173500 | consumed samples: 25054720 | consumed tokens: 51312066560 | elapsed time per iteration (s): 0.08 | learning rate: 9.322E-05 | global batch size: 256 | lm loss: 4.513137E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.393 | TFLOPs: 11.22 | 7: iteration 97880/ 173500 | consumed samples: 25057280 | consumed tokens: 51317309440 | elapsed time per iteration (s): 0.08 | learning rate: 9.321E-05 | global batch size: 256 | lm loss: 4.533244E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.434 | TFLOPs: 11.86 | 7: iteration 97890/ 173500 | consumed samples: 25059840 | consumed tokens: 51322552320 | elapsed time per iteration (s): 0.08 | learning rate: 9.319E-05 | global batch size: 256 | lm loss: 4.518725E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.446 | TFLOPs: 11.81 | 7: iteration 97900/ 173500 | consumed samples: 25062400 | consumed tokens: 51327795200 | elapsed time per iteration (s): 0.08 | learning rate: 9.317E-05 | global batch size: 256 | lm loss: 4.505599E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.733 | TFLOPs: 11.89 | 7: iteration 97910/ 173500 | consumed samples: 25064960 | consumed tokens: 51333038080 | elapsed time per iteration (s): 0.09 | learning rate: 9.316E-05 | global batch size: 256 | lm loss: 4.524138E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2906.257 | TFLOPs: 10.81 | 7: iteration 97920/ 173500 | consumed samples: 25067520 | consumed tokens: 51338280960 | elapsed time per iteration (s): 0.08 | learning rate: 9.314E-05 | global batch size: 256 | lm loss: 4.513685E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.266 | TFLOPs: 11.87 | 7: iteration 97930/ 173500 | consumed samples: 25070080 | consumed tokens: 51343523840 | elapsed time per iteration (s): 0.09 | learning rate: 9.313E-05 | global batch size: 256 | lm loss: 4.522315E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.490 | TFLOPs: 10.84 | 7: iteration 97940/ 173500 | consumed samples: 25072640 | consumed tokens: 51348766720 | elapsed time per iteration (s): 0.08 | learning rate: 9.311E-05 | global batch size: 256 | lm loss: 4.513181E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.410 | TFLOPs: 11.34 | 7: iteration 97950/ 173500 | consumed samples: 25075200 | consumed tokens: 51354009600 | elapsed time per iteration (s): 0.08 | learning rate: 9.309E-05 | global batch size: 256 | lm loss: 4.518377E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.998 | TFLOPs: 11.62 | 7: iteration 97960/ 173500 | consumed samples: 25077760 | consumed tokens: 51359252480 | elapsed time per iteration (s): 0.08 | learning rate: 9.308E-05 | global batch size: 256 | lm loss: 4.520683E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.703 | TFLOPs: 11.73 | 7: iteration 97970/ 173500 | consumed samples: 25080320 | consumed tokens: 51364495360 | elapsed time per iteration (s): 0.09 | learning rate: 
9.306E-05 | global batch size: 256 | lm loss: 4.514511E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2894.378 | TFLOPs: 10.77 | 7: iteration 97980/ 173500 | consumed samples: 25082880 | consumed tokens: 51369738240 | elapsed time per iteration (s): 0.09 | learning rate: 9.304E-05 | global batch size: 256 | lm loss: 4.524403E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.348 | TFLOPs: 11.19 | 7: iteration 97990/ 173500 | consumed samples: 25085440 | consumed tokens: 51374981120 | elapsed time per iteration (s): 0.09 | learning rate: 9.303E-05 | global batch size: 256 | lm loss: 4.520691E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2793.284 | TFLOPs: 10.39 | 0: [2023-03-17 02:36:53,919] [INFO] [logging.py:68:log_dist] [Rank 0] step=98000, skipped=0, lr=[9.301234885879047e-05, 9.301234885879047e-05, 9.301234885879047e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 98000/ 173500 | consumed samples: 25088000 | consumed tokens: 51380224000 | elapsed time per iteration (s): 0.08 | learning rate: 9.301E-05 | global batch size: 256 | lm loss: 4.519709E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.704 | TFLOPs: 11.64 | 0: steps: 98000 loss: 4.5367 iter time (s): 0.083 samples/sec: 3066.407 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 98000 | lm loss value: 4.369790E+00 | lm loss PPL: 7.902704E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 98000 to checkpoints_14m91b100m 0: [2023-03-17 02:36:53,977] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint 
global_step98000 is begin to save! 0: [2023-03-17 02:36:53,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:36:54,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:36:54,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:36:54,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:36:54,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:36:54,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:36:54,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:36:54,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:36:54,018] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:36:54,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:36:54,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:36:54,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:36:54,022] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step98000/mp_rank_00_model_states.pt 0: [2023-03-17 02:36:54,022] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:36:54,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:36:54,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:36:54,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:36:54,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:36:54,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:36:54,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2023-03-17 02:36:54,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:36:54,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:36:54,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2023-03-17 02:36:54,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:36:54,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 0: [2023-03-17 02:36:54,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 02:36:54,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2023-03-17 02:36:54,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:36:54,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:36:54,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:36:54,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2023-03-17 02:36:54,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:36:54,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2023-03-17 02:36:54,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:36:54,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2023-03-17 02:36:54,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:36:54,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2023-03-17 02:36:54,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:36:54,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:36:54,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2023-03-17 02:36:54,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:36:54,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:36:54,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2023-03-17 02:36:54,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:36:54,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:36:54,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:36:54,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:36:54,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 02:36:54,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 
7: [2023-03-17 02:36:54,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:36:54,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2023-03-17 02:36:54,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:36:54,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:36:54,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:36:54,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2023-03-17 02:36:54,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:36:54,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:36:54,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2023-03-17 02:36:54,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:36:54,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:36:54,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2023-03-17 02:36:54,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:36:54,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:36:54,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2023-03-17 02:36:54,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:36:54,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:36:54,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2023-03-17 02:36:54,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:36:54,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:36:54,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:36:54,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 0: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2023-03-17 02:36:54,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:36:54,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt
7: [2023-03-17 02:36:54,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt
7: [2023-03-17 02:36:54,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt
6: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
7: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
7: [2023-03-17 02:36:54,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
2: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:36:54,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt
2: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
3: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:36:54,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt
3: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
4: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:36:54,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt
4: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
5: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:36:54,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt
1: [2023-03-17 02:36:54,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
5: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
1: [2023-03-17 02:36:54,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
6: [2023-03-17 02:36:54,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:36:54,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
6: [2023-03-17 02:36:54,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
0: [2023-03-17 02:36:54,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:36:54,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
0: [2023-03-17 02:36:54,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
2: [2023-03-17 02:36:54,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:36:54,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt
3: [2023-03-17 02:36:54,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:36:54,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
3: [2023-03-17 02:36:54,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt
3: [2023-03-17 02:36:54,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
1: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
1: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
5: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
5: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
0: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
0: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
4: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt
4: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
7: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt
6: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
5: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
0: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
6: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
0: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
5: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
2: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
2: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
5: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
2: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt
3: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
1: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
2: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
5: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt
3: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
5: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
3: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
3: [2023-03-17 02:36:54,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
3: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
6: [2023-03-17 02:36:54,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:36:54,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
7: [2023-03-17 02:36:54,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt
6: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
7: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
7: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:36:54,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
4: [2023-03-17 02:36:54,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
7: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
4: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
4: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:36:54,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step98000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
4: [2023-03-17 02:36:54,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now!
0: successfully saved checkpoint at iteration 98000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 83.58
7: iteration 98010/ 173500 | consumed samples: 25090560 | consumed tokens: 51385466880 | elapsed time per iteration (s): 0.10 | learning rate: 9.300E-05 | global batch size: 256 | lm loss: 4.505625E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2672.828 | TFLOPs: 9.94 |
7: iteration 98020/ 173500 | consumed samples: 25093120 | consumed tokens: 51390709760 | elapsed time per iteration (s): 0.08 | learning rate: 9.298E-05 | global batch size: 256 | lm loss: 4.529518E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.023 | TFLOPs: 11.97 |
7: iteration 98030/ 173500 | consumed samples: 25095680 | consumed tokens: 51395952640 | elapsed time per iteration (s): 0.08 | learning rate: 9.296E-05 | global batch size: 256 | lm loss: 4.524345E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.813 | TFLOPs: 11.97 |
7: iteration 98040/ 173500 | consumed samples: 25098240 | consumed tokens: 51401195520 | elapsed time per iteration (s): 0.08 | learning rate: 9.295E-05 | global batch size: 256 | lm loss: 4.521877E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.351 | TFLOPs: 11.95 |
7: iteration 98050/ 173500 | consumed samples: 25100800 | consumed tokens: 51406438400 | elapsed time per iteration (s): 0.09 | learning rate: 9.293E-05 | global batch size: 256 | lm loss: 4.524382E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2815.540 | TFLOPs: 10.47 |
7: iteration 98060/ 173500 | consumed samples: 25103360 | consumed tokens: 51411681280 | elapsed time per iteration (s): 0.09 | learning rate: 9.292E-05 | global batch size: 256 | lm loss: 4.523236E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2787.570 | TFLOPs: 10.37 |
7: iteration 98070/ 173500 | consumed samples: 25105920 | consumed tokens: 51416924160 | elapsed time per iteration (s): 0.08 | learning rate: 9.290E-05 | global batch size: 256 | lm loss: 4.533518E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.426 | TFLOPs: 11.87 |
7: iteration 98080/ 173500 | consumed samples: 25108480 | consumed tokens: 51422167040 | elapsed time per iteration (s): 0.08 | learning rate: 9.288E-05 | global batch size: 256 | lm loss: 4.514479E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.464 | TFLOPs: 11.86 |
7: iteration 98090/ 173500 | consumed samples: 25111040 | consumed tokens: 51427409920 | elapsed time per iteration (s): 0.08 | learning rate: 9.287E-05 | global batch size: 256 | lm loss: 4.518734E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.484 | TFLOPs: 11.92 |
7: iteration 98100/ 173500 | consumed samples: 25113600 | consumed tokens: 51432652800 | elapsed time per iteration (s): 0.08 | learning rate: 9.285E-05 | global batch size: 256 | lm loss: 4.518629E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.249 | TFLOPs: 11.94 |
7: iteration 98110/ 173500 | consumed samples: 25116160 | consumed tokens: 51437895680 | elapsed time per iteration (s): 0.09 | learning rate: 9.283E-05 | global batch size: 256 | lm loss: 4.511305E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2964.996 | TFLOPs: 11.03 |
7: iteration 98120/ 173500 | consumed samples: 25118720 | consumed tokens: 51443138560 | elapsed time per iteration (s): 0.10 | learning rate: 9.282E-05 | global batch size: 256 | lm loss: 4.524458E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2481.021 | TFLOPs: 9.23 |
7: iteration 98130/ 173500 | consumed samples: 25121280 | consumed tokens: 51448381440 | elapsed time per iteration (s): 0.09 | learning rate: 9.280E-05 | global batch size: 256 | lm loss: 4.513819E+00 | grad norm: 0.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2785.440 | TFLOPs: 10.36 |
7: iteration 98140/ 173500 | consumed samples: 25123840 | consumed tokens: 51453624320 | elapsed time per iteration (s): 0.12 | learning rate: 9.279E-05 | global batch size: 256 | lm loss: 4.513294E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2076.531 | TFLOPs: 7.72 |
7: iteration 98150/ 173500 | consumed samples: 25126400 | consumed tokens: 51458867200 | elapsed time per iteration (s): 0.08 | learning rate: 9.277E-05 | global batch size: 256 | lm loss: 4.520362E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.773 | TFLOPs: 11.49 |
7: iteration 98160/ 173500 | consumed samples: 25128960 | consumed tokens: 51464110080 | elapsed time per iteration (s): 0.08 | learning rate: 9.275E-05 | global batch size: 256 | lm loss: 4.523950E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.303 | TFLOPs: 11.88 |
7: iteration 98170/ 173500 | consumed samples: 25131520 | consumed tokens: 51469352960 | elapsed time per iteration (s): 0.09 | learning rate: 9.274E-05 | global batch size: 256 | lm loss: 4.516621E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2916.525 | TFLOPs: 10.85 |
7: iteration 98180/ 173500 | consumed samples: 25134080 | consumed tokens: 51474595840 | elapsed time per iteration (s): 0.08 | learning rate: 9.272E-05 | global batch size: 256 | lm loss: 4.517403E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.896 | TFLOPs: 11.88 |
7: iteration 98190/ 173500 | consumed samples: 25136640 | consumed tokens: 51479838720 | elapsed time per iteration (s): 0.08 | learning rate: 9.271E-05 | global batch size: 256 | lm loss: 4.515577E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.401 | TFLOPs: 11.35 |
7: iteration 98200/ 173500 | consumed samples: 25139200 | consumed tokens: 51485081600 | elapsed time per iteration (s): 0.10 | learning rate: 9.269E-05 | global batch size: 256 | lm loss: 4.520026E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2688.634 | TFLOPs: 10.00 |
7: iteration 98210/ 173500 | consumed samples: 25141760 | consumed tokens: 51490324480 | elapsed time per iteration (s): 0.08 | learning rate: 9.267E-05 | global batch size: 256 | lm loss: 4.517884E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.288 | TFLOPs: 11.83 |
7: iteration 98220/ 173500 | consumed samples: 25144320 | consumed tokens: 51495567360 | elapsed time per iteration (s): 0.08 | learning rate: 9.266E-05 | global batch size: 256 | lm loss: 4.521054E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.883 | TFLOPs: 11.94 |
7: iteration 98230/ 173500 | consumed samples: 25146880 | consumed tokens: 51500810240 | elapsed time per iteration (s): 0.08 | learning rate: 9.264E-05 | global batch size: 256 | lm loss: 4.522268E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.414 | TFLOPs: 11.78 |
7: iteration 98240/ 173500 | consumed samples: 25149440 | consumed tokens: 51506053120 | elapsed time per iteration (s): 0.09 | learning rate: 9.262E-05 | global batch size: 256 | lm loss: 4.513670E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2881.447 | TFLOPs: 10.72 |
7: iteration 98250/ 173500 | consumed samples: 25152000 | consumed tokens: 51511296000 | elapsed time per iteration (s): 0.09 | learning rate: 9.261E-05 | global batch size: 256 | lm loss: 4.524064E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2809.629 | TFLOPs: 10.45 |
7: iteration 98260/ 173500 | consumed samples: 25154560 | consumed tokens: 51516538880 | elapsed time per iteration (s): 0.08 | learning rate: 9.259E-05 | global batch size: 256 | lm loss: 4.509223E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.124 | TFLOPs: 11.88 |
7: iteration 98270/ 173500 | consumed samples: 25157120 | consumed tokens: 51521781760 | elapsed time per iteration (s): 0.08 | learning rate: 9.258E-05 | global batch size: 256 | lm loss: 4.526184E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.640 | TFLOPs: 11.89 |
7: iteration 98280/ 173500 | consumed samples: 25159680 | consumed tokens: 51527024640 | elapsed time per iteration (s): 0.08 | learning rate: 9.256E-05 | global batch size: 256 | lm loss: 4.526519E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.275 | TFLOPs: 11.96 |
7: iteration 98290/ 173500 | consumed samples: 25162240 | consumed tokens: 51532267520 | elapsed time per iteration (s): 0.08 | learning rate: 9.254E-05 | global batch size: 256 | lm loss: 4.516259E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.038 | TFLOPs: 11.89 |
7: iteration 98300/ 173500 | consumed samples: 25164800 | consumed tokens: 51537510400 | elapsed time per iteration (s): 0.08 | learning rate: 9.253E-05 | global batch size: 256 | lm loss: 4.518227E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.352 | TFLOPs: 11.28 |
7: iteration 98310/ 173500 | consumed samples: 25167360 | consumed tokens: 51542753280 | elapsed time per iteration (s): 0.09 | learning rate: 9.251E-05 | global batch size: 256 | lm loss: 4.527024E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2766.877 | TFLOPs: 10.29 |
7: iteration 98320/ 173500 | consumed samples: 25169920 | consumed tokens: 51547996160 | elapsed time per iteration (s): 0.08 | learning rate: 9.250E-05 | global batch size: 256 | lm loss: 4.517977E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.103 | TFLOPs: 11.85 |
7: iteration 98330/ 173500 | consumed samples: 25172480 | consumed tokens: 51553239040 | elapsed time per iteration (s): 0.09 | learning rate: 9.248E-05 | global batch size: 256 | lm loss: 4.515310E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.771 | TFLOPs: 10.39 |
7: iteration 98340/ 173500 | consumed samples: 25175040 | consumed tokens: 51558481920 | elapsed time per iteration (s): 0.08 | learning rate: 9.246E-05 | global batch size: 256 | lm loss: 4.518947E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.572 | TFLOPs: 11.56 |
7: iteration 98350/ 173500 | consumed samples: 25177600 | consumed tokens: 51563724800 | elapsed time per iteration (s): 0.08 | learning rate: 9.245E-05 | global batch size: 256 | lm loss: 4.519219E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.621 | TFLOPs: 11.90 |
7: iteration 98360/ 173500 | consumed samples: 25180160 | consumed tokens: 51568967680 | elapsed time per iteration (s): 0.08 | learning rate: 9.243E-05 | global batch size: 256 | lm loss: 4.529050E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.432 | TFLOPs: 11.91 |
7: iteration 98370/ 173500 | consumed samples: 25182720 | consumed tokens: 51574210560 | elapsed time per iteration (s): 0.09 | learning rate: 9.241E-05 | global batch size: 256 | lm loss: 4.514532E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2957.874 | TFLOPs: 11.00 |
7: iteration 98380/ 173500 | consumed samples: 25185280 | consumed tokens: 51579453440 | elapsed time per iteration (s): 0.09 | learning rate: 9.240E-05 | global batch size: 256 | lm loss: 4.511630E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2887.687 | TFLOPs: 10.74 |
7: iteration 98390/ 173500 | consumed samples: 25187840 | consumed tokens: 51584696320 | elapsed time per iteration (s): 0.08 | learning rate: 9.238E-05 | global batch size: 256 | lm loss: 4.524077E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.982 | TFLOPs: 11.92 |
7: iteration 98400/ 173500 | consumed samples: 25190400 | consumed tokens: 51589939200 | elapsed time per iteration (s): 0.08 | learning rate: 9.237E-05 | global batch size: 256 | lm loss: 4.522733E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.952 | TFLOPs: 11.97 |
7: iteration 98410/ 173500 | consumed samples: 25192960 | consumed tokens: 51595182080 | elapsed time per iteration (s): 0.08 | learning rate: 9.235E-05 | global batch size: 256 | lm loss: 4.521448E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.360 | TFLOPs: 11.89 |
7: iteration 98420/ 173500 | consumed samples: 25195520 | consumed tokens: 51600424960 | elapsed time per iteration (s): 0.08 | learning rate: 9.233E-05 | global batch size: 256 | lm loss: 4.521339E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.856 | TFLOPs: 11.66 |
7: iteration 98430/ 173500 | consumed samples: 25198080 | consumed tokens: 51605667840 | elapsed time per iteration (s): 0.10 | learning rate: 9.232E-05 | global batch size: 256 | lm loss: 4.528053E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.105 | TFLOPs: 9.56 |
7: iteration 98440/ 173500 | consumed samples: 25200640 | consumed tokens: 51610910720 | elapsed time per iteration (s): 0.08 | learning rate: 9.230E-05 | global batch size: 256 | lm loss: 4.511449E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.587 | TFLOPs: 11.38 |
7: iteration 98450/ 173500 | consumed samples: 25203200 | consumed tokens: 51616153600 | elapsed time per iteration (s): 0.08 | learning rate: 9.229E-05 | global batch size: 256 | lm loss: 4.508413E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.330 | TFLOPs: 11.99 |
7: iteration 98460/ 173500 | consumed samples: 25205760 | consumed tokens: 51621396480 | elapsed time per iteration (s): 0.08 | learning rate: 9.227E-05 | global batch size: 256 | lm loss: 4.527064E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.980 | TFLOPs: 11.55 |
7: iteration 98470/ 173500 | consumed samples: 25208320 | consumed tokens: 51626639360 | elapsed time per iteration (s): 0.08 | learning rate: 9.225E-05 | global batch size: 256 | lm loss: 4.536789E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3014.619 | TFLOPs: 11.21 |
7: iteration 98480/ 173500 | consumed samples: 25210880 | consumed tokens: 51631882240 | elapsed time per iteration (s): 0.08 | learning rate: 9.224E-05 | global batch size: 256 | lm loss: 4.521579E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.637 | TFLOPs: 11.90 |
7: iteration 98490/ 173500 | consumed samples: 25213440 | consumed tokens: 51637125120 | elapsed time per iteration (s): 0.08 | learning rate: 9.222E-05 | global batch size: 256 | lm loss: 4.518101E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.765 | TFLOPs: 11.91 |
7: iteration 98500/ 173500 | consumed samples: 25216000 | consumed tokens: 51642368000 | elapsed time per iteration (s): 0.08 | learning rate: 9.220E-05 | global batch size: 256 | lm loss: 4.518481E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.129 | TFLOPs: 11.91 |
7: iteration 98510/ 173500 | consumed samples: 25218560 | consumed tokens: 51647610880 | elapsed time per iteration (s): 0.08 | learning rate: 9.219E-05 | global batch size: 256 | lm loss: 4.508357E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.468 | TFLOPs: 11.78 |
7: iteration 98520/ 173500 | consumed samples: 25221120 | consumed tokens: 51652853760 | elapsed time per iteration (s): 0.09 | learning rate: 9.217E-05 | global batch size: 256 | lm loss: 4.515434E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2830.753 | TFLOPs: 10.53 |
7: iteration 98530/ 173500 | consumed samples: 25223680 | consumed tokens: 51658096640 | elapsed time per iteration (s): 0.09 | learning rate: 9.216E-05 | global batch size: 256 | lm loss: 4.519257E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.714 | TFLOPs: 10.50 |
7: iteration 98540/ 173500 | consumed samples: 25226240 | consumed tokens: 51663339520 | elapsed time per iteration (s): 0.08 | learning rate: 9.214E-05 | global batch size: 256 | lm loss: 4.527976E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.729 | TFLOPs: 11.89 |
7: iteration 98550/ 173500 | consumed samples: 25228800 | consumed tokens: 51668582400 | elapsed time per iteration (s): 0.08 | learning rate: 9.212E-05 | global batch size: 256 | lm loss: 4.521460E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.701 | TFLOPs: 11.91 |
7: iteration 98560/ 173500 | consumed samples: 25231360 | consumed tokens: 51673825280 | elapsed time per iteration (s): 0.08 | learning rate: 9.211E-05 | global batch size: 256 | lm loss: 4.521587E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.625 | TFLOPs: 11.88 |
7: iteration 98570/ 173500 | consumed samples: 25233920 | consumed tokens: 51679068160 | elapsed time per iteration (s): 0.08 | learning rate: 9.209E-05 | global batch size: 256 | lm loss: 4.519747E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.127 | TFLOPs: 11.88 |
7: iteration 98580/ 173500 | consumed samples: 25236480 | consumed tokens: 51684311040 | elapsed time per iteration (s): 0.08 | learning rate: 9.208E-05 | global batch size: 256 | lm loss: 4.523375E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.615 | TFLOPs: 11.88 |
7: iteration 98590/ 173500 | consumed samples: 25239040 | consumed tokens: 51689553920 | elapsed time per iteration (s): 0.09 | learning rate: 9.206E-05 | global batch size: 256 | lm loss: 4.518752E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.392 | TFLOPs: 10.74 |
7: iteration 98600/ 173500 | consumed samples: 25241600 | consumed tokens: 51694796800 | elapsed time per iteration (s): 0.09 | learning rate: 9.204E-05 | global batch size: 256 | lm loss: 4.532125E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2871.192 | TFLOPs: 10.68 |
7: iteration 98610/ 173500 | consumed samples: 25244160 | consumed tokens: 51700039680 | elapsed time per iteration (s): 0.08 | learning rate: 9.203E-05 | global batch size: 256 | lm loss: 4.520071E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.213 | TFLOPs: 11.88 |
7: iteration 98620/ 173500 | consumed samples: 25246720 | consumed tokens: 51705282560 | elapsed time per iteration (s): 0.08 | learning rate: 9.201E-05 | global batch size: 256 | lm loss: 4.518454E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.126 | TFLOPs: 11.87 |
7: iteration 98630/ 173500 | consumed samples: 25249280 | consumed tokens: 51710525440 | elapsed time per iteration (s): 0.08 | learning rate: 9.200E-05 | global batch size: 256 | lm loss: 4.536612E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.851 | TFLOPs: 11.88 |
7: iteration 98640/ 173500 | consumed samples: 25251840 | consumed tokens: 51715768320 | elapsed time per iteration (s): 0.08 | learning rate: 9.198E-05 | global batch size: 256 | lm loss: 4.522749E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.714 | TFLOPs: 11.69 |
7: iteration 98650/ 173500 | consumed samples: 25254400 | consumed tokens: 51721011200 | elapsed time per iteration (s): 0.08 | learning rate: 9.196E-05 | global batch size: 256 | lm loss: 4.519799E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.849 | TFLOPs: 11.50 |
7: iteration 98660/ 173500 | consumed samples: 25256960 | consumed tokens: 51726254080 | elapsed time per iteration (s): 0.08 | learning rate: 9.195E-05 | global batch size: 256 | lm loss: 4.520064E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.725 | TFLOPs: 11.91 |
7: iteration 98670/ 173500 | consumed samples: 25259520 | consumed tokens: 51731496960 | elapsed time per iteration (s): 0.08 | learning rate: 9.193E-05 | global batch size: 256 | lm loss: 4.517329E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.580 | TFLOPs: 11.28 |
7: iteration 98680/ 173500 | consumed samples: 25262080 | consumed tokens: 51736739840 | elapsed time per iteration (s): 0.08 | learning rate: 9.191E-05 | global batch size: 256 | lm loss: 4.534837E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.869 | TFLOPs: 11.27 |
7: iteration 98690/ 173500 | consumed samples: 25264640 | consumed tokens: 51741982720 | elapsed time per iteration (s): 0.10 | learning rate: 9.190E-05 | global batch size: 256 | lm loss: 4.513313E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2626.179 | TFLOPs: 9.77 |
7: iteration 98700/ 173500 | consumed samples: 25267200 | consumed tokens: 51747225600 | elapsed time per iteration (s): 0.08 | learning rate: 9.188E-05 | global batch size: 256 | lm loss: 4.518089E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.793 | TFLOPs: 11.89 |
7: iteration 98710/ 173500 | consumed samples: 25269760 | consumed tokens: 51752468480 | elapsed time per iteration (s): 0.08 | learning rate: 9.187E-05 | global batch size: 256 | lm loss: 4.507969E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.182 | TFLOPs: 11.85 |
7: iteration 98720/ 173500 | consumed samples: 25272320 | consumed tokens: 51757711360 | elapsed time per iteration (s): 0.23 | learning rate: 9.185E-05 | global batch size: 256 | lm loss: 4.521355E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1128.971 | TFLOPs: 4.20 |
7: iteration 98730/ 173500 | consumed samples: 25274880 | consumed tokens: 51762954240 | elapsed time per iteration (s): 0.09 | learning rate: 9.183E-05 | global batch size: 256 | lm loss: 4.532490E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2962.047 | TFLOPs: 11.02 |
7: iteration 98740/ 173500 | consumed samples: 25277440 | consumed tokens: 51768197120 | elapsed time per iteration (s): 0.08 | learning rate: 9.182E-05 | global batch size: 256 | lm loss: 4.521524E+00 | grad
norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.134 | TFLOPs: 11.86 | 7: iteration 98750/ 173500 | consumed samples: 25280000 | consumed tokens: 51773440000 | elapsed time per iteration (s): 0.08 | learning rate: 9.180E-05 | global batch size: 256 | lm loss: 4.519987E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.034 | TFLOPs: 11.85 | 7: iteration 98760/ 173500 | consumed samples: 25282560 | consumed tokens: 51778682880 | elapsed time per iteration (s): 0.08 | learning rate: 9.179E-05 | global batch size: 256 | lm loss: 4.536771E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.651 | TFLOPs: 11.83 | 7: iteration 98770/ 173500 | consumed samples: 25285120 | consumed tokens: 51783925760 | elapsed time per iteration (s): 0.09 | learning rate: 9.177E-05 | global batch size: 256 | lm loss: 4.517924E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.381 | TFLOPs: 10.03 | 7: iteration 98780/ 173500 | consumed samples: 25287680 | consumed tokens: 51789168640 | elapsed time per iteration (s): 0.08 | learning rate: 9.175E-05 | global batch size: 256 | lm loss: 4.524193E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.082 | TFLOPs: 11.80 | 7: iteration 98790/ 173500 | consumed samples: 25290240 | consumed tokens: 51794411520 | elapsed time per iteration (s): 0.08 | learning rate: 9.174E-05 | global batch size: 256 | lm loss: 4.527343E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.817 | TFLOPs: 11.82 | 7: iteration 98800/ 173500 | consumed samples: 25292800 | consumed tokens: 51799654400 | elapsed time 
per iteration (s): 0.08 | learning rate: 9.172E-05 | global batch size: 256 | lm loss: 4.526567E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.193 | TFLOPs: 11.88 | 7: iteration 98810/ 173500 | consumed samples: 25295360 | consumed tokens: 51804897280 | elapsed time per iteration (s): 0.08 | learning rate: 9.170E-05 | global batch size: 256 | lm loss: 4.504697E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.015 | TFLOPs: 11.86 | 7: iteration 98820/ 173500 | consumed samples: 25297920 | consumed tokens: 51810140160 | elapsed time per iteration (s): 0.08 | learning rate: 9.169E-05 | global batch size: 256 | lm loss: 4.526970E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.253 | TFLOPs: 11.86 | 7: iteration 98830/ 173500 | consumed samples: 25300480 | consumed tokens: 51815383040 | elapsed time per iteration (s): 0.08 | learning rate: 9.167E-05 | global batch size: 256 | lm loss: 4.527132E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.633 | TFLOPs: 11.80 | 7: iteration 98840/ 173500 | consumed samples: 25303040 | consumed tokens: 51820625920 | elapsed time per iteration (s): 0.09 | learning rate: 9.166E-05 | global batch size: 256 | lm loss: 4.533907E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.251 | TFLOPs: 11.11 | 7: iteration 98850/ 173500 | consumed samples: 25305600 | consumed tokens: 51825868800 | elapsed time per iteration (s): 0.09 | learning rate: 9.164E-05 | global batch size: 256 | lm loss: 4.532308E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2994.328 | TFLOPs: 
11.14 | 7: iteration 98860/ 173500 | consumed samples: 25308160 | consumed tokens: 51831111680 | elapsed time per iteration (s): 0.10 | learning rate: 9.162E-05 | global batch size: 256 | lm loss: 4.517628E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2508.818 | TFLOPs: 9.33 | 7: iteration 98870/ 173500 | consumed samples: 25310720 | consumed tokens: 51836354560 | elapsed time per iteration (s): 0.08 | learning rate: 9.161E-05 | global batch size: 256 | lm loss: 4.515520E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.295 | TFLOPs: 11.77 | 7: iteration 98880/ 173500 | consumed samples: 25313280 | consumed tokens: 51841597440 | elapsed time per iteration (s): 0.08 | learning rate: 9.159E-05 | global batch size: 256 | lm loss: 4.510984E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.018 | TFLOPs: 11.86 | 7: iteration 98890/ 173500 | consumed samples: 25315840 | consumed tokens: 51846840320 | elapsed time per iteration (s): 0.10 | learning rate: 9.158E-05 | global batch size: 256 | lm loss: 4.525316E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2454.070 | TFLOPs: 9.13 | 7: iteration 98900/ 173500 | consumed samples: 25318400 | consumed tokens: 51852083200 | elapsed time per iteration (s): 0.08 | learning rate: 9.156E-05 | global batch size: 256 | lm loss: 4.518158E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.703 | TFLOPs: 11.86 | 7: iteration 98910/ 173500 | consumed samples: 25320960 | consumed tokens: 51857326080 | elapsed time per iteration (s): 0.09 | learning rate: 9.154E-05 | global batch size: 256 | lm loss: 4.510428E+00 | grad norm: 0.358 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2989.497 | TFLOPs: 11.12 | 7: iteration 98920/ 173500 | consumed samples: 25323520 | consumed tokens: 51862568960 | elapsed time per iteration (s): 0.09 | learning rate: 9.153E-05 | global batch size: 256 | lm loss: 4.535449E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.053 | TFLOPs: 11.18 | 7: iteration 98930/ 173500 | consumed samples: 25326080 | consumed tokens: 51867811840 | elapsed time per iteration (s): 0.08 | learning rate: 9.151E-05 | global batch size: 256 | lm loss: 4.522405E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.605 | TFLOPs: 11.87 | 7: iteration 98940/ 173500 | consumed samples: 25328640 | consumed tokens: 51873054720 | elapsed time per iteration (s): 0.08 | learning rate: 9.150E-05 | global batch size: 256 | lm loss: 4.521531E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.536 | TFLOPs: 11.86 | 7: iteration 98950/ 173500 | consumed samples: 25331200 | consumed tokens: 51878297600 | elapsed time per iteration (s): 0.08 | learning rate: 9.148E-05 | global batch size: 256 | lm loss: 4.521899E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.439 | TFLOPs: 11.88 | 7: iteration 98960/ 173500 | consumed samples: 25333760 | consumed tokens: 51883540480 | elapsed time per iteration (s): 0.08 | learning rate: 9.146E-05 | global batch size: 256 | lm loss: 4.519912E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.151 | TFLOPs: 11.88 | 7: iteration 98970/ 173500 | consumed samples: 25336320 | consumed tokens: 51888783360 | elapsed time per iteration (s): 0.08 | 
learning rate: 9.145E-05 | global batch size: 256 | lm loss: 4.521431E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.751 | TFLOPs: 11.83 | 7: iteration 98980/ 173500 | consumed samples: 25338880 | consumed tokens: 51894026240 | elapsed time per iteration (s): 0.10 | learning rate: 9.143E-05 | global batch size: 256 | lm loss: 4.521734E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2590.936 | TFLOPs: 9.64 | 7: iteration 98990/ 173500 | consumed samples: 25341440 | consumed tokens: 51899269120 | elapsed time per iteration (s): 0.09 | learning rate: 9.141E-05 | global batch size: 256 | lm loss: 4.519875E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2856.831 | TFLOPs: 10.63 | 7: iteration 99000/ 173500 | consumed samples: 25344000 | consumed tokens: 51904512000 | elapsed time per iteration (s): 0.08 | learning rate: 9.140E-05 | global batch size: 256 | lm loss: 4.517615E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.367 | TFLOPs: 11.84 | 7: ------------------------------------------------------------------------------------------------ 7: validation loss at iteration 99000 | lm loss value: 4.433365E+00 | lm loss PPL: 8.421435E+01 | 7: ------------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 99000 to checkpoints_14m91b100m 0: [2023-03-17 02:38:19,909] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step99000 is begin to save! 0: [2023-03-17 02:38:19,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 02:38:19,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:38:19,939] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:38:19,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:38:19,942] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:38:19,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:38:19,945] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:38:19,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:38:19,948] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:38:19,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:38:19,950] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:38:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:38:19,952] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step99000/mp_rank_00_model_states.pt 0: [2023-03-17 02:38:19,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:38:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:38:19,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:38:19,970] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:38:19,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:38:19,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2023-03-17 02:38:19,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:38:19,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:38:19,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 02:38:19,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2023-03-17 02:38:19,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:38:19,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:38:19,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
7: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:38:19,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2023-03-17 02:38:19,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2023-03-17 02:38:19,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
5: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:38:19,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2023-03-17 02:38:19,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:38:19,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:38:19,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 1: [2023-03-17 02:38:19,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2023-03-17 02:38:19,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 02:38:19,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
0: [2023-03-17 02:38:19,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2023-03-17 02:38:19,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:38:19,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 02:38:19,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:38:19,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:38:19,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:38:19,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2023-03-17 02:38:19,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:38:19,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:38:19,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2023-03-17 02:38:19,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 02:38:19,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:38:19,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:38:19,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2023-03-17 02:38:19,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:38:19,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:38:19,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:38:19,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:38:19,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2023-03-17 02:38:19,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:38:19,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:38:19,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2023-03-17 02:38:19,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:38:19,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:38:19,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
0: [2023-03-17 02:38:19,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2023-03-17 02:38:19,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:38:19,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:38:19,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 02:38:19,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:38:19,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
2: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
7: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:38:19,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 02:38:19,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 02:38:19,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2023-03-17 02:38:19,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2023-03-17 02:38:19,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:38:19,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:38:19,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2023-03-17 02:38:19,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:38:19,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:38:19,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2023-03-17 02:38:19,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:38:19,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step99000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:38:19,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: successfully saved checkpoint at iteration 99000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.87 7: iteration 99010/ 173500 | consumed samples: 25346560 | consumed tokens: 51909754880 | elapsed time per iteration (s): 0.09 | learning rate: 9.138E-05 | global batch size: 256 | lm loss: 4.521637E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2784.829 | TFLOPs: 10.36 | 7: iteration 99020/ 173500 | consumed samples: 25349120 | consumed tokens: 51914997760 | elapsed time per iteration (s): 0.08 | learning rate: 9.137E-05 | global batch size: 256 | lm loss: 4.522099E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.913 | TFLOPs: 11.83 | 7: iteration 99030/ 173500 | consumed samples: 25351680 | consumed tokens: 51920240640 | elapsed time per iteration (s): 0.08 | learning rate: 9.135E-05 | global batch size: 256 | lm loss: 4.526702E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.921 | TFLOPs: 11.30 | 7: iteration 99040/ 173500 | consumed samples: 25354240 | consumed tokens: 51925483520 | elapsed time per iteration (s): 0.08 | learning rate: 9.133E-05 | global batch size: 256 | lm loss: 4.509691E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.588 | TFLOPs: 11.23 | 7: iteration 99050/ 173500 | consumed samples: 25356800 | consumed tokens: 51930726400 | elapsed time per iteration (s): 0.08 | learning rate: 9.132E-05 | global 
batch size: 256 | lm loss: 4.512152E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.328 | TFLOPs: 11.23 | 7: iteration 99060/ 173500 | consumed samples: 25359360 | consumed tokens: 51935969280 | elapsed time per iteration (s): 0.09 | learning rate: 9.130E-05 | global batch size: 256 | lm loss: 4.509173E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.580 | TFLOPs: 10.04 | 7: iteration 99070/ 173500 | consumed samples: 25361920 | consumed tokens: 51941212160 | elapsed time per iteration (s): 0.10 | learning rate: 9.129E-05 | global batch size: 256 | lm loss: 4.519484E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2487.948 | TFLOPs: 9.25 | 7: iteration 99080/ 173500 | consumed samples: 25364480 | consumed tokens: 51946455040 | elapsed time per iteration (s): 0.08 | learning rate: 9.127E-05 | global batch size: 256 | lm loss: 4.519862E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.073 | TFLOPs: 11.98 | 7: iteration 99090/ 173500 | consumed samples: 25367040 | consumed tokens: 51951697920 | elapsed time per iteration (s): 0.08 | learning rate: 9.125E-05 | global batch size: 256 | lm loss: 4.516463E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.316 | TFLOPs: 11.96 | 7: iteration 99100/ 173500 | consumed samples: 25369600 | consumed tokens: 51956940800 | elapsed time per iteration (s): 0.08 | learning rate: 9.124E-05 | global batch size: 256 | lm loss: 4.505985E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.861 | TFLOPs: 11.56 | 7: iteration 99110/ 173500 | consumed samples: 25372160 
| consumed tokens: 51962183680 | elapsed time per iteration (s): 0.09 | learning rate: 9.122E-05 | global batch size: 256 | lm loss: 4.508780E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2834.582 | TFLOPs: 10.54 | 7: iteration 99120/ 173500 | consumed samples: 25374720 | consumed tokens: 51967426560 | elapsed time per iteration (s): 0.11 | learning rate: 9.121E-05 | global batch size: 256 | lm loss: 4.518530E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2368.659 | TFLOPs: 8.81 | 7: iteration 99130/ 173500 | consumed samples: 25377280 | consumed tokens: 51972669440 | elapsed time per iteration (s): 0.08 | learning rate: 9.119E-05 | global batch size: 256 | lm loss: 4.528646E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.822 | TFLOPs: 12.03 | 7: iteration 99140/ 173500 | consumed samples: 25379840 | consumed tokens: 51977912320 | elapsed time per iteration (s): 0.08 | learning rate: 9.117E-05 | global batch size: 256 | lm loss: 4.520897E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.029 | TFLOPs: 11.55 | 7: iteration 99150/ 173500 | consumed samples: 25382400 | consumed tokens: 51983155200 | elapsed time per iteration (s): 0.08 | learning rate: 9.116E-05 | global batch size: 256 | lm loss: 4.520516E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.538 | TFLOPs: 11.83 | 7: iteration 99160/ 173500 | consumed samples: 25384960 | consumed tokens: 51988398080 | elapsed time per iteration (s): 0.08 | learning rate: 9.114E-05 | global batch size: 256 | lm loss: 4.530976E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 3122.168 | TFLOPs: 11.61 | 7: iteration 99170/ 173500 | consumed samples: 25387520 | consumed tokens: 51993640960 | elapsed time per iteration (s): 0.09 | learning rate: 9.113E-05 | global batch size: 256 | lm loss: 4.521777E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2727.529 | TFLOPs: 10.15 | 7: iteration 99180/ 173500 | consumed samples: 25390080 | consumed tokens: 51998883840 | elapsed time per iteration (s): 0.08 | learning rate: 9.111E-05 | global batch size: 256 | lm loss: 4.520736E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.610 | TFLOPs: 11.97 | 7: iteration 99190/ 173500 | consumed samples: 25392640 | consumed tokens: 52004126720 | elapsed time per iteration (s): 0.08 | learning rate: 9.109E-05 | global batch size: 256 | lm loss: 4.517323E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.872 | TFLOPs: 11.97 | 7: iteration 99200/ 173500 | consumed samples: 25395200 | consumed tokens: 52009369600 | elapsed time per iteration (s): 0.08 | learning rate: 9.108E-05 | global batch size: 256 | lm loss: 4.516919E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.626 | TFLOPs: 12.01 | 7: iteration 99210/ 173500 | consumed samples: 25397760 | consumed tokens: 52014612480 | elapsed time per iteration (s): 0.08 | learning rate: 9.106E-05 | global batch size: 256 | lm loss: 4.519585E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.693 | TFLOPs: 11.93 | 7: iteration 99220/ 173500 | consumed samples: 25400320 | consumed tokens: 52019855360 | elapsed time per iteration (s): 0.08 | learning rate: 9.104E-05 | global batch size: 256 | lm loss: 
4.524406E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.101 | TFLOPs: 11.74 | 7: iteration 99230/ 173500 | consumed samples: 25402880 | consumed tokens: 52025098240 | elapsed time per iteration (s): 0.08 | learning rate: 9.103E-05 | global batch size: 256 | lm loss: 4.521678E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.537 | TFLOPs: 11.86 | 7: iteration 99240/ 173500 | consumed samples: 25405440 | consumed tokens: 52030341120 | elapsed time per iteration (s): 0.08 | learning rate: 9.101E-05 | global batch size: 256 | lm loss: 4.539275E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.410 | TFLOPs: 11.86 | 7: iteration 99250/ 173500 | consumed samples: 25408000 | consumed tokens: 52035584000 | elapsed time per iteration (s): 0.08 | learning rate: 9.100E-05 | global batch size: 256 | lm loss: 4.522977E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.332 | TFLOPs: 11.86 | 7: iteration 99260/ 173500 | consumed samples: 25410560 | consumed tokens: 52040826880 | elapsed time per iteration (s): 0.08 | learning rate: 9.098E-05 | global batch size: 256 | lm loss: 4.522668E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.904 | TFLOPs: 11.88 | 7: iteration 99270/ 173500 | consumed samples: 25413120 | consumed tokens: 52046069760 | elapsed time per iteration (s): 0.08 | learning rate: 9.096E-05 | global batch size: 256 | lm loss: 4.519711E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.797 | TFLOPs: 11.87 | 7: iteration 99280/ 173500 | consumed samples: 25415680 | consumed tokens: 
52051312640 | elapsed time per iteration (s): 0.08 | learning rate: 9.095E-05 | global batch size: 256 | lm loss: 4.509964E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.523 | TFLOPs: 11.89 | 7: iteration 99290/ 173500 | consumed samples: 25418240 | consumed tokens: 52056555520 | elapsed time per iteration (s): 0.08 | learning rate: 9.093E-05 | global batch size: 256 | lm loss: 4.501957E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.589 | TFLOPs: 11.92 | 7: iteration 99300/ 173500 | consumed samples: 25420800 | consumed tokens: 52061798400 | elapsed time per iteration (s): 0.08 | learning rate: 9.092E-05 | global batch size: 256 | lm loss: 4.526672E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.338 | TFLOPs: 11.81 | 7: iteration 99310/ 173500 | consumed samples: 25423360 | consumed tokens: 52067041280 | elapsed time per iteration (s): 0.08 | learning rate: 9.090E-05 | global batch size: 256 | lm loss: 4.525295E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.431 | TFLOPs: 11.81 | 7: iteration 99320/ 173500 | consumed samples: 25425920 | consumed tokens: 52072284160 | elapsed time per iteration (s): 0.08 | learning rate: 9.088E-05 | global batch size: 256 | lm loss: 4.528720E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.069 | TFLOPs: 11.92 | 7: iteration 99330/ 173500 | consumed samples: 25428480 | consumed tokens: 52077527040 | elapsed time per iteration (s): 0.08 | learning rate: 9.087E-05 | global batch size: 256 | lm loss: 4.517996E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3212.350 | TFLOPs: 11.95 | 7: iteration 99340/ 173500 | consumed samples: 25431040 | consumed tokens: 52082769920 | elapsed time per iteration (s): 0.08 | learning rate: 9.085E-05 | global batch size: 256 | lm loss: 4.513670E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.900 | TFLOPs: 11.85 | 7: iteration 99350/ 173500 | consumed samples: 25433600 | consumed tokens: 52088012800 | elapsed time per iteration (s): 0.08 | learning rate: 9.084E-05 | global batch size: 256 | lm loss: 4.510192E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.679 | TFLOPs: 11.69 | 7: iteration 99360/ 173500 | consumed samples: 25436160 | consumed tokens: 52093255680 | elapsed time per iteration (s): 0.08 | learning rate: 9.082E-05 | global batch size: 256 | lm loss: 4.523293E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3038.035 | TFLOPs: 11.30 | 7: iteration 99370/ 173500 | consumed samples: 25438720 | consumed tokens: 52098498560 | elapsed time per iteration (s): 0.08 | learning rate: 9.080E-05 | global batch size: 256 | lm loss: 4.513425E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.229 | TFLOPs: 11.91 | 7: iteration 99380/ 173500 | consumed samples: 25441280 | consumed tokens: 52103741440 | elapsed time per iteration (s): 0.09 | learning rate: 9.079E-05 | global batch size: 256 | lm loss: 4.528313E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.836 | TFLOPs: 11.16 | 7: iteration 99390/ 173500 | consumed samples: 25443840 | consumed tokens: 52108984320 | elapsed time per iteration (s): 0.08 | learning rate: 9.077E-05 | global batch size: 256 | lm loss: 4.519656E+00 | grad 
norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.958 | TFLOPs: 11.97 | 7: iteration 99400/ 173500 | consumed samples: 25446400 | consumed tokens: 52114227200 | elapsed time per iteration (s): 0.08 | learning rate: 9.076E-05 | global batch size: 256 | lm loss: 4.527691E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.556 | TFLOPs: 11.66 | 7: iteration 99410/ 173500 | consumed samples: 25448960 | consumed tokens: 52119470080 | elapsed time per iteration (s): 0.08 | learning rate: 9.074E-05 | global batch size: 256 | lm loss: 4.525162E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.872 | TFLOPs: 11.95 | 7: iteration 99420/ 173500 | consumed samples: 25451520 | consumed tokens: 52124712960 | elapsed time per iteration (s): 0.08 | learning rate: 9.072E-05 | global batch size: 256 | lm loss: 4.524539E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.702 | TFLOPs: 11.91 | 7: iteration 99430/ 173500 | consumed samples: 25454080 | consumed tokens: 52129955840 | elapsed time per iteration (s): 0.08 | learning rate: 9.071E-05 | global batch size: 256 | lm loss: 4.520154E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.668 | TFLOPs: 11.93 | 7: iteration 99440/ 173500 | consumed samples: 25456640 | consumed tokens: 52135198720 | elapsed time per iteration (s): 0.09 | learning rate: 9.069E-05 | global batch size: 256 | lm loss: 4.525457E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2968.258 | TFLOPs: 11.04 | 7: iteration 99450/ 173500 | consumed samples: 25459200 | consumed tokens: 52140441600 | elapsed time 
per iteration (s): 0.09 | learning rate: 9.067E-05 | global batch size: 256 | lm loss: 4.516769E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2722.263 | TFLOPs: 10.13 | 7: iteration 99460/ 173500 | consumed samples: 25461760 | consumed tokens: 52145684480 | elapsed time per iteration (s): 0.11 | learning rate: 9.066E-05 | global batch size: 256 | lm loss: 4.521353E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2266.010 | TFLOPs: 8.43 | 7: iteration 99470/ 173500 | consumed samples: 25464320 | consumed tokens: 52150927360 | elapsed time per iteration (s): 0.08 | learning rate: 9.064E-05 | global batch size: 256 | lm loss: 4.521142E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.205 | TFLOPs: 11.88 | 7: iteration 99480/ 173500 | consumed samples: 25466880 | consumed tokens: 52156170240 | elapsed time per iteration (s): 0.08 | learning rate: 9.063E-05 | global batch size: 256 | lm loss: 4.512125E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.175 | TFLOPs: 11.93 | 7: iteration 99490/ 173500 | consumed samples: 25469440 | consumed tokens: 52161413120 | elapsed time per iteration (s): 0.08 | learning rate: 9.061E-05 | global batch size: 256 | lm loss: 4.513496E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.657 | TFLOPs: 11.96 | 7: iteration 99500/ 173500 | consumed samples: 25472000 | consumed tokens: 52166656000 | elapsed time per iteration (s): 0.08 | learning rate: 9.059E-05 | global batch size: 256 | lm loss: 4.528632E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.482 | TFLOPs: 
11.35 | 7: iteration 99510/ 173500 | consumed samples: 25474560 | consumed tokens: 52171898880 | elapsed time per iteration (s): 0.09 | learning rate: 9.058E-05 | global batch size: 256 | lm loss: 4.513485E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2762.169 | TFLOPs: 10.27 | 7: iteration 99520/ 173500 | consumed samples: 25477120 | consumed tokens: 52177141760 | elapsed time per iteration (s): 0.09 | learning rate: 9.056E-05 | global batch size: 256 | lm loss: 4.523445E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2755.170 | TFLOPs: 10.25 | 7: iteration 99530/ 173500 | consumed samples: 25479680 | consumed tokens: 52182384640 | elapsed time per iteration (s): 0.09 | learning rate: 9.055E-05 | global batch size: 256 | lm loss: 4.512601E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2953.510 | TFLOPs: 10.99 | 7: iteration 99540/ 173500 | consumed samples: 25482240 | consumed tokens: 52187627520 | elapsed time per iteration (s): 0.08 | learning rate: 9.053E-05 | global batch size: 256 | lm loss: 4.515135E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.559 | TFLOPs: 11.71 | 7: iteration 99550/ 173500 | consumed samples: 25484800 | consumed tokens: 52192870400 | elapsed time per iteration (s): 0.09 | learning rate: 9.051E-05 | global batch size: 256 | lm loss: 4.524192E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.129 | TFLOPs: 10.78 | 7: iteration 99560/ 173500 | consumed samples: 25487360 | consumed tokens: 52198113280 | elapsed time per iteration (s): 0.08 | learning rate: 9.050E-05 | global batch size: 256 | lm loss: 4.518285E+00 | grad norm: 0.336 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.817 | TFLOPs: 11.99 | 7: iteration 99570/ 173500 | consumed samples: 25489920 | consumed tokens: 52203356160 | elapsed time per iteration (s): 0.08 | learning rate: 9.048E-05 | global batch size: 256 | lm loss: 4.517035E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.301 | TFLOPs: 11.97 | 7: iteration 99580/ 173500 | consumed samples: 25492480 | consumed tokens: 52208599040 | elapsed time per iteration (s): 0.08 | learning rate: 9.047E-05 | global batch size: 256 | lm loss: 4.528740E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.668 | TFLOPs: 11.99 | 7: iteration 99590/ 173500 | consumed samples: 25495040 | consumed tokens: 52213841920 | elapsed time per iteration (s): 0.08 | learning rate: 9.045E-05 | global batch size: 256 | lm loss: 4.522196E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.990 | TFLOPs: 11.87 | 7: iteration 99600/ 173500 | consumed samples: 25497600 | consumed tokens: 52219084800 | elapsed time per iteration (s): 0.08 | learning rate: 9.043E-05 | global batch size: 256 | lm loss: 4.508865E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.401 | TFLOPs: 11.96 | 7: iteration 99610/ 173500 | consumed samples: 25500160 | consumed tokens: 52224327680 | elapsed time per iteration (s): 0.08 | learning rate: 9.042E-05 | global batch size: 256 | lm loss: 4.516672E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.574 | TFLOPs: 11.85 | 7: iteration 99620/ 173500 | consumed samples: 25502720 | consumed tokens: 52229570560 | elapsed time per iteration (s): 0.08 | 
learning rate: 9.040E-05 | global batch size: 256 | lm loss: 4.512006E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.308 | TFLOPs: 11.90 | 7: iteration 99630/ 173500 | consumed samples: 25505280 | consumed tokens: 52234813440 | elapsed time per iteration (s): 0.08 | learning rate: 9.039E-05 | global batch size: 256 | lm loss: 4.525312E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.702 | TFLOPs: 11.25 | 7: iteration 99640/ 173500 | consumed samples: 25507840 | consumed tokens: 52240056320 | elapsed time per iteration (s): 0.08 | learning rate: 9.037E-05 | global batch size: 256 | lm loss: 4.508967E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.532 | TFLOPs: 11.60 | 7: iteration 99650/ 173500 | consumed samples: 25510400 | consumed tokens: 52245299200 | elapsed time per iteration (s): 0.08 | learning rate: 9.035E-05 | global batch size: 256 | lm loss: 4.516392E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.777 | TFLOPs: 11.85 | 7: iteration 99660/ 173500 | consumed samples: 25512960 | consumed tokens: 52250542080 | elapsed time per iteration (s): 0.09 | learning rate: 9.034E-05 | global batch size: 256 | lm loss: 4.527646E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.864 | TFLOPs: 11.10 | 7: iteration 99670/ 173500 | consumed samples: 25515520 | consumed tokens: 52255784960 | elapsed time per iteration (s): 0.08 | learning rate: 9.032E-05 | global batch size: 256 | lm loss: 4.527973E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.996 | TFLOPs: 11.99 | 7: iteration 99680/ 
173500 | consumed samples: 25518080 | consumed tokens: 52261027840 | elapsed time per iteration (s): 0.08 | learning rate: 9.031E-05 | global batch size: 256 | lm loss: 4.518980E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.841 | TFLOPs: 11.99 | 7: iteration 99690/ 173500 | consumed samples: 25520640 | consumed tokens: 52266270720 | elapsed time per iteration (s): 0.10 | learning rate: 9.029E-05 | global batch size: 256 | lm loss: 4.523680E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2608.773 | TFLOPs: 9.70 | 7: iteration 99700/ 173500 | consumed samples: 25523200 | consumed tokens: 52271513600 | elapsed time per iteration (s): 0.08 | learning rate: 9.027E-05 | global batch size: 256 | lm loss: 4.520049E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.281 | TFLOPs: 11.97 | 7: iteration 99710/ 173500 | consumed samples: 25525760 | consumed tokens: 52276756480 | elapsed time per iteration (s): 0.09 | learning rate: 9.026E-05 | global batch size: 256 | lm loss: 4.519133E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2949.065 | TFLOPs: 10.97 | 7: iteration 99720/ 173500 | consumed samples: 25528320 | consumed tokens: 52281999360 | elapsed time per iteration (s): 0.10 | learning rate: 9.024E-05 | global batch size: 256 | lm loss: 4.526717E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2645.722 | TFLOPs: 9.84 | 7: iteration 99730/ 173500 | consumed samples: 25530880 | consumed tokens: 52287242240 | elapsed time per iteration (s): 0.08 | learning rate: 9.022E-05 | global batch size: 256 | lm loss: 4.520210E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3215.716 | TFLOPs: 11.96 | 7: iteration 99740/ 173500 | consumed samples: 25533440 | consumed tokens: 52292485120 | elapsed time per iteration (s): 0.08 | learning rate: 9.021E-05 | global batch size: 256 | lm loss: 4.519040E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.346 | TFLOPs: 11.90 | 7: iteration 99750/ 173500 | consumed samples: 25536000 | consumed tokens: 52297728000 | elapsed time per iteration (s): 0.08 | learning rate: 9.019E-05 | global batch size: 256 | lm loss: 4.523035E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.445 | TFLOPs: 11.79 | 7: iteration 99760/ 173500 | consumed samples: 25538560 | consumed tokens: 52302970880 | elapsed time per iteration (s): 0.08 | learning rate: 9.018E-05 | global batch size: 256 | lm loss: 4.520514E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.406 | TFLOPs: 11.83 | 7: iteration 99770/ 173500 | consumed samples: 25541120 | consumed tokens: 52308213760 | elapsed time per iteration (s): 0.08 | learning rate: 9.016E-05 | global batch size: 256 | lm loss: 4.524321E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.830 | TFLOPs: 11.75 | 7: iteration 99780/ 173500 | consumed samples: 25543680 | consumed tokens: 52313456640 | elapsed time per iteration (s): 0.08 | learning rate: 9.014E-05 | global batch size: 256 | lm loss: 4.507948E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.246 | TFLOPs: 11.74 | 7: iteration 99790/ 173500 | consumed samples: 25546240 | consumed tokens: 52318699520 | elapsed time per iteration (s): 0.08 | learning rate: 
9.013E-05 | global batch size: 256 | lm loss: 4.525963E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.130 | TFLOPs: 11.84 | 7: iteration 99800/ 173500 | consumed samples: 25548800 | consumed tokens: 52323942400 | elapsed time per iteration (s): 0.08 | learning rate: 9.011E-05 | global batch size: 256 | lm loss: 4.529523E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.833 | TFLOPs: 11.81 | 7: iteration 99810/ 173500 | consumed samples: 25551360 | consumed tokens: 52329185280 | elapsed time per iteration (s): 0.08 | learning rate: 9.010E-05 | global batch size: 256 | lm loss: 4.517572E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.559 | TFLOPs: 11.78 | 7: iteration 99820/ 173500 | consumed samples: 25553920 | consumed tokens: 52334428160 | elapsed time per iteration (s): 0.08 | learning rate: 9.008E-05 | global batch size: 256 | lm loss: 4.518645E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.257 | TFLOPs: 11.80 | 7: iteration 99830/ 173500 | consumed samples: 25556480 | consumed tokens: 52339671040 | elapsed time per iteration (s): 0.08 | learning rate: 9.006E-05 | global batch size: 256 | lm loss: 4.526609E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.618 | TFLOPs: 11.74 | 7: iteration 99840/ 173500 | consumed samples: 25559040 | consumed tokens: 52344913920 | elapsed time per iteration (s): 0.08 | learning rate: 9.005E-05 | global batch size: 256 | lm loss: 4.519147E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.639 | TFLOPs: 11.68 | 7: iteration 99850/ 173500 | 
consumed samples: 25561600 | consumed tokens: 52350156800 | elapsed time per iteration (s): 0.08 | learning rate: 9.003E-05 | global batch size: 256 | lm loss: 4.519088E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.185 | TFLOPs: 11.78 | 7: iteration 99860/ 173500 | consumed samples: 25564160 | consumed tokens: 52355399680 | elapsed time per iteration (s): 0.08 | learning rate: 9.002E-05 | global batch size: 256 | lm loss: 4.508993E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.419 | TFLOPs: 11.76 | 7: iteration 99870/ 173500 | consumed samples: 25566720 | consumed tokens: 52360642560 | elapsed time per iteration (s): 0.08 | learning rate: 9.000E-05 | global batch size: 256 | lm loss: 4.505327E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.865 | TFLOPs: 11.78 | 7: iteration 99880/ 173500 | consumed samples: 25569280 | consumed tokens: 52365885440 | elapsed time per iteration (s): 0.08 | learning rate: 8.998E-05 | global batch size: 256 | lm loss: 4.510711E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.701 | TFLOPs: 11.76 | 7: iteration 99890/ 173500 | consumed samples: 25571840 | consumed tokens: 52371128320 | elapsed time per iteration (s): 0.09 | learning rate: 8.997E-05 | global batch size: 256 | lm loss: 4.519687E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.206 | TFLOPs: 11.07 | 7: iteration 99900/ 173500 | consumed samples: 25574400 | consumed tokens: 52376371200 | elapsed time per iteration (s): 0.08 | learning rate: 8.995E-05 | global batch size: 256 | lm loss: 4.524697E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3169.873 | TFLOPs: 11.79 | 7: iteration 99910/ 173500 | consumed samples: 25576960 | consumed tokens: 52381614080 | elapsed time per iteration (s): 0.09 | learning rate: 8.994E-05 | global batch size: 256 | lm loss: 4.526822E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2784.442 | TFLOPs: 10.36 | 7: iteration 99920/ 173500 | consumed samples: 25579520 | consumed tokens: 52386856960 | elapsed time per iteration (s): 0.10 | learning rate: 8.992E-05 | global batch size: 256 | lm loss: 4.524545E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2672.428 | TFLOPs: 9.94 | 7: iteration 99930/ 173500 | consumed samples: 25582080 | consumed tokens: 52392099840 | elapsed time per iteration (s): 0.08 | learning rate: 8.990E-05 | global batch size: 256 | lm loss: 4.520442E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.417 | TFLOPs: 11.81 | 7: iteration 99940/ 173500 | consumed samples: 25584640 | consumed tokens: 52397342720 | elapsed time per iteration (s): 0.08 | learning rate: 8.989E-05 | global batch size: 256 | lm loss: 4.505604E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.118 | TFLOPs: 11.79 | 7: iteration 99950/ 173500 | consumed samples: 25587200 | consumed tokens: 52402585600 | elapsed time per iteration (s): 0.08 | learning rate: 8.987E-05 | global batch size: 256 | lm loss: 4.514753E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.506 | TFLOPs: 11.82 | 7: iteration 99960/ 173500 | consumed samples: 25589760 | consumed tokens: 52407828480 | elapsed time per iteration (s): 0.08 | learning rate: 8.986E-05 | global batch 
size: 256 | lm loss: 4.523105E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.600 | TFLOPs: 11.73 | 7: iteration 99970/ 173500 | consumed samples: 25592320 | consumed tokens: 52413071360 | elapsed time per iteration (s): 0.08 | learning rate: 8.984E-05 | global batch size: 256 | lm loss: 4.516568E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.874 | TFLOPs: 11.76 | 7: iteration 99980/ 173500 | consumed samples: 25594880 | consumed tokens: 52418314240 | elapsed time per iteration (s): 0.11 | learning rate: 8.982E-05 | global batch size: 256 | lm loss: 4.519902E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2411.002 | TFLOPs: 8.97 | 7: iteration 99990/ 173500 | consumed samples: 25597440 | consumed tokens: 52423557120 | elapsed time per iteration (s): 0.08 | learning rate: 8.981E-05 | global batch size: 256 | lm loss: 4.515665E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.155 | TFLOPs: 11.71 | 0: [2023-03-17 02:39:43,762] [INFO] [logging.py:68:log_dist] [Rank 0] step=100000, skipped=0, lr=[8.979141123724914e-05, 8.979141123724914e-05, 8.979141123724914e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 100000/ 173500 | consumed samples: 25600000 | consumed tokens: 52428800000 | elapsed time per iteration (s): 0.11 | learning rate: 8.979E-05 | global batch size: 256 | lm loss: 4.529346E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2363.950 | TFLOPs: 8.79 | 0: steps: 100000 loss: 4.5167 iter time (s): 0.084 samples/sec: 3040.414 7: ------------------------------------------------------------------------------------------------- 7: validation loss at 
iteration 100000 | lm loss value: 4.361545E+00 | lm loss PPL: 7.837810E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 100000 to checkpoints_14m91b100m 0: [2023-03-17 02:39:43,844] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step100000 is begin to save! 0: [2023-03-17 02:39:43,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:39:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:39:43,873] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:39:43,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:39:43,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:39:43,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:39:43,880] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:39:43,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:39:43,883] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 02:39:43,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:39:43,886] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:39:43,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:39:43,887] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step100000/mp_rank_00_model_states.pt 0: [2023-03-17 02:39:43,887] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:39:43,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:39:43,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:39:43,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:39:43,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:39:43,909] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:39:43,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 02:39:43,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:39:43,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:39:43,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
3: [2023-03-17 02:39:43,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:39:43,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:39:43,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:39:43,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:39:43,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 02:39:43,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:39:43,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
5: [2023-03-17 02:39:43,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:39:43,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:39:43,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 02:39:43,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:39:43,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:39:43,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:39:43,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:39:43,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:39:43,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:39:43,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:39:43,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 02:39:43,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:39:43,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:39:43,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:39:43,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:39:43,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 02:39:43,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:39:43,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:39:43,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:39:43,914] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,914] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:39:43,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 02:39:43,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:39:43,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:39:43,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:39:43,915] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:39:43,915] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
7: [2023-03-17 02:39:43,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:39:43,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 02:39:43,916] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:39:43,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:39:43,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:39:43,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:39:43,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:39:43,917] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 02:39:43,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:39:43,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:39:43,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 02:39:43,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 02:39:43,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:39:43,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2023-03-17 02:39:43,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
7: [2023-03-17 02:39:43,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:39:43,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 02:39:43,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2023-03-17 02:39:43,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
1: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:39:43,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:39:43,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2023-03-17 02:39:43,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
0: successfully saved checkpoint at iteration 100000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.58 7: iteration 100010/ 173500 | consumed samples: 25602560 | consumed tokens: 52434042880 | elapsed time per iteration (s): 0.11 | learning rate: 8.978E-05 | global batch size: 256 | lm loss: 4.522852E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2251.750 | TFLOPs: 8.38 | 7: iteration 100020/ 173500 | consumed samples: 25605120 | consumed tokens: 52439285760 | elapsed time per iteration (s): 0.08 | learning rate: 8.976E-05 | global batch size: 256 | lm loss: 4.528050E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.406 | TFLOPs: 11.83 | 7: iteration 100030/ 173500 | consumed samples: 25607680 | consumed tokens: 52444528640 | elapsed time per iteration (s): 0.08 | learning rate: 8.974E-05 | global batch size: 256 | lm loss: 4.516169E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.881 | TFLOPs: 11.82 | 7: iteration 100040/ 173500 | consumed samples: 25610240 | consumed tokens: 52449771520 | elapsed time per iteration (s): 0.08 | learning rate: 8.973E-05 | global batch size: 256 | lm loss: 4.529655E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.732 | TFLOPs: 11.82 | 7: iteration 100050/ 173500 | consumed samples: 25612800 | consumed tokens: 52455014400 | elapsed time per iteration (s): 0.08 | learning rate: 8.971E-05 | global batch size: 256 | lm loss: 4.522179E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.923 | TFLOPs: 11.84 | 7: iteration 100060/ 173500 | consumed samples: 25615360 | consumed tokens: 52460257280 | elapsed time per iteration (s): 
0.08 | learning rate: 8.970E-05 | global batch size: 256 | lm loss: 4.514991E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.252 | TFLOPs: 11.83 | 7: iteration 100070/ 173500 | consumed samples: 25617920 | consumed tokens: 52465500160 | elapsed time per iteration (s): 0.08 | learning rate: 8.968E-05 | global batch size: 256 | lm loss: 4.510941E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.553 | TFLOPs: 11.53 | 7: iteration 100080/ 173500 | consumed samples: 25620480 | consumed tokens: 52470743040 | elapsed time per iteration (s): 0.09 | learning rate: 8.966E-05 | global batch size: 256 | lm loss: 4.517798E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.564 | TFLOPs: 10.82 | 7: iteration 100090/ 173500 | consumed samples: 25623040 | consumed tokens: 52475985920 | elapsed time per iteration (s): 0.08 | learning rate: 8.965E-05 | global batch size: 256 | lm loss: 4.524144E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.761 | TFLOPs: 11.83 | 7: iteration 100100/ 173500 | consumed samples: 25625600 | consumed tokens: 52481228800 | elapsed time per iteration (s): 0.08 | learning rate: 8.963E-05 | global batch size: 256 | lm loss: 4.509745E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.068 | TFLOPs: 11.81 | 7: iteration 100110/ 173500 | consumed samples: 25628160 | consumed tokens: 52486471680 | elapsed time per iteration (s): 0.08 | learning rate: 8.962E-05 | global batch size: 256 | lm loss: 4.510648E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.583 | TFLOPs: 11.80 | 7: 
iteration 100120/ 173500 | consumed samples: 25630720 | consumed tokens: 52491714560 | elapsed time per iteration (s): 0.08 | learning rate: 8.960E-05 | global batch size: 256 | lm loss: 4.530141E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.848 | TFLOPs: 11.83 | 7: iteration 100130/ 173500 | consumed samples: 25633280 | consumed tokens: 52496957440 | elapsed time per iteration (s): 0.08 | learning rate: 8.958E-05 | global batch size: 256 | lm loss: 4.523058E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.392 | TFLOPs: 11.83 | 7: iteration 100140/ 173500 | consumed samples: 25635840 | consumed tokens: 52502200320 | elapsed time per iteration (s): 0.08 | learning rate: 8.957E-05 | global batch size: 256 | lm loss: 4.517224E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.027 | TFLOPs: 11.84 | 7: iteration 100150/ 173500 | consumed samples: 25638400 | consumed tokens: 52507443200 | elapsed time per iteration (s): 0.08 | learning rate: 8.955E-05 | global batch size: 256 | lm loss: 4.521311E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.771 | TFLOPs: 11.72 | 7: iteration 100160/ 173500 | consumed samples: 25640960 | consumed tokens: 52512686080 | elapsed time per iteration (s): 0.08 | learning rate: 8.953E-05 | global batch size: 256 | lm loss: 4.526330E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.441 | TFLOPs: 11.84 | 7: iteration 100170/ 173500 | consumed samples: 25643520 | consumed tokens: 52517928960 | elapsed time per iteration (s): 0.08 | learning rate: 8.952E-05 | global batch size: 256 | lm loss: 4.521291E+00 | grad norm: 0.342 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.159 | TFLOPs: 11.82 | 7: iteration 100180/ 173500 | consumed samples: 25646080 | consumed tokens: 52523171840 | elapsed time per iteration (s): 0.10 | learning rate: 8.950E-05 | global batch size: 256 | lm loss: 4.526627E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2536.273 | TFLOPs: 9.43 | 7: iteration 100190/ 173500 | consumed samples: 25648640 | consumed tokens: 52528414720 | elapsed time per iteration (s): 0.12 | learning rate: 8.949E-05 | global batch size: 256 | lm loss: 4.531655E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2209.017 | TFLOPs: 8.22 | 7: iteration 100200/ 173500 | consumed samples: 25651200 | consumed tokens: 52533657600 | elapsed time per iteration (s): 0.12 | learning rate: 8.947E-05 | global batch size: 256 | lm loss: 4.523673E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2153.579 | TFLOPs: 8.01 | 7: iteration 100210/ 173500 | consumed samples: 25653760 | consumed tokens: 52538900480 | elapsed time per iteration (s): 0.12 | learning rate: 8.945E-05 | global batch size: 256 | lm loss: 4.508534E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2077.975 | TFLOPs: 7.73 | 7: iteration 100220/ 173500 | consumed samples: 25656320 | consumed tokens: 52544143360 | elapsed time per iteration (s): 0.13 | learning rate: 8.944E-05 | global batch size: 256 | lm loss: 4.505461E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.735 | TFLOPs: 7.42 | 7: iteration 100230/ 173500 | consumed samples: 25658880 | consumed tokens: 52549386240 | elapsed time per iteration (s): 0.12 | 
learning rate: 8.942E-05 | global batch size: 256 | lm loss: 4.520612E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.204 | TFLOPs: 7.70 | 7: iteration 100240/ 173500 | consumed samples: 25661440 | consumed tokens: 52554629120 | elapsed time per iteration (s): 0.13 | learning rate: 8.941E-05 | global batch size: 256 | lm loss: 4.505127E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.370 | TFLOPs: 7.37 | 7: iteration 100250/ 173500 | consumed samples: 25664000 | consumed tokens: 52559872000 | elapsed time per iteration (s): 0.13 | learning rate: 8.939E-05 | global batch size: 256 | lm loss: 4.517532E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.223 | TFLOPs: 7.35 | 7: iteration 100260/ 173500 | consumed samples: 25666560 | consumed tokens: 52565114880 | elapsed time per iteration (s): 0.13 | learning rate: 8.937E-05 | global batch size: 256 | lm loss: 4.506393E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2012.658 | TFLOPs: 7.49 | 7: iteration 100270/ 173500 | consumed samples: 25669120 | consumed tokens: 52570357760 | elapsed time per iteration (s): 0.12 | learning rate: 8.936E-05 | global batch size: 256 | lm loss: 4.514330E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.114 | TFLOPs: 7.78 | 7: iteration 100280/ 173500 | consumed samples: 25671680 | consumed tokens: 52575600640 | elapsed time per iteration (s): 0.12 | learning rate: 8.934E-05 | global batch size: 256 | lm loss: 4.523138E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2169.782 | TFLOPs: 8.07 | 7: iteration 100290/ 
173500 | consumed samples: 25674240 | consumed tokens: 52580843520 | elapsed time per iteration (s): 0.13 | learning rate: 8.933E-05 | global batch size: 256 | lm loss: 4.516981E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.621 | TFLOPs: 7.37 | 7: iteration 100300/ 173500 | consumed samples: 25676800 | consumed tokens: 52586086400 | elapsed time per iteration (s): 0.13 | learning rate: 8.931E-05 | global batch size: 256 | lm loss: 4.516713E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.244 | TFLOPs: 7.41 | 7: iteration 100310/ 173500 | consumed samples: 25679360 | consumed tokens: 52591329280 | elapsed time per iteration (s): 0.12 | learning rate: 8.929E-05 | global batch size: 256 | lm loss: 4.517644E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2185.698 | TFLOPs: 8.13 | 7: iteration 100320/ 173500 | consumed samples: 25681920 | consumed tokens: 52596572160 | elapsed time per iteration (s): 0.13 | learning rate: 8.928E-05 | global batch size: 256 | lm loss: 4.526351E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.362 | TFLOPs: 7.58 | 7: iteration 100330/ 173500 | consumed samples: 25684480 | consumed tokens: 52601815040 | elapsed time per iteration (s): 0.13 | learning rate: 8.926E-05 | global batch size: 256 | lm loss: 4.524177E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2012.597 | TFLOPs: 7.49 | 7: iteration 100340/ 173500 | consumed samples: 25687040 | consumed tokens: 52607057920 | elapsed time per iteration (s): 0.13 | learning rate: 8.925E-05 | global batch size: 256 | lm loss: 4.505953E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 1981.400 | TFLOPs: 7.37 | 7: iteration 100350/ 173500 | consumed samples: 25689600 | consumed tokens: 52612300800 | elapsed time per iteration (s): 0.13 | learning rate: 8.923E-05 | global batch size: 256 | lm loss: 4.513866E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2037.495 | TFLOPs: 7.58 | 7: iteration 100360/ 173500 | consumed samples: 25692160 | consumed tokens: 52617543680 | elapsed time per iteration (s): 0.12 | learning rate: 8.921E-05 | global batch size: 256 | lm loss: 4.515982E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2103.181 | TFLOPs: 7.82 | 7: iteration 100370/ 173500 | consumed samples: 25694720 | consumed tokens: 52622786560 | elapsed time per iteration (s): 0.13 | learning rate: 8.920E-05 | global batch size: 256 | lm loss: 4.528532E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1941.835 | TFLOPs: 7.22 | 7: iteration 100380/ 173500 | consumed samples: 25697280 | consumed tokens: 52628029440 | elapsed time per iteration (s): 0.13 | learning rate: 8.918E-05 | global batch size: 256 | lm loss: 4.532372E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.831 | TFLOPs: 7.61 | 7: iteration 100390/ 173500 | consumed samples: 25699840 | consumed tokens: 52633272320 | elapsed time per iteration (s): 0.11 | learning rate: 8.917E-05 | global batch size: 256 | lm loss: 4.516531E+00 | grad norm: 0.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2249.537 | TFLOPs: 8.37 | 7: iteration 100400/ 173500 | consumed samples: 25702400 | consumed tokens: 52638515200 | elapsed time per iteration (s): 0.09 | learning rate: 
8.915E-05 | global batch size: 256 | lm loss: 4.533489E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2762.124 | TFLOPs: 10.27 | 7: iteration 100410/ 173500 | consumed samples: 25704960 | consumed tokens: 52643758080 | elapsed time per iteration (s): 0.08 | learning rate: 8.913E-05 | global batch size: 256 | lm loss: 4.527292E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.219 | TFLOPs: 11.88 | 7: iteration 100420/ 173500 | consumed samples: 25707520 | consumed tokens: 52649000960 | elapsed time per iteration (s): 0.08 | learning rate: 8.912E-05 | global batch size: 256 | lm loss: 4.524176E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.072 | TFLOPs: 11.85 | 7: iteration 100430/ 173500 | consumed samples: 25710080 | consumed tokens: 52654243840 | elapsed time per iteration (s): 0.08 | learning rate: 8.910E-05 | global batch size: 256 | lm loss: 4.528698E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.353 | TFLOPs: 11.82 | 7: iteration 100440/ 173500 | consumed samples: 25712640 | consumed tokens: 52659486720 | elapsed time per iteration (s): 0.08 | learning rate: 8.909E-05 | global batch size: 256 | lm loss: 4.517393E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.384 | TFLOPs: 11.57 | 7: iteration 100450/ 173500 | consumed samples: 25715200 | consumed tokens: 52664729600 | elapsed time per iteration (s): 0.08 | learning rate: 8.907E-05 | global batch size: 256 | lm loss: 4.522704E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.502 | TFLOPs: 11.86 | 7: iteration 100460/ 173500 | 
consumed samples: 25717760 | consumed tokens: 52669972480 | elapsed time per iteration (s): 0.08 | learning rate: 8.905E-05 | global batch size: 256 | lm loss: 4.527089E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.347 | TFLOPs: 11.87 | 7: iteration 100470/ 173500 | consumed samples: 25720320 | consumed tokens: 52675215360 | elapsed time per iteration (s): 0.08 | learning rate: 8.904E-05 | global batch size: 256 | lm loss: 4.509166E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.117 | TFLOPs: 11.85 | 7: iteration 100480/ 173500 | consumed samples: 25722880 | consumed tokens: 52680458240 | elapsed time per iteration (s): 0.08 | learning rate: 8.902E-05 | global batch size: 256 | lm loss: 4.526110E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.425 | TFLOPs: 11.83 | 7: iteration 100490/ 173500 | consumed samples: 25725440 | consumed tokens: 52685701120 | elapsed time per iteration (s): 0.08 | learning rate: 8.901E-05 | global batch size: 256 | lm loss: 4.511589E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.276 | TFLOPs: 11.84 | 7: iteration 100500/ 173500 | consumed samples: 25728000 | consumed tokens: 52690944000 | elapsed time per iteration (s): 0.08 | learning rate: 8.899E-05 | global batch size: 256 | lm loss: 4.530608E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.529 | TFLOPs: 11.83 | 7: iteration 100510/ 173500 | consumed samples: 25730560 | consumed tokens: 52696186880 | elapsed time per iteration (s): 0.08 | learning rate: 8.897E-05 | global batch size: 256 | lm loss: 4.517876E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3192.780 | TFLOPs: 11.88 | 7: iteration 100520/ 173500 | consumed samples: 25733120 | consumed tokens: 52701429760 | elapsed time per iteration (s): 0.08 | learning rate: 8.896E-05 | global batch size: 256 | lm loss: 4.524123E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.101 | TFLOPs: 11.86 | 7: iteration 100530/ 173500 | consumed samples: 25735680 | consumed tokens: 52706672640 | elapsed time per iteration (s): 0.08 | learning rate: 8.894E-05 | global batch size: 256 | lm loss: 4.534640E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.961 | TFLOPs: 11.85 | 7: iteration 100540/ 173500 | consumed samples: 25738240 | consumed tokens: 52711915520 | elapsed time per iteration (s): 0.08 | learning rate: 8.893E-05 | global batch size: 256 | lm loss: 4.520259E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.066 | TFLOPs: 11.85 | 7: iteration 100550/ 173500 | consumed samples: 25740800 | consumed tokens: 52717158400 | elapsed time per iteration (s): 0.08 | learning rate: 8.891E-05 | global batch size: 256 | lm loss: 4.525476E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.590 | TFLOPs: 11.84 | 7: iteration 100560/ 173500 | consumed samples: 25743360 | consumed tokens: 52722401280 | elapsed time per iteration (s): 0.08 | learning rate: 8.889E-05 | global batch size: 256 | lm loss: 4.504903E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.084 | TFLOPs: 11.72 | 7: iteration 100570/ 173500 | consumed samples: 25745920 | consumed tokens: 52727644160 | elapsed time per iteration (s): 0.08 | learning rate: 
8.888E-05 | global batch size: 256 | lm loss: 4.521259E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.423 | TFLOPs: 11.79 | 7: iteration 100580/ 173500 | consumed samples: 25748480 | consumed tokens: 52732887040 | elapsed time per iteration (s): 0.08 | learning rate: 8.886E-05 | global batch size: 256 | lm loss: 4.523839E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.123 | TFLOPs: 11.89 | 7: iteration 100590/ 173500 | consumed samples: 25751040 | consumed tokens: 52738129920 | elapsed time per iteration (s): 0.10 | learning rate: 8.885E-05 | global batch size: 256 | lm loss: 4.508184E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.588 | TFLOPs: 10.00 | 7: iteration 100600/ 173500 | consumed samples: 25753600 | consumed tokens: 52743372800 | elapsed time per iteration (s): 0.08 | learning rate: 8.883E-05 | global batch size: 256 | lm loss: 4.501683E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.449 | TFLOPs: 11.86 | 7: iteration 100610/ 173500 | consumed samples: 25756160 | consumed tokens: 52748615680 | elapsed time per iteration (s): 0.08 | learning rate: 8.881E-05 | global batch size: 256 | lm loss: 4.520090E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.308 | TFLOPs: 11.91 | 7: iteration 100620/ 173500 | consumed samples: 25758720 | consumed tokens: 52753858560 | elapsed time per iteration (s): 0.08 | learning rate: 8.880E-05 | global batch size: 256 | lm loss: 4.528043E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.629 | TFLOPs: 11.85 | 7: iteration 100630/ 173500 | 
consumed samples: 25761280 | consumed tokens: 52759101440 | elapsed time per iteration (s): 0.08 | learning rate: 8.878E-05 | global batch size: 256 | lm loss: 4.510022E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.111 | TFLOPs: 11.66 | 7: iteration 100640/ 173500 | consumed samples: 25763840 | consumed tokens: 52764344320 | elapsed time per iteration (s): 0.08 | learning rate: 8.877E-05 | global batch size: 256 | lm loss: 4.514734E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.284 | TFLOPs: 11.69 | 7: iteration 100650/ 173500 | consumed samples: 25766400 | consumed tokens: 52769587200 | elapsed time per iteration (s): 0.08 | learning rate: 8.875E-05 | global batch size: 256 | lm loss: 4.521932E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.102 | TFLOPs: 11.93 | 7: iteration 100660/ 173500 | consumed samples: 25768960 | consumed tokens: 52774830080 | elapsed time per iteration (s): 0.08 | learning rate: 8.873E-05 | global batch size: 256 | lm loss: 4.513047E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.149 | TFLOPs: 11.90 | 7: iteration 100670/ 173500 | consumed samples: 25771520 | consumed tokens: 52780072960 | elapsed time per iteration (s): 0.08 | learning rate: 8.872E-05 | global batch size: 256 | lm loss: 4.513168E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.716 | TFLOPs: 11.95 | 7: iteration 100680/ 173500 | consumed samples: 25774080 | consumed tokens: 52785315840 | elapsed time per iteration (s): 0.08 | learning rate: 8.870E-05 | global batch size: 256 | lm loss: 4.511707E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3192.811 | TFLOPs: 11.88 | 7: iteration 100690/ 173500 | consumed samples: 25776640 | consumed tokens: 52790558720 | elapsed time per iteration (s): 0.08 | learning rate: 8.869E-05 | global batch size: 256 | lm loss: 4.507519E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.352 | TFLOPs: 11.87 | 7: iteration 100700/ 173500 | consumed samples: 25779200 | consumed tokens: 52795801600 | elapsed time per iteration (s): 0.08 | learning rate: 8.867E-05 | global batch size: 256 | lm loss: 4.502324E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.170 | TFLOPs: 11.67 | 7: iteration 100710/ 173500 | consumed samples: 25781760 | consumed tokens: 52801044480 | elapsed time per iteration (s): 0.08 | learning rate: 8.865E-05 | global batch size: 256 | lm loss: 4.519880E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.470 | TFLOPs: 11.57 | 7: iteration 100720/ 173500 | consumed samples: 25784320 | consumed tokens: 52806287360 | elapsed time per iteration (s): 0.08 | learning rate: 8.864E-05 | global batch size: 256 | lm loss: 4.520796E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.948 | TFLOPs: 11.64 | 7: iteration 100730/ 173500 | consumed samples: 25786880 | consumed tokens: 52811530240 | elapsed time per iteration (s): 0.08 | learning rate: 8.862E-05 | global batch size: 256 | lm loss: 4.513663E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.001 | TFLOPs: 11.72 | 7: iteration 100740/ 173500 | consumed samples: 25789440 | consumed tokens: 52816773120 | elapsed time per iteration (s): 0.08 | learning rate: 
8.861E-05 | global batch size: 256 | lm loss: 4.508746E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.211 | TFLOPs: 11.91 | 7: iteration 100750/ 173500 | consumed samples: 25792000 | consumed tokens: 52822016000 | elapsed time per iteration (s): 0.08 | learning rate: 8.859E-05 | global batch size: 256 | lm loss: 4.518632E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.113 | TFLOPs: 11.87 | 7: iteration 100760/ 173500 | consumed samples: 25794560 | consumed tokens: 52827258880 | elapsed time per iteration (s): 0.09 | learning rate: 8.857E-05 | global batch size: 256 | lm loss: 4.520733E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.194 | TFLOPs: 10.23 | 7: iteration 100770/ 173500 | consumed samples: 25797120 | consumed tokens: 52832501760 | elapsed time per iteration (s): 0.08 | learning rate: 8.856E-05 | global batch size: 256 | lm loss: 4.514507E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.616 | TFLOPs: 11.89 | 7: iteration 100780/ 173500 | consumed samples: 25799680 | consumed tokens: 52837744640 | elapsed time per iteration (s): 0.08 | learning rate: 8.854E-05 | global batch size: 256 | lm loss: 4.516510E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.946 | TFLOPs: 11.88 | 7: iteration 100790/ 173500 | consumed samples: 25802240 | consumed tokens: 52842987520 | elapsed time per iteration (s): 0.08 | learning rate: 8.853E-05 | global batch size: 256 | lm loss: 4.514238E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.176 | TFLOPs: 11.91 | 7: iteration 100800/ 173500 | 
consumed samples: 25804800 | consumed tokens: 52848230400 | elapsed time per iteration (s): 0.08 | learning rate: 8.851E-05 | global batch size: 256 | lm loss: 4.526400E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.673 | TFLOPs: 11.82 | 7: iteration 100810/ 173500 | consumed samples: 25807360 | consumed tokens: 52853473280 | elapsed time per iteration (s): 0.09 | learning rate: 8.849E-05 | global batch size: 256 | lm loss: 4.507474E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2853.975 | TFLOPs: 10.62 | 7: iteration 100820/ 173500 | consumed samples: 25809920 | consumed tokens: 52858716160 | elapsed time per iteration (s): 0.08 | learning rate: 8.848E-05 | global batch size: 256 | lm loss: 4.527730E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.320 | TFLOPs: 11.82 | 7: iteration 100830/ 173500 | consumed samples: 25812480 | consumed tokens: 52863959040 | elapsed time per iteration (s): 0.10 | learning rate: 8.846E-05 | global batch size: 256 | lm loss: 4.528265E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2449.892 | TFLOPs: 9.11 | 7: iteration 100840/ 173500 | consumed samples: 25815040 | consumed tokens: 52869201920 | elapsed time per iteration (s): 0.11 | learning rate: 8.845E-05 | global batch size: 256 | lm loss: 4.525845E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.249 | TFLOPs: 8.56 | 7: iteration 100850/ 173500 | consumed samples: 25817600 | consumed tokens: 52874444800 | elapsed time per iteration (s): 0.12 | learning rate: 8.843E-05 | global batch size: 256 | lm loss: 4.516869E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 2119.275 | TFLOPs: 7.88 | 7: iteration 100860/ 173500 | consumed samples: 25820160 | consumed tokens: 52879687680 | elapsed time per iteration (s): 0.11 | learning rate: 8.841E-05 | global batch size: 256 | lm loss: 4.503687E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2414.382 | TFLOPs: 8.98 | 7: iteration 100870/ 173500 | consumed samples: 25822720 | consumed tokens: 52884930560 | elapsed time per iteration (s): 0.08 | learning rate: 8.840E-05 | global batch size: 256 | lm loss: 4.524977E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.566 | TFLOPs: 11.89 | 7: iteration 100880/ 173500 | consumed samples: 25825280 | consumed tokens: 52890173440 | elapsed time per iteration (s): 0.08 | learning rate: 8.838E-05 | global batch size: 256 | lm loss: 4.516884E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.010 | TFLOPs: 11.89 | 7: iteration 100890/ 173500 | consumed samples: 25827840 | consumed tokens: 52895416320 | elapsed time per iteration (s): 0.08 | learning rate: 8.837E-05 | global batch size: 256 | lm loss: 4.517717E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.916 | TFLOPs: 11.87 | 7: iteration 100900/ 173500 | consumed samples: 25830400 | consumed tokens: 52900659200 | elapsed time per iteration (s): 0.08 | learning rate: 8.835E-05 | global batch size: 256 | lm loss: 4.520932E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.807 | TFLOPs: 11.89 | 7: iteration 100910/ 173500 | consumed samples: 25832960 | consumed tokens: 52905902080 | elapsed time per iteration (s): 0.08 | learning rate: 8.833E-05 | 
global batch size: 256 | lm loss: 4.511805E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.183 | TFLOPs: 11.89 | 7: iteration 100920/ 173500 | consumed samples: 25835520 | consumed tokens: 52911144960 | elapsed time per iteration (s): 0.08 | learning rate: 8.832E-05 | global batch size: 256 | lm loss: 4.521492E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.860 | TFLOPs: 11.78 | 7: iteration 100930/ 173500 | consumed samples: 25838080 | consumed tokens: 52916387840 | elapsed time per iteration (s): 0.08 | learning rate: 8.830E-05 | global batch size: 256 | lm loss: 4.512965E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.371 | TFLOPs: 11.87 | 7: iteration 100940/ 173500 | consumed samples: 25840640 | consumed tokens: 52921630720 | elapsed time per iteration (s): 0.08 | learning rate: 8.829E-05 | global batch size: 256 | lm loss: 4.531329E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.796 | TFLOPs: 11.89 | 7: iteration 100950/ 173500 | consumed samples: 25843200 | consumed tokens: 52926873600 | elapsed time per iteration (s): 0.08 | learning rate: 8.827E-05 | global batch size: 256 | lm loss: 4.520299E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3065.055 | TFLOPs: 11.40 | 7: iteration 100960/ 173500 | consumed samples: 25845760 | consumed tokens: 52932116480 | elapsed time per iteration (s): 0.08 | learning rate: 8.825E-05 | global batch size: 256 | lm loss: 4.506474E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.764 | TFLOPs: 11.86 | 7: iteration 100970/ 173500 | consumed 
samples: 25848320 | consumed tokens: 52937359360 | elapsed time per iteration (s): 0.08 | learning rate: 8.824E-05 | global batch size: 256 | lm loss: 4.518441E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.586 | TFLOPs: 11.83 | 7: iteration 100980/ 173500 | consumed samples: 25850880 | consumed tokens: 52942602240 | elapsed time per iteration (s): 0.09 | learning rate: 8.822E-05 | global batch size: 256 | lm loss: 4.517268E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.464 | TFLOPs: 10.38 | 7: iteration 100990/ 173500 | consumed samples: 25853440 | consumed tokens: 52947845120 | elapsed time per iteration (s): 0.08 | learning rate: 8.821E-05 | global batch size: 256 | lm loss: 4.521901E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.583 | TFLOPs: 11.63 | 7: iteration 101000/ 173500 | consumed samples: 25856000 | consumed tokens: 52953088000 | elapsed time per iteration (s): 0.08 | learning rate: 8.819E-05 | global batch size: 256 | lm loss: 4.524295E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.247 | TFLOPs: 11.80 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 101000 | lm loss value: 4.383206E+00 | lm loss PPL: 8.009443E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 101000 to checkpoints_14m91b100m 0: [2023-03-17 02:41:16,021] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step101000 is begin to save! 
0: [2023-03-17 02:41:16,026] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:41:16,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:41:16,052] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:41:16,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:41:16,055] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:41:16,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:41:16,058] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:41:16,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:41:16,061] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:41:16,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:41:16,064] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:41:16,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:41:16,065] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step101000/mp_rank_00_model_states.pt 0: [2023-03-17 02:41:16,065] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:41:16,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:41:16,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:41:16,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:41:16,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:41:16,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2023-03-17 02:41:16,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:41:16,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:41:16,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 02:41:16,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2023-03-17 02:41:16,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2023-03-17 02:41:16,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:41:16,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2023-03-17 02:41:16,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2023-03-17 02:41:16,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:41:16,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
2: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2023-03-17 02:41:16,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:41:16,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
3: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:41:16,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:41:16,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:41:16,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:41:16,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 02:41:16,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
5: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:41:16,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:41:16,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:41:16,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:41:16,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2023-03-17 02:41:16,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2023-03-17 02:41:16,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:41:16,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:41:16,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 02:41:16,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 6: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
5: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:41:16,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 02:41:16,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2023-03-17 02:41:16,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
3: [2023-03-17 02:41:16,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:41:16,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:41:16,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:41:16,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 02:41:16,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2023-03-17 02:41:16,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:41:16,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:41:16,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:41:16,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 02:41:16,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2023-03-17 02:41:16,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2023-03-17 02:41:16,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:41:16,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2023-03-17 02:41:16,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:41:16,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:41:16,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2023-03-17 02:41:16,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:41:16,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:41:16,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:41:16,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:41:16,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2023-03-17 02:41:16,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:41:16,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:41:16,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:41:16,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2023-03-17 02:41:16,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:41:16,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:41:16,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:41:16,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:41:16,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:41:16,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
6: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 4: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 2: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:41:16,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step101000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:41:16,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
0: successfully saved checkpoint at iteration 101000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.33 7: iteration 101010/ 173500 | consumed samples: 25858560 | consumed tokens: 52958330880 | elapsed time per iteration (s): 0.09 | learning rate: 8.817E-05 | global batch size: 256 | lm loss: 4.528289E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.510 | TFLOPs: 10.33 | 7: iteration 101020/ 173500 | consumed samples: 25861120 | consumed tokens: 52963573760 | elapsed time per iteration (s): 0.08 | learning rate: 8.816E-05 | global batch size: 256 | lm loss: 4.514152E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.071 | TFLOPs: 11.58 | 7: iteration 101030/ 173500 | consumed samples: 25863680 | consumed tokens: 52968816640 | elapsed time per iteration (s): 0.08 | learning rate: 8.814E-05 | global batch size: 256 | lm loss: 4.515326E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.181 | TFLOPs: 11.38 | 7: iteration 101040/ 173500 | consumed samples: 25866240 | consumed tokens: 52974059520 | elapsed time per iteration (s): 0.08 | learning rate: 8.813E-05 | global batch size: 256 | lm loss: 4.521540E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3040.563 | TFLOPs: 11.31 | 7: iteration 101050/ 173500 | consumed samples: 25868800 | consumed tokens: 52979302400 | elapsed time per iteration (s): 0.08 | learning rate: 8.811E-05 | global batch size: 256 | lm loss: 4.521922E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.299 | TFLOPs: 11.72 | 7: iteration 101060/ 173500 | consumed samples: 25871360 | consumed tokens: 52984545280 | elapsed time per iteration (s): 
0.08 | learning rate: 8.810E-05 | global batch size: 256 | lm loss: 4.523804E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.741 | TFLOPs: 11.86 | 7: iteration 101070/ 173500 | consumed samples: 25873920 | consumed tokens: 52989788160 | elapsed time per iteration (s): 0.08 | learning rate: 8.808E-05 | global batch size: 256 | lm loss: 4.515314E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.160 | TFLOPs: 11.76 | 7: iteration 101080/ 173500 | consumed samples: 25876480 | consumed tokens: 52995031040 | elapsed time per iteration (s): 0.08 | learning rate: 8.806E-05 | global batch size: 256 | lm loss: 4.523279E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.782 | TFLOPs: 11.83 | 7: iteration 101090/ 173500 | consumed samples: 25879040 | consumed tokens: 53000273920 | elapsed time per iteration (s): 0.08 | learning rate: 8.805E-05 | global batch size: 256 | lm loss: 4.527536E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.213 | TFLOPs: 11.83 | 7: iteration 101100/ 173500 | consumed samples: 25881600 | consumed tokens: 53005516800 | elapsed time per iteration (s): 0.08 | learning rate: 8.803E-05 | global batch size: 256 | lm loss: 4.523228E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.241 | TFLOPs: 11.84 | 7: iteration 101110/ 173500 | consumed samples: 25884160 | consumed tokens: 53010759680 | elapsed time per iteration (s): 0.08 | learning rate: 8.802E-05 | global batch size: 256 | lm loss: 4.529994E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.859 | TFLOPs: 11.84 | 7: 
iteration 101120/ 173500 | consumed samples: 25886720 | consumed tokens: 53016002560 | elapsed time per iteration (s): 0.08 | learning rate: 8.800E-05 | global batch size: 256 | lm loss: 4.526128E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.773 | TFLOPs: 11.86 | 7: iteration 101130/ 173500 | consumed samples: 25889280 | consumed tokens: 53021245440 | elapsed time per iteration (s): 0.09 | learning rate: 8.798E-05 | global batch size: 256 | lm loss: 4.508152E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.756 | TFLOPs: 10.96 | 7: iteration 101140/ 173500 | consumed samples: 25891840 | consumed tokens: 53026488320 | elapsed time per iteration (s): 0.10 | learning rate: 8.797E-05 | global batch size: 256 | lm loss: 4.532747E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2467.321 | TFLOPs: 9.18 | 7: iteration 101150/ 173500 | consumed samples: 25894400 | consumed tokens: 53031731200 | elapsed time per iteration (s): 0.08 | learning rate: 8.795E-05 | global batch size: 256 | lm loss: 4.521428E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.221 | TFLOPs: 11.83 | 7: iteration 101160/ 173500 | consumed samples: 25896960 | consumed tokens: 53036974080 | elapsed time per iteration (s): 0.08 | learning rate: 8.794E-05 | global batch size: 256 | lm loss: 4.518981E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.201 | TFLOPs: 11.86 | 7: iteration 101170/ 173500 | consumed samples: 25899520 | consumed tokens: 53042216960 | elapsed time per iteration (s): 0.08 | learning rate: 8.792E-05 | global batch size: 256 | lm loss: 4.516837E+00 | grad norm: 0.381 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.515 | TFLOPs: 11.88 | 7: iteration 101180/ 173500 | consumed samples: 25902080 | consumed tokens: 53047459840 | elapsed time per iteration (s): 0.08 | learning rate: 8.790E-05 | global batch size: 256 | lm loss: 4.523366E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.573 | TFLOPs: 11.85 | 7: iteration 101190/ 173500 | consumed samples: 25904640 | consumed tokens: 53052702720 | elapsed time per iteration (s): 0.11 | learning rate: 8.789E-05 | global batch size: 256 | lm loss: 4.519636E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.582 | TFLOPs: 8.75 | 7: iteration 101200/ 173500 | consumed samples: 25907200 | consumed tokens: 53057945600 | elapsed time per iteration (s): 0.10 | learning rate: 8.787E-05 | global batch size: 256 | lm loss: 4.518610E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.373 | TFLOPs: 9.96 | 7: iteration 101210/ 173500 | consumed samples: 25909760 | consumed tokens: 53063188480 | elapsed time per iteration (s): 0.08 | learning rate: 8.786E-05 | global batch size: 256 | lm loss: 4.532322E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.552 | TFLOPs: 11.85 | 7: iteration 101220/ 173500 | consumed samples: 25912320 | consumed tokens: 53068431360 | elapsed time per iteration (s): 0.08 | learning rate: 8.784E-05 | global batch size: 256 | lm loss: 4.528601E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.193 | TFLOPs: 11.39 | 7: iteration 101230/ 173500 | consumed samples: 25914880 | consumed tokens: 53073674240 | elapsed time per iteration (s): 0.13 | 
learning rate: 8.782E-05 | global batch size: 256 | lm loss: 4.524950E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.337 | TFLOPs: 7.37 | 7: iteration 101240/ 173500 | consumed samples: 25917440 | consumed tokens: 53078917120 | elapsed time per iteration (s): 0.08 | learning rate: 8.781E-05 | global batch size: 256 | lm loss: 4.517031E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.545 | TFLOPs: 11.51 | 7: iteration 101250/ 173500 | consumed samples: 25920000 | consumed tokens: 53084160000 | elapsed time per iteration (s): 0.08 | learning rate: 8.779E-05 | global batch size: 256 | lm loss: 4.534789E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.243 | TFLOPs: 11.81 | 7: iteration 101260/ 173500 | consumed samples: 25922560 | consumed tokens: 53089402880 | elapsed time per iteration (s): 0.08 | learning rate: 8.778E-05 | global batch size: 256 | lm loss: 4.519065E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.432 | TFLOPs: 11.86 | 7: iteration 101270/ 173500 | consumed samples: 25925120 | consumed tokens: 53094645760 | elapsed time per iteration (s): 0.08 | learning rate: 8.776E-05 | global batch size: 256 | lm loss: 4.503884E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.824 | TFLOPs: 11.83 | 7: iteration 101280/ 173500 | consumed samples: 25927680 | consumed tokens: 53099888640 | elapsed time per iteration (s): 0.08 | learning rate: 8.774E-05 | global batch size: 256 | lm loss: 4.515324E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.885 | TFLOPs: 11.85 | 7: iteration 
101290/ 173500 | consumed samples: 25930240 | consumed tokens: 53105131520 | elapsed time per iteration (s): 0.08 | learning rate: 8.773E-05 | global batch size: 256 | lm loss: 4.523454E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.172 | TFLOPs: 11.42 | 7: iteration 101300/ 173500 | consumed samples: 25932800 | consumed tokens: 53110374400 | elapsed time per iteration (s): 0.08 | learning rate: 8.771E-05 | global batch size: 256 | lm loss: 4.523325E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.195 | TFLOPs: 11.84 | 7: iteration 101310/ 173500 | consumed samples: 25935360 | consumed tokens: 53115617280 | elapsed time per iteration (s): 0.08 | learning rate: 8.770E-05 | global batch size: 256 | lm loss: 4.515199E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.725 | TFLOPs: 11.88 | 7: iteration 101320/ 173500 | consumed samples: 25937920 | consumed tokens: 53120860160 | elapsed time per iteration (s): 0.08 | learning rate: 8.768E-05 | global batch size: 256 | lm loss: 4.524989E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.092 | TFLOPs: 11.88 | 7: iteration 101330/ 173500 | consumed samples: 25940480 | consumed tokens: 53126103040 | elapsed time per iteration (s): 0.08 | learning rate: 8.766E-05 | global batch size: 256 | lm loss: 4.523585E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.781 | TFLOPs: 11.85 | 7: iteration 101340/ 173500 | consumed samples: 25943040 | consumed tokens: 53131345920 | elapsed time per iteration (s): 0.08 | learning rate: 8.765E-05 | global batch size: 256 | lm loss: 4.498254E+00 | grad norm: 0.335 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.675 | TFLOPs: 11.57 | 7: iteration 101350/ 173500 | consumed samples: 25945600 | consumed tokens: 53136588800 | elapsed time per iteration (s): 0.08 | learning rate: 8.763E-05 | global batch size: 256 | lm loss: 4.516047E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.234 | TFLOPs: 11.88 | 7: iteration 101360/ 173500 | consumed samples: 25948160 | consumed tokens: 53141831680 | elapsed time per iteration (s): 0.08 | learning rate: 8.762E-05 | global batch size: 256 | lm loss: 4.520359E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.165 | TFLOPs: 11.82 | 7: iteration 101370/ 173500 | consumed samples: 25950720 | consumed tokens: 53147074560 | elapsed time per iteration (s): 0.08 | learning rate: 8.760E-05 | global batch size: 256 | lm loss: 4.526668E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.228 | TFLOPs: 11.80 | 7: iteration 101380/ 173500 | consumed samples: 25953280 | consumed tokens: 53152317440 | elapsed time per iteration (s): 0.09 | learning rate: 8.758E-05 | global batch size: 256 | lm loss: 4.518201E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2988.262 | TFLOPs: 11.12 | 7: iteration 101390/ 173500 | consumed samples: 25955840 | consumed tokens: 53157560320 | elapsed time per iteration (s): 0.08 | learning rate: 8.757E-05 | global batch size: 256 | lm loss: 4.526860E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.048 | TFLOPs: 11.54 | 7: iteration 101400/ 173500 | consumed samples: 25958400 | consumed tokens: 53162803200 | elapsed time per iteration (s): 0.08 | learning 
rate: 8.755E-05 | global batch size: 256 | lm loss: 4.525353E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.245 | TFLOPs: 11.31 | 7: iteration 101410/ 173500 | consumed samples: 25960960 | consumed tokens: 53168046080 | elapsed time per iteration (s): 0.08 | learning rate: 8.754E-05 | global batch size: 256 | lm loss: 4.515486E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.948 | TFLOPs: 11.74 | 7: iteration 101420/ 173500 | consumed samples: 25963520 | consumed tokens: 53173288960 | elapsed time per iteration (s): 0.08 | learning rate: 8.752E-05 | global batch size: 256 | lm loss: 4.515757E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.421 | TFLOPs: 11.86 | 7: iteration 101430/ 173500 | consumed samples: 25966080 | consumed tokens: 53178531840 | elapsed time per iteration (s): 0.08 | learning rate: 8.750E-05 | global batch size: 256 | lm loss: 4.508948E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.615 | TFLOPs: 11.79 | 7: iteration 101440/ 173500 | consumed samples: 25968640 | consumed tokens: 53183774720 | elapsed time per iteration (s): 0.09 | learning rate: 8.749E-05 | global batch size: 256 | lm loss: 4.515087E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2907.825 | TFLOPs: 10.82 | 7: iteration 101450/ 173500 | consumed samples: 25971200 | consumed tokens: 53189017600 | elapsed time per iteration (s): 0.14 | learning rate: 8.747E-05 | global batch size: 256 | lm loss: 4.525571E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1878.380 | TFLOPs: 6.99 | 7: iteration 101460/ 
173500 | consumed samples: 25973760 | consumed tokens: 53194260480 | elapsed time per iteration (s): 0.09 | learning rate: 8.746E-05 | global batch size: 256 | lm loss: 4.522790E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2730.132 | TFLOPs: 10.15 | 7: iteration 101470/ 173500 | consumed samples: 25976320 | consumed tokens: 53199503360 | elapsed time per iteration (s): 0.08 | learning rate: 8.744E-05 | global batch size: 256 | lm loss: 4.535313E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.588 | TFLOPs: 11.95 | 7: iteration 101480/ 173500 | consumed samples: 25978880 | consumed tokens: 53204746240 | elapsed time per iteration (s): 0.08 | learning rate: 8.743E-05 | global batch size: 256 | lm loss: 4.531133E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.574 | TFLOPs: 11.88 | 7: iteration 101490/ 173500 | consumed samples: 25981440 | consumed tokens: 53209989120 | elapsed time per iteration (s): 0.08 | learning rate: 8.741E-05 | global batch size: 256 | lm loss: 4.517651E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.723 | TFLOPs: 11.69 | 7: iteration 101500/ 173500 | consumed samples: 25984000 | consumed tokens: 53215232000 | elapsed time per iteration (s): 0.08 | learning rate: 8.739E-05 | global batch size: 256 | lm loss: 4.519649E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.080 | TFLOPs: 11.96 | 7: iteration 101510/ 173500 | consumed samples: 25986560 | consumed tokens: 53220474880 | elapsed time per iteration (s): 0.08 | learning rate: 8.738E-05 | global batch size: 256 | lm loss: 4.515884E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3057.991 | TFLOPs: 11.37 | 7: iteration 101520/ 173500 | consumed samples: 25989120 | consumed tokens: 53225717760 | elapsed time per iteration (s): 0.08 | learning rate: 8.736E-05 | global batch size: 256 | lm loss: 4.517123E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.467 | TFLOPs: 11.93 | 7: iteration 101530/ 173500 | consumed samples: 25991680 | consumed tokens: 53230960640 | elapsed time per iteration (s): 0.08 | learning rate: 8.735E-05 | global batch size: 256 | lm loss: 4.515556E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.291 | TFLOPs: 11.92 | 7: iteration 101540/ 173500 | consumed samples: 25994240 | consumed tokens: 53236203520 | elapsed time per iteration (s): 0.08 | learning rate: 8.733E-05 | global batch size: 256 | lm loss: 4.526186E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.972 | TFLOPs: 11.92 | 7: iteration 101550/ 173500 | consumed samples: 25996800 | consumed tokens: 53241446400 | elapsed time per iteration (s): 0.08 | learning rate: 8.731E-05 | global batch size: 256 | lm loss: 4.508566E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.072 | TFLOPs: 11.85 | 7: iteration 101560/ 173500 | consumed samples: 25999360 | consumed tokens: 53246689280 | elapsed time per iteration (s): 0.08 | learning rate: 8.730E-05 | global batch size: 256 | lm loss: 4.524797E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.492 | TFLOPs: 11.92 | 7: iteration 101570/ 173500 | consumed samples: 26001920 | consumed tokens: 53251932160 | elapsed time per iteration (s): 0.08 | learning rate: 
8.728E-05 | global batch size: 256 | lm loss: 4.525789E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.568 | TFLOPs: 11.93 | 7: iteration 101580/ 173500 | consumed samples: 26004480 | consumed tokens: 53257175040 | elapsed time per iteration (s): 0.08 | learning rate: 8.727E-05 | global batch size: 256 | lm loss: 4.523169E+00 | grad norm: 0.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.756 | TFLOPs: 11.91 | 7: iteration 101590/ 173500 | consumed samples: 26007040 | consumed tokens: 53262417920 | elapsed time per iteration (s): 0.08 | learning rate: 8.725E-05 | global batch size: 256 | lm loss: 4.512185E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.008 | TFLOPs: 11.81 | 7: iteration 101600/ 173500 | consumed samples: 26009600 | consumed tokens: 53267660800 | elapsed time per iteration (s): 0.08 | learning rate: 8.723E-05 | global batch size: 256 | lm loss: 4.515289E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.942 | TFLOPs: 11.89 | 7: iteration 101610/ 173500 | consumed samples: 26012160 | consumed tokens: 53272903680 | elapsed time per iteration (s): 0.08 | learning rate: 8.722E-05 | global batch size: 256 | lm loss: 4.519529E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.411 | TFLOPs: 11.93 | 7: iteration 101620/ 173500 | consumed samples: 26014720 | consumed tokens: 53278146560 | elapsed time per iteration (s): 0.08 | learning rate: 8.720E-05 | global batch size: 256 | lm loss: 4.513545E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.046 | TFLOPs: 11.88 | 7: iteration 101630/ 173500 | 
consumed samples: 26017280 | consumed tokens: 53283389440 | elapsed time per iteration (s): 0.11 | learning rate: 8.719E-05 | global batch size: 256 | lm loss: 4.520362E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2334.021 | TFLOPs: 8.68 | 7: iteration 101640/ 173500 | consumed samples: 26019840 | consumed tokens: 53288632320 | elapsed time per iteration (s): 0.09 | learning rate: 8.717E-05 | global batch size: 256 | lm loss: 4.508385E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2818.071 | TFLOPs: 10.48 | 7: iteration 101650/ 173500 | consumed samples: 26022400 | consumed tokens: 53293875200 | elapsed time per iteration (s): 0.08 | learning rate: 8.715E-05 | global batch size: 256 | lm loss: 4.525006E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.576 | TFLOPs: 11.28 | 7: iteration 101660/ 173500 | consumed samples: 26024960 | consumed tokens: 53299118080 | elapsed time per iteration (s): 0.09 | learning rate: 8.714E-05 | global batch size: 256 | lm loss: 4.523374E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.984 | TFLOPs: 10.45 | 7: iteration 101670/ 173500 | consumed samples: 26027520 | consumed tokens: 53304360960 | elapsed time per iteration (s): 0.08 | learning rate: 8.712E-05 | global batch size: 256 | lm loss: 4.514145E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.674 | TFLOPs: 11.40 | 7: iteration 101680/ 173500 | consumed samples: 26030080 | consumed tokens: 53309603840 | elapsed time per iteration (s): 0.10 | learning rate: 8.711E-05 | global batch size: 256 | lm loss: 4.518535E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 2582.791 | TFLOPs: 9.61 | 7: iteration 101690/ 173500 | consumed samples: 26032640 | consumed tokens: 53314846720 | elapsed time per iteration (s): 0.08 | learning rate: 8.709E-05 | global batch size: 256 | lm loss: 4.527682E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.269 | TFLOPs: 11.78 | 7: iteration 101700/ 173500 | consumed samples: 26035200 | consumed tokens: 53320089600 | elapsed time per iteration (s): 0.08 | learning rate: 8.707E-05 | global batch size: 256 | lm loss: 4.512956E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.698 | TFLOPs: 11.86 | 7: iteration 101710/ 173500 | consumed samples: 26037760 | consumed tokens: 53325332480 | elapsed time per iteration (s): 0.08 | learning rate: 8.706E-05 | global batch size: 256 | lm loss: 4.519720E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.299 | TFLOPs: 11.86 | 7: iteration 101720/ 173500 | consumed samples: 26040320 | consumed tokens: 53330575360 | elapsed time per iteration (s): 0.08 | learning rate: 8.704E-05 | global batch size: 256 | lm loss: 4.513739E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.545 | TFLOPs: 11.92 | 7: iteration 101730/ 173500 | consumed samples: 26042880 | consumed tokens: 53335818240 | elapsed time per iteration (s): 0.10 | learning rate: 8.703E-05 | global batch size: 256 | lm loss: 4.524220E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2520.505 | TFLOPs: 9.38 | 7: iteration 101740/ 173500 | consumed samples: 26045440 | consumed tokens: 53341061120 | elapsed time per iteration (s): 0.11 | learning rate: 8.701E-05 | 
global batch size: 256 | lm loss: 4.514921E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.188 | TFLOPs: 8.53 | 7: iteration 101750/ 173500 | consumed samples: 26048000 | consumed tokens: 53346304000 | elapsed time per iteration (s): 0.10 | learning rate: 8.700E-05 | global batch size: 256 | lm loss: 4.516985E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2479.227 | TFLOPs: 9.22 | 7: iteration 101760/ 173500 | consumed samples: 26050560 | consumed tokens: 53351546880 | elapsed time per iteration (s): 0.08 | learning rate: 8.698E-05 | global batch size: 256 | lm loss: 4.523031E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.051 | TFLOPs: 11.94 | 7: iteration 101770/ 173500 | consumed samples: 26053120 | consumed tokens: 53356789760 | elapsed time per iteration (s): 0.08 | learning rate: 8.696E-05 | global batch size: 256 | lm loss: 4.525607E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.720 | TFLOPs: 11.96 | 7: iteration 101780/ 173500 | consumed samples: 26055680 | consumed tokens: 53362032640 | elapsed time per iteration (s): 0.08 | learning rate: 8.695E-05 | global batch size: 256 | lm loss: 4.524883E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.928 | TFLOPs: 11.88 | 7: iteration 101790/ 173500 | consumed samples: 26058240 | consumed tokens: 53367275520 | elapsed time per iteration (s): 0.08 | learning rate: 8.693E-05 | global batch size: 256 | lm loss: 4.523442E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.876 | TFLOPs: 11.94 | 7: iteration 101800/ 173500 | consumed 
samples: 26060800 | consumed tokens: 53372518400 | elapsed time per iteration (s): 0.08 | learning rate: 8.692E-05 | global batch size: 256 | lm loss: 4.519030E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.011 | TFLOPs: 11.92 | 7: iteration 101810/ 173500 | consumed samples: 26063360 | consumed tokens: 53377761280 | elapsed time per iteration (s): 0.08 | learning rate: 8.690E-05 | global batch size: 256 | lm loss: 4.525565E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.218 | TFLOPs: 11.94 | 7: iteration 101820/ 173500 | consumed samples: 26065920 | consumed tokens: 53383004160 | elapsed time per iteration (s): 0.08 | learning rate: 8.688E-05 | global batch size: 256 | lm loss: 4.518960E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.123 | TFLOPs: 11.96 | 7: iteration 101830/ 173500 | consumed samples: 26068480 | consumed tokens: 53388247040 | elapsed time per iteration (s): 0.08 | learning rate: 8.687E-05 | global batch size: 256 | lm loss: 4.515253E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.649 | TFLOPs: 11.93 | 7: iteration 101840/ 173500 | consumed samples: 26071040 | consumed tokens: 53393489920 | elapsed time per iteration (s): 0.08 | learning rate: 8.685E-05 | global batch size: 256 | lm loss: 4.514799E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.790 | TFLOPs: 11.94 | 7: iteration 101850/ 173500 | consumed samples: 26073600 | consumed tokens: 53398732800 | elapsed time per iteration (s): 0.08 | learning rate: 8.684E-05 | global batch size: 256 | lm loss: 4.515324E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3173.965 | TFLOPs: 11.81 | 7: iteration 101860/ 173500 | consumed samples: 26076160 | consumed tokens: 53403975680 | elapsed time per iteration (s): 0.08 | learning rate: 8.682E-05 | global batch size: 256 | lm loss: 4.520217E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.419 | TFLOPs: 11.95 | 7: iteration 101870/ 173500 | consumed samples: 26078720 | consumed tokens: 53409218560 | elapsed time per iteration (s): 0.08 | learning rate: 8.680E-05 | global batch size: 256 | lm loss: 4.527110E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.667 | TFLOPs: 11.87 | 7: iteration 101880/ 173500 | consumed samples: 26081280 | consumed tokens: 53414461440 | elapsed time per iteration (s): 0.08 | learning rate: 8.679E-05 | global batch size: 256 | lm loss: 4.514118E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.045 | TFLOPs: 11.81 | 7: iteration 101890/ 173500 | consumed samples: 26083840 | consumed tokens: 53419704320 | elapsed time per iteration (s): 0.08 | learning rate: 8.677E-05 | global batch size: 256 | lm loss: 4.522297E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.631 | TFLOPs: 11.89 | 7: iteration 101900/ 173500 | consumed samples: 26086400 | consumed tokens: 53424947200 | elapsed time per iteration (s): 0.08 | learning rate: 8.676E-05 | global batch size: 256 | lm loss: 4.518941E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.242 | TFLOPs: 11.95 | 7: iteration 101910/ 173500 | consumed samples: 26088960 | consumed tokens: 53430190080 | elapsed time per iteration (s): 0.10 | learning rate: 8.674E-05 | global 
batch size: 256 | lm loss: 4.520340E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.000 | TFLOPs: 9.79 | 7: iteration 101920/ 173500 | consumed samples: 26091520 | consumed tokens: 53435432960 | elapsed time per iteration (s): 0.11 | learning rate: 8.672E-05 | global batch size: 256 | lm loss: 4.515078E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.940 | TFLOPs: 8.63 | 7: iteration 101930/ 173500 | consumed samples: 26094080 | consumed tokens: 53440675840 | elapsed time per iteration (s): 0.11 | learning rate: 8.671E-05 | global batch size: 256 | lm loss: 4.522128E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2352.901 | TFLOPs: 8.75 | 7: iteration 101940/ 173500 | consumed samples: 26096640 | consumed tokens: 53445918720 | elapsed time per iteration (s): 0.11 | learning rate: 8.669E-05 | global batch size: 256 | lm loss: 4.530398E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2434.987 | TFLOPs: 9.06 | 7: iteration 101950/ 173500 | consumed samples: 26099200 | consumed tokens: 53451161600 | elapsed time per iteration (s): 0.11 | learning rate: 8.668E-05 | global batch size: 256 | lm loss: 4.528496E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2367.311 | TFLOPs: 8.81 | 7: iteration 101960/ 173500 | consumed samples: 26101760 | consumed tokens: 53456404480 | elapsed time per iteration (s): 0.11 | learning rate: 8.666E-05 | global batch size: 256 | lm loss: 4.539191E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2303.702 | TFLOPs: 8.57 | 7: iteration 101970/ 173500 | consumed samples: 26104320 
| consumed tokens: 53461647360 | elapsed time per iteration (s): 0.11 | learning rate: 8.665E-05 | global batch size: 256 | lm loss: 4.529647E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2294.306 | TFLOPs: 8.53 | 7: iteration 101980/ 173500 | consumed samples: 26106880 | consumed tokens: 53466890240 | elapsed time per iteration (s): 0.11 | learning rate: 8.663E-05 | global batch size: 256 | lm loss: 4.523177E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2320.084 | TFLOPs: 8.63 | 7: iteration 101990/ 173500 | consumed samples: 26109440 | consumed tokens: 53472133120 | elapsed time per iteration (s): 0.11 | learning rate: 8.661E-05 | global batch size: 256 | lm loss: 4.511202E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2345.230 | TFLOPs: 8.72 | 0: [2023-03-17 02:42:43,006] [INFO] [logging.py:68:log_dist] [Rank 0] step=102000, skipped=0, lr=[8.659751165175261e-05, 8.659751165175261e-05, 8.659751165175261e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 102000/ 173500 | consumed samples: 26112000 | consumed tokens: 53477376000 | elapsed time per iteration (s): 0.11 | learning rate: 8.660E-05 | global batch size: 256 | lm loss: 4.520670E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2308.771 | TFLOPs: 8.59 | 0: steps: 102000 loss: 4.5209 iter time (s): 0.089 samples/sec: 2878.827 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 102000 | lm loss value: 4.392929E+00 | lm loss PPL: 8.087697E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 102000 to 
checkpoints_14m91b100m 0: [2023-03-17 02:42:43,087] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step102000 is begin to save! 0: [2023-03-17 02:42:43,090] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:42:43,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:42:43,114] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:42:43,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:42:43,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:42:43,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:42:43,122] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:42:43,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:42:43,125] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:42:43,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:42:43,127] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:42:43,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:42:43,129] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step102000/mp_rank_00_model_states.pt 0: [2023-03-17 02:42:43,129] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:42:43,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:42:43,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:42:43,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:42:43,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2023-03-17 02:42:43,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:42:43,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:42:43,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
0: [2023-03-17 02:42:43,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:42:43,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 02:42:43,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2023-03-17 02:42:43,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:42:43,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:42:43,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
2: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2023-03-17 02:42:43,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:42:43,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:42:43,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:42:43,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2023-03-17 02:42:43,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2023-03-17 02:42:43,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2023-03-17 02:42:43,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:42:43,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:42:43,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 02:42:43,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
3: [2023-03-17 02:42:43,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2023-03-17 02:42:43,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:42:43,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2023-03-17 02:42:43,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2023-03-17 02:42:43,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
5: [2023-03-17 02:42:43,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:42:43,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:42:43,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:42:43,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:42:43,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
4: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 6: [2023-03-17 02:42:43,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:42:43,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 2: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2023-03-17 02:42:43,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2023-03-17 02:42:43,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 02:42:43,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
4: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 4: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:42:43,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
5: [2023-03-17 02:42:43,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:42:43,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:42:43,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
6: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:42:43,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2023-03-17 02:42:43,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2023-03-17 02:42:43,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
1: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:42:43,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:42:43,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:42:43,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2023-03-17 02:42:43,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 02:42:43,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2023-03-17 02:42:43,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:42:43,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:42:43,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:42:43,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 02:42:43,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 02:42:43,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 02:42:43,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 02:42:43,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2023-03-17 02:42:43,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step102000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
3: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2023-03-17 02:42:43,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: successfully saved checkpoint at iteration 102000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 87.53 7: iteration 102010/ 173500 | consumed samples: 26114560 | consumed tokens: 53482618880 | elapsed time per iteration (s): 0.12 | learning rate: 8.658E-05 | global batch size: 256 | lm loss: 4.517447E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.541 | TFLOPs: 7.75 | 7: iteration 102020/ 173500 | consumed samples: 26117120 | consumed tokens: 53487861760 | elapsed time per iteration (s): 0.11 | learning rate: 8.657E-05 | global batch size: 256 | lm loss: 4.527785E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.472 | TFLOPs: 8.56 | 7: iteration 102030/ 173500 | consumed samples: 26119680 | consumed tokens: 53493104640 | elapsed time per iteration (s): 0.11 | learning rate: 8.655E-05 | global batch size: 256 | lm loss: 4.516716E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.503 | TFLOPs: 8.95 | 7: iteration 102040/ 173500 | consumed samples: 26122240 | consumed tokens: 53498347520 | elapsed time per iteration (s): 0.09 | learning rate: 8.653E-05 | global batch size: 256 | lm loss: 4.509961E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2811.272 | TFLOPs: 10.46 | 7: iteration 102050/ 173500 | consumed samples: 26124800 | consumed tokens: 53503590400 | elapsed time per iteration (s): 0.08 | learning rate: 8.652E-05 | global batch size: 256 | lm loss: 4.528380E+00 | grad norm: 0.351 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.422 | TFLOPs: 11.55 | 7: iteration 102060/ 173500 | consumed samples: 26127360 | consumed tokens: 53508833280 | elapsed time per iteration (s): 0.08 | learning rate: 8.650E-05 | global batch size: 256 | lm loss: 4.515433E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.666 | TFLOPs: 11.89 | 7: iteration 102070/ 173500 | consumed samples: 26129920 | consumed tokens: 53514076160 | elapsed time per iteration (s): 0.08 | learning rate: 8.649E-05 | global batch size: 256 | lm loss: 4.515144E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.952 | TFLOPs: 11.49 | 7: iteration 102080/ 173500 | consumed samples: 26132480 | consumed tokens: 53519319040 | elapsed time per iteration (s): 0.11 | learning rate: 8.647E-05 | global batch size: 256 | lm loss: 4.519356E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.714 | TFLOPs: 8.78 | 7: iteration 102090/ 173500 | consumed samples: 26135040 | consumed tokens: 53524561920 | elapsed time per iteration (s): 0.11 | learning rate: 8.645E-05 | global batch size: 256 | lm loss: 4.535838E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2378.788 | TFLOPs: 8.85 | 7: iteration 102100/ 173500 | consumed samples: 26137600 | consumed tokens: 53529804800 | elapsed time per iteration (s): 0.11 | learning rate: 8.644E-05 | global batch size: 256 | lm loss: 4.515331E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2355.207 | TFLOPs: 8.76 | 7: iteration 102110/ 173500 | consumed samples: 26140160 | consumed tokens: 53535047680 | elapsed time per 
iteration (s): 0.13 | learning rate: 8.642E-05 | global batch size: 256 | lm loss: 4.519356E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1976.195 | TFLOPs: 7.35 | 7: iteration 102120/ 173500 | consumed samples: 26142720 | consumed tokens: 53540290560 | elapsed time per iteration (s): 0.11 | learning rate: 8.641E-05 | global batch size: 256 | lm loss: 4.523149E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2324.468 | TFLOPs: 8.65 | 7: iteration 102130/ 173500 | consumed samples: 26145280 | consumed tokens: 53545533440 | elapsed time per iteration (s): 0.11 | learning rate: 8.639E-05 | global batch size: 256 | lm loss: 4.532408E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2275.588 | TFLOPs: 8.46 | 7: iteration 102140/ 173500 | consumed samples: 26147840 | consumed tokens: 53550776320 | elapsed time per iteration (s): 0.11 | learning rate: 8.638E-05 | global batch size: 256 | lm loss: 4.516311E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2354.721 | TFLOPs: 8.76 | 7: iteration 102150/ 173500 | consumed samples: 26150400 | consumed tokens: 53556019200 | elapsed time per iteration (s): 0.11 | learning rate: 8.636E-05 | global batch size: 256 | lm loss: 4.515816E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.196 | TFLOPs: 9.05 | 7: iteration 102160/ 173500 | consumed samples: 26152960 | consumed tokens: 53561262080 | elapsed time per iteration (s): 0.11 | learning rate: 8.634E-05 | global batch size: 256 | lm loss: 4.520583E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.222 | TFLOPs: 8.75 | 
7: iteration 102170/ 173500 | consumed samples: 26155520 | consumed tokens: 53566504960 | elapsed time per iteration (s): 0.11 | learning rate: 8.633E-05 | global batch size: 256 | lm loss: 4.522838E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2261.094 | TFLOPs: 8.41 | 7: iteration 102180/ 173500 | consumed samples: 26158080 | consumed tokens: 53571747840 | elapsed time per iteration (s): 0.11 | learning rate: 8.631E-05 | global batch size: 256 | lm loss: 4.513201E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2320.608 | TFLOPs: 8.63 | 7: iteration 102190/ 173500 | consumed samples: 26160640 | consumed tokens: 53576990720 | elapsed time per iteration (s): 0.11 | learning rate: 8.630E-05 | global batch size: 256 | lm loss: 4.521938E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2343.346 | TFLOPs: 8.72 | 7: iteration 102200/ 173500 | consumed samples: 26163200 | consumed tokens: 53582233600 | elapsed time per iteration (s): 0.11 | learning rate: 8.628E-05 | global batch size: 256 | lm loss: 4.518851E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2354.219 | TFLOPs: 8.76 | 7: iteration 102210/ 173500 | consumed samples: 26165760 | consumed tokens: 53587476480 | elapsed time per iteration (s): 0.10 | learning rate: 8.626E-05 | global batch size: 256 | lm loss: 4.528263E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2533.534 | TFLOPs: 9.42 | 7: iteration 102220/ 173500 | consumed samples: 26168320 | consumed tokens: 53592719360 | elapsed time per iteration (s): 0.08 | learning rate: 8.625E-05 | global batch size: 256 | lm loss: 4.535301E+00 | grad norm: 0.346 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.452 | TFLOPs: 11.66 | 7: iteration 102230/ 173500 | consumed samples: 26170880 | consumed tokens: 53597962240 | elapsed time per iteration (s): 0.08 | learning rate: 8.623E-05 | global batch size: 256 | lm loss: 4.508376E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.810 | TFLOPs: 11.78 | 7: iteration 102240/ 173500 | consumed samples: 26173440 | consumed tokens: 53603205120 | elapsed time per iteration (s): 0.08 | learning rate: 8.622E-05 | global batch size: 256 | lm loss: 4.518688E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.241 | TFLOPs: 11.66 | 7: iteration 102250/ 173500 | consumed samples: 26176000 | consumed tokens: 53608448000 | elapsed time per iteration (s): 0.08 | learning rate: 8.620E-05 | global batch size: 256 | lm loss: 4.513873E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.758 | TFLOPs: 11.96 | 7: iteration 102260/ 173500 | consumed samples: 26178560 | consumed tokens: 53613690880 | elapsed time per iteration (s): 0.08 | learning rate: 8.618E-05 | global batch size: 256 | lm loss: 4.524708E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.580 | TFLOPs: 11.96 | 7: iteration 102270/ 173500 | consumed samples: 26181120 | consumed tokens: 53618933760 | elapsed time per iteration (s): 0.08 | learning rate: 8.617E-05 | global batch size: 256 | lm loss: 4.504694E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.364 | TFLOPs: 11.92 | 7: iteration 102280/ 173500 | consumed samples: 26183680 | consumed tokens: 53624176640 | elapsed time per iteration (s): 0.08 | 
learning rate: 8.615E-05 | global batch size: 256 | lm loss: 4.519405E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.908 | TFLOPs: 11.44 | 7: iteration 102290/ 173500 | consumed samples: 26186240 | consumed tokens: 53629419520 | elapsed time per iteration (s): 0.08 | learning rate: 8.614E-05 | global batch size: 256 | lm loss: 4.503876E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.333 | TFLOPs: 11.94 | 7: iteration 102300/ 173500 | consumed samples: 26188800 | consumed tokens: 53634662400 | elapsed time per iteration (s): 0.08 | learning rate: 8.612E-05 | global batch size: 256 | lm loss: 4.519843E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.028 | TFLOPs: 11.95 | 7: iteration 102310/ 173500 | consumed samples: 26191360 | consumed tokens: 53639905280 | elapsed time per iteration (s): 0.08 | learning rate: 8.611E-05 | global batch size: 256 | lm loss: 4.511928E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.615 | TFLOPs: 11.95 | 7: iteration 102320/ 173500 | consumed samples: 26193920 | consumed tokens: 53645148160 | elapsed time per iteration (s): 0.08 | learning rate: 8.609E-05 | global batch size: 256 | lm loss: 4.520008E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.373 | TFLOPs: 11.92 | 7: iteration 102330/ 173500 | consumed samples: 26196480 | consumed tokens: 53650391040 | elapsed time per iteration (s): 0.08 | learning rate: 8.607E-05 | global batch size: 256 | lm loss: 4.508179E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.766 | TFLOPs: 11.51 | 7: iteration 
102340/ 173500 | consumed samples: 26199040 | consumed tokens: 53655633920 | elapsed time per iteration (s): 0.12 | learning rate: 8.606E-05 | global batch size: 256 | lm loss: 4.536129E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2098.123 | TFLOPs: 7.80 | 7: iteration 102350/ 173500 | consumed samples: 26201600 | consumed tokens: 53660876800 | elapsed time per iteration (s): 0.12 | learning rate: 8.604E-05 | global batch size: 256 | lm loss: 4.521071E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2103.235 | TFLOPs: 7.82 | 7: iteration 102360/ 173500 | consumed samples: 26204160 | consumed tokens: 53666119680 | elapsed time per iteration (s): 0.12 | learning rate: 8.603E-05 | global batch size: 256 | lm loss: 4.527594E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2070.272 | TFLOPs: 7.70 | 7: iteration 102370/ 173500 | consumed samples: 26206720 | consumed tokens: 53671362560 | elapsed time per iteration (s): 0.18 | learning rate: 8.601E-05 | global batch size: 256 | lm loss: 4.515820E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1440.504 | TFLOPs: 5.36 | 7: iteration 102380/ 173500 | consumed samples: 26209280 | consumed tokens: 53676605440 | elapsed time per iteration (s): 0.12 | learning rate: 8.599E-05 | global batch size: 256 | lm loss: 4.512450E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.539 | TFLOPs: 8.04 | 7: iteration 102390/ 173500 | consumed samples: 26211840 | consumed tokens: 53681848320 | elapsed time per iteration (s): 0.12 | learning rate: 8.598E-05 | global batch size: 256 | lm loss: 4.521723E+00 | grad norm: 0.428 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.306 | TFLOPs: 7.70 | 7: iteration 102400/ 173500 | consumed samples: 26214400 | consumed tokens: 53687091200 | elapsed time per iteration (s): 0.12 | learning rate: 8.596E-05 | global batch size: 256 | lm loss: 4.525171E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2206.934 | TFLOPs: 8.21 | 7: iteration 102410/ 173500 | consumed samples: 26216960 | consumed tokens: 53692334080 | elapsed time per iteration (s): 0.11 | learning rate: 8.595E-05 | global batch size: 256 | lm loss: 4.519061E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.706 | TFLOPs: 8.53 | 7: iteration 102420/ 173500 | consumed samples: 26219520 | consumed tokens: 53697576960 | elapsed time per iteration (s): 0.13 | learning rate: 8.593E-05 | global batch size: 256 | lm loss: 4.518044E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.513 | TFLOPs: 7.58 | 7: iteration 102430/ 173500 | consumed samples: 26222080 | consumed tokens: 53702819840 | elapsed time per iteration (s): 0.13 | learning rate: 8.591E-05 | global batch size: 256 | lm loss: 4.513454E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.338 | TFLOPs: 7.58 | 7: iteration 102440/ 173500 | consumed samples: 26224640 | consumed tokens: 53708062720 | elapsed time per iteration (s): 0.12 | learning rate: 8.590E-05 | global batch size: 256 | lm loss: 4.523072E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2098.192 | TFLOPs: 7.80 | 7: iteration 102450/ 173500 | consumed samples: 26227200 | consumed tokens: 53713305600 | elapsed time per iteration (s): 0.12 | learning rate: 
8.588E-05 | global batch size: 256 | lm loss: 4.522157E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2161.745 | TFLOPs: 8.04 | 7: iteration 102460/ 173500 | consumed samples: 26229760 | consumed tokens: 53718548480 | elapsed time per iteration (s): 0.12 | learning rate: 8.587E-05 | global batch size: 256 | lm loss: 4.524356E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2207.489 | TFLOPs: 8.21 | 7: iteration 102470/ 173500 | consumed samples: 26232320 | consumed tokens: 53723791360 | elapsed time per iteration (s): 0.13 | learning rate: 8.585E-05 | global batch size: 256 | lm loss: 4.526261E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.190 | TFLOPs: 7.53 | 7: iteration 102480/ 173500 | consumed samples: 26234880 | consumed tokens: 53729034240 | elapsed time per iteration (s): 0.11 | learning rate: 8.584E-05 | global batch size: 256 | lm loss: 4.522202E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2409.193 | TFLOPs: 8.96 | 7: iteration 102490/ 173500 | consumed samples: 26237440 | consumed tokens: 53734277120 | elapsed time per iteration (s): 0.09 | learning rate: 8.582E-05 | global batch size: 256 | lm loss: 4.514866E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2893.460 | TFLOPs: 10.76 | 7: iteration 102500/ 173500 | consumed samples: 26240000 | consumed tokens: 53739520000 | elapsed time per iteration (s): 0.11 | learning rate: 8.580E-05 | global batch size: 256 | lm loss: 4.511141E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2407.631 | TFLOPs: 8.96 | 7: iteration 102510/ 173500 | 
consumed samples: 26242560 | consumed tokens: 53744762880 | elapsed time per iteration (s): 0.11 | learning rate: 8.579E-05 | global batch size: 256 | lm loss: 4.511078E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2335.464 | TFLOPs: 8.69 | 7: iteration 102520/ 173500 | consumed samples: 26245120 | consumed tokens: 53750005760 | elapsed time per iteration (s): 0.11 | learning rate: 8.577E-05 | global batch size: 256 | lm loss: 4.523243E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2369.206 | TFLOPs: 8.81 | 7: iteration 102530/ 173500 | consumed samples: 26247680 | consumed tokens: 53755248640 | elapsed time per iteration (s): 0.11 | learning rate: 8.576E-05 | global batch size: 256 | lm loss: 4.517038E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2398.130 | TFLOPs: 8.92 | 7: iteration 102540/ 173500 | consumed samples: 26250240 | consumed tokens: 53760491520 | elapsed time per iteration (s): 0.11 | learning rate: 8.574E-05 | global batch size: 256 | lm loss: 4.523080E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2423.933 | TFLOPs: 9.02 | 7: iteration 102550/ 173500 | consumed samples: 26252800 | consumed tokens: 53765734400 | elapsed time per iteration (s): 0.09 | learning rate: 8.572E-05 | global batch size: 256 | lm loss: 4.519056E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2767.622 | TFLOPs: 10.29 | 7: iteration 102560/ 173500 | consumed samples: 26255360 | consumed tokens: 53770977280 | elapsed time per iteration (s): 0.08 | learning rate: 8.571E-05 | global batch size: 256 | lm loss: 4.501648E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 3155.192 | TFLOPs: 11.74 | 7: iteration 102570/ 173500 | consumed samples: 26257920 | consumed tokens: 53776220160 | elapsed time per iteration (s): 0.08 | learning rate: 8.569E-05 | global batch size: 256 | lm loss: 4.516106E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.912 | TFLOPs: 11.99 | 7: iteration 102580/ 173500 | consumed samples: 26260480 | consumed tokens: 53781463040 | elapsed time per iteration (s): 0.08 | learning rate: 8.568E-05 | global batch size: 256 | lm loss: 4.523344E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.235 | TFLOPs: 11.89 | 7: iteration 102590/ 173500 | consumed samples: 26263040 | consumed tokens: 53786705920 | elapsed time per iteration (s): 0.08 | learning rate: 8.566E-05 | global batch size: 256 | lm loss: 4.515729E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.723 | TFLOPs: 11.30 | 7: iteration 102600/ 173500 | consumed samples: 26265600 | consumed tokens: 53791948800 | elapsed time per iteration (s): 0.08 | learning rate: 8.565E-05 | global batch size: 256 | lm loss: 4.523882E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.526 | TFLOPs: 11.79 | 7: iteration 102610/ 173500 | consumed samples: 26268160 | consumed tokens: 53797191680 | elapsed time per iteration (s): 0.08 | learning rate: 8.563E-05 | global batch size: 256 | lm loss: 4.520951E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.421 | TFLOPs: 11.94 | 7: iteration 102620/ 173500 | consumed samples: 26270720 | consumed tokens: 53802434560 | elapsed time per iteration (s): 0.08 | learning rate: 8.561E-05 | 
global batch size: 256 | lm loss: 4.515828E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.396 | TFLOPs: 11.90 | 7: iteration 102630/ 173500 | consumed samples: 26273280 | consumed tokens: 53807677440 | elapsed time per iteration (s): 0.08 | learning rate: 8.560E-05 | global batch size: 256 | lm loss: 4.509599E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.634 | TFLOPs: 11.91 | 7: iteration 102640/ 173500 | consumed samples: 26275840 | consumed tokens: 53812920320 | elapsed time per iteration (s): 0.08 | learning rate: 8.558E-05 | global batch size: 256 | lm loss: 4.521527E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.263 | TFLOPs: 11.96 | 7: iteration 102650/ 173500 | consumed samples: 26278400 | consumed tokens: 53818163200 | elapsed time per iteration (s): 0.08 | learning rate: 8.557E-05 | global batch size: 256 | lm loss: 4.533987E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.602 | TFLOPs: 11.88 | 7: iteration 102660/ 173500 | consumed samples: 26280960 | consumed tokens: 53823406080 | elapsed time per iteration (s): 0.08 | learning rate: 8.555E-05 | global batch size: 256 | lm loss: 4.528696E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.594 | TFLOPs: 11.97 | 7: iteration 102670/ 173500 | consumed samples: 26283520 | consumed tokens: 53828648960 | elapsed time per iteration (s): 0.08 | learning rate: 8.553E-05 | global batch size: 256 | lm loss: 4.518264E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.892 | TFLOPs: 11.97 | 7: iteration 102680/ 173500 | consumed 
samples: 26286080 | consumed tokens: 53833891840 | elapsed time per iteration (s): 0.08 | learning rate: 8.552E-05 | global batch size: 256 | lm loss: 4.529757E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.111 | TFLOPs: 12.00 | 7: iteration 102690/ 173500 | consumed samples: 26288640 | consumed tokens: 53839134720 | elapsed time per iteration (s): 0.08 | learning rate: 8.550E-05 | global batch size: 256 | lm loss: 4.509296E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.884 | TFLOPs: 12.00 | 7: iteration 102700/ 173500 | consumed samples: 26291200 | consumed tokens: 53844377600 | elapsed time per iteration (s): 0.08 | learning rate: 8.549E-05 | global batch size: 256 | lm loss: 4.514772E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.586 | TFLOPs: 11.59 | 7: iteration 102710/ 173500 | consumed samples: 26293760 | consumed tokens: 53849620480 | elapsed time per iteration (s): 0.08 | learning rate: 8.547E-05 | global batch size: 256 | lm loss: 4.523576E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.635 | TFLOPs: 11.73 | 7: iteration 102720/ 173500 | consumed samples: 26296320 | consumed tokens: 53854863360 | elapsed time per iteration (s): 0.08 | learning rate: 8.546E-05 | global batch size: 256 | lm loss: 4.525500E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.155 | TFLOPs: 11.78 | 7: iteration 102730/ 173500 | consumed samples: 26298880 | consumed tokens: 53860106240 | elapsed time per iteration (s): 0.08 | learning rate: 8.544E-05 | global batch size: 256 | lm loss: 4.525415E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3160.136 | TFLOPs: 11.75 | 7: iteration 102740/ 173500 | consumed samples: 26301440 | consumed tokens: 53865349120 | elapsed time per iteration (s): 0.08 | learning rate: 8.542E-05 | global batch size: 256 | lm loss: 4.519313E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.049 | TFLOPs: 11.76 | 7: iteration 102750/ 173500 | consumed samples: 26304000 | consumed tokens: 53870592000 | elapsed time per iteration (s): 0.08 | learning rate: 8.541E-05 | global batch size: 256 | lm loss: 4.527032E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.380 | TFLOPs: 11.80 | 7: iteration 102760/ 173500 | consumed samples: 26306560 | consumed tokens: 53875834880 | elapsed time per iteration (s): 0.09 | learning rate: 8.539E-05 | global batch size: 256 | lm loss: 4.519635E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2708.706 | TFLOPs: 10.08 | 7: iteration 102770/ 173500 | consumed samples: 26309120 | consumed tokens: 53881077760 | elapsed time per iteration (s): 0.10 | learning rate: 8.538E-05 | global batch size: 256 | lm loss: 4.515873E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2484.335 | TFLOPs: 9.24 | 7: iteration 102780/ 173500 | consumed samples: 26311680 | consumed tokens: 53886320640 | elapsed time per iteration (s): 0.11 | learning rate: 8.536E-05 | global batch size: 256 | lm loss: 4.532791E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.502 | TFLOPs: 8.95 | 7: iteration 102790/ 173500 | consumed samples: 26314240 | consumed tokens: 53891563520 | elapsed time per iteration (s): 0.10 | learning rate: 8.534E-05 | global 
batch size: 256 | lm loss: 4.520707E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2507.763 | TFLOPs: 9.33 | 7: iteration 102800/ 173500 | consumed samples: 26316800 | consumed tokens: 53896806400 | elapsed time per iteration (s): 0.08 | learning rate: 8.533E-05 | global batch size: 256 | lm loss: 4.512146E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3027.117 | TFLOPs: 11.26 | 7: iteration 102810/ 173500 | consumed samples: 26319360 | consumed tokens: 53902049280 | elapsed time per iteration (s): 0.08 | learning rate: 8.531E-05 | global batch size: 256 | lm loss: 4.520425E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.372 | TFLOPs: 11.58 | 7: iteration 102820/ 173500 | consumed samples: 26321920 | consumed tokens: 53907292160 | elapsed time per iteration (s): 0.08 | learning rate: 8.530E-05 | global batch size: 256 | lm loss: 4.524744E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.064 | TFLOPs: 11.55 | 7: iteration 102830/ 173500 | consumed samples: 26324480 | consumed tokens: 53912535040 | elapsed time per iteration (s): 0.09 | learning rate: 8.528E-05 | global batch size: 256 | lm loss: 4.514283E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2870.814 | TFLOPs: 10.68 | 7: iteration 102840/ 173500 | consumed samples: 26327040 | consumed tokens: 53917777920 | elapsed time per iteration (s): 0.08 | learning rate: 8.527E-05 | global batch size: 256 | lm loss: 4.508812E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.889 | TFLOPs: 11.33 | 7: iteration 102850/ 173500 | consumed samples: 
26329600 | consumed tokens: 53923020800 | elapsed time per iteration (s): 0.08 | learning rate: 8.525E-05 | global batch size: 256 | lm loss: 4.526319E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.134 | TFLOPs: 11.89 | 7: iteration 102860/ 173500 | consumed samples: 26332160 | consumed tokens: 53928263680 | elapsed time per iteration (s): 0.08 | learning rate: 8.523E-05 | global batch size: 256 | lm loss: 4.524784E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.786 | TFLOPs: 11.91 | 7: iteration 102870/ 173500 | consumed samples: 26334720 | consumed tokens: 53933506560 | elapsed time per iteration (s): 0.08 | learning rate: 8.522E-05 | global batch size: 256 | lm loss: 4.505074E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.580 | TFLOPs: 11.94 | 7: iteration 102880/ 173500 | consumed samples: 26337280 | consumed tokens: 53938749440 | elapsed time per iteration (s): 0.08 | learning rate: 8.520E-05 | global batch size: 256 | lm loss: 4.533788E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.630 | TFLOPs: 11.90 | 7: iteration 102890/ 173500 | consumed samples: 26339840 | consumed tokens: 53943992320 | elapsed time per iteration (s): 0.08 | learning rate: 8.519E-05 | global batch size: 256 | lm loss: 4.517248E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.962 | TFLOPs: 11.94 | 7: iteration 102900/ 173500 | consumed samples: 26342400 | consumed tokens: 53949235200 | elapsed time per iteration (s): 0.08 | learning rate: 8.517E-05 | global batch size: 256 | lm loss: 4.504788E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3200.615 | TFLOPs: 11.90 | 7: iteration 102910/ 173500 | consumed samples: 26344960 | consumed tokens: 53954478080 | elapsed time per iteration (s): 0.08 | learning rate: 8.515E-05 | global batch size: 256 | lm loss: 4.510577E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.304 | TFLOPs: 11.65 | 7: iteration 102920/ 173500 | consumed samples: 26347520 | consumed tokens: 53959720960 | elapsed time per iteration (s): 0.08 | learning rate: 8.514E-05 | global batch size: 256 | lm loss: 4.518197E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.852 | TFLOPs: 11.86 | 7: iteration 102930/ 173500 | consumed samples: 26350080 | consumed tokens: 53964963840 | elapsed time per iteration (s): 0.08 | learning rate: 8.512E-05 | global batch size: 256 | lm loss: 4.518960E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.797 | TFLOPs: 11.64 | 7: iteration 102940/ 173500 | consumed samples: 26352640 | consumed tokens: 53970206720 | elapsed time per iteration (s): 0.08 | learning rate: 8.511E-05 | global batch size: 256 | lm loss: 4.533136E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.858 | TFLOPs: 11.70 | 7: iteration 102950/ 173500 | consumed samples: 26355200 | consumed tokens: 53975449600 | elapsed time per iteration (s): 0.08 | learning rate: 8.509E-05 | global batch size: 256 | lm loss: 4.523864E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.607 | TFLOPs: 11.88 | 7: iteration 102960/ 173500 | consumed samples: 26357760 | consumed tokens: 53980692480 | elapsed time per iteration (s): 0.08 | learning rate: 8.508E-05 | global batch 
size: 256 | lm loss: 4.511584E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.650 | TFLOPs: 11.63 | 7: iteration 102970/ 173500 | consumed samples: 26360320 | consumed tokens: 53985935360 | elapsed time per iteration (s): 0.08 | learning rate: 8.506E-05 | global batch size: 256 | lm loss: 4.529510E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.368 | TFLOPs: 11.86 | 7: iteration 102980/ 173500 | consumed samples: 26362880 | consumed tokens: 53991178240 | elapsed time per iteration (s): 0.08 | learning rate: 8.504E-05 | global batch size: 256 | lm loss: 4.499827E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.634 | TFLOPs: 11.86 | 7: iteration 102990/ 173500 | consumed samples: 26365440 | consumed tokens: 53996421120 | elapsed time per iteration (s): 0.08 | learning rate: 8.503E-05 | global batch size: 256 | lm loss: 4.506944E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.029 | TFLOPs: 11.85 | 7: iteration 103000/ 173500 | consumed samples: 26368000 | consumed tokens: 54001664000 | elapsed time per iteration (s): 0.08 | learning rate: 8.501E-05 | global batch size: 256 | lm loss: 4.515423E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.167 | TFLOPs: 11.64 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 103000 | lm loss value: 4.401703E+00 | lm loss PPL: 8.158973E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 103000 to checkpoints_14m91b100m 0: [2023-03-17 02:44:17,946] [INFO] 
[logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step103000 is begin to save! 0: [2023-03-17 02:44:17,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:44:17,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:44:17,975] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:44:17,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:44:17,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:44:17,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:44:17,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:44:17,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:44:17,984] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:44:17,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:44:17,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:44:17,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:44:17,988] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step103000/mp_rank_00_model_states.pt 0: [2023-03-17 02:44:17,988] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:44:17,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:44:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:44:18,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:44:18,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2023-03-17 02:44:18,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2023-03-17 02:44:18,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:44:18,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:44:18,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
0: [2023-03-17 02:44:18,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:44:18,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2023-03-17 02:44:18,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:44:18,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2023-03-17 02:44:18,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:44:18,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2023-03-17 02:44:18,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:44:18,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:44:18,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
4: [2023-03-17 02:44:18,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:44:18,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:44:18,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2023-03-17 02:44:18,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:44:18,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2023-03-17 02:44:18,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:44:18,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2023-03-17 02:44:18,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
0: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:44:18,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:44:18,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2023-03-17 02:44:18,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:44:18,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
5: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:44:18,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2023-03-17 02:44:18,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2023-03-17 02:44:18,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:44:18,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:44:18,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
6: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2023-03-17 02:44:18,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2023-03-17 02:44:18,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2023-03-17 02:44:18,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
2: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:44:18,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:44:18,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:44:18,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
6: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:44:18,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 02:44:18,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:44:18,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:44:18,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2023-03-17 02:44:18,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:44:18,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2023-03-17 02:44:18,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:44:18,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:44:18,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:44:18,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:44:18,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
2: [2023-03-17 02:44:18,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2023-03-17 02:44:18,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:44:18,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:44:18,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:44:18,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 02:44:18,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
2: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
2: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
7: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2023-03-17 02:44:18,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2023-03-17 02:44:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:44:18,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:44:18,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2023-03-17 02:44:18,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:44:18,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step103000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:44:18,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
0: successfully saved checkpoint at iteration 103000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.56 7: iteration 103010/ 173500 | consumed samples: 26370560 | consumed tokens: 54006906880 | elapsed time per iteration (s): 0.09 | learning rate: 8.500E-05 | global batch size: 256 | lm loss: 4.525986E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2805.703 | TFLOPs: 10.44 | 7: iteration 103020/ 173500 | consumed samples: 26373120 | consumed tokens: 54012149760 | elapsed time per iteration (s): 0.08 | learning rate: 8.498E-05 | global batch size: 256 | lm loss: 4.528560E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.304 | TFLOPs: 11.88 | 7: iteration 103030/ 173500 | consumed samples: 26375680 | consumed tokens: 54017392640 | elapsed time per iteration (s): 0.08 | learning rate: 8.496E-05 | global batch size: 256 | lm loss: 4.510257E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.943 | TFLOPs: 11.87 | 7: iteration 103040/ 173500 | consumed samples: 26378240 | consumed tokens: 54022635520 | elapsed time per iteration (s): 0.08 | learning rate: 8.495E-05 | global batch size: 256 | lm loss: 4.518555E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.169 | TFLOPs: 11.85 | 7: iteration 103050/ 173500 | consumed samples: 26380800 | consumed tokens: 54027878400 | elapsed time per iteration (s): 0.08 | learning rate: 8.493E-05 | global batch size: 256 | lm loss: 4.520329E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.389 | TFLOPs: 11.85 | 7: iteration 103060/ 173500 | consumed samples: 26383360 | consumed tokens: 54033121280 | elapsed time per iteration (s): 
0.08 | learning rate: 8.492E-05 | global batch size: 256 | lm loss: 4.518542E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.278 | TFLOPs: 11.86 | 7: iteration 103070/ 173500 | consumed samples: 26385920 | consumed tokens: 54038364160 | elapsed time per iteration (s): 0.08 | learning rate: 8.490E-05 | global batch size: 256 | lm loss: 4.513636E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.735 | TFLOPs: 11.91 | 7: iteration 103080/ 173500 | consumed samples: 26388480 | consumed tokens: 54043607040 | elapsed time per iteration (s): 0.08 | learning rate: 8.489E-05 | global batch size: 256 | lm loss: 4.505914E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.258 | TFLOPs: 11.60 | 7: iteration 103090/ 173500 | consumed samples: 26391040 | consumed tokens: 54048849920 | elapsed time per iteration (s): 0.08 | learning rate: 8.487E-05 | global batch size: 256 | lm loss: 4.532994E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.554 | TFLOPs: 11.79 | 7: iteration 103100/ 173500 | consumed samples: 26393600 | consumed tokens: 54054092800 | elapsed time per iteration (s): 0.08 | learning rate: 8.485E-05 | global batch size: 256 | lm loss: 4.534109E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.545 | TFLOPs: 11.89 | 7: iteration 103110/ 173500 | consumed samples: 26396160 | consumed tokens: 54059335680 | elapsed time per iteration (s): 0.08 | learning rate: 8.484E-05 | global batch size: 256 | lm loss: 4.518990E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.508 | TFLOPs: 11.86 | 7: 
iteration 103120/ 173500 | consumed samples: 26398720 | consumed tokens: 54064578560 | elapsed time per iteration (s): 0.08 | learning rate: 8.482E-05 | global batch size: 256 | lm loss: 4.521088E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.132 | TFLOPs: 11.43 | 7: iteration 103130/ 173500 | consumed samples: 26401280 | consumed tokens: 54069821440 | elapsed time per iteration (s): 0.08 | learning rate: 8.481E-05 | global batch size: 256 | lm loss: 4.522039E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.021 | TFLOPs: 11.88 | 7: iteration 103140/ 173500 | consumed samples: 26403840 | consumed tokens: 54075064320 | elapsed time per iteration (s): 0.08 | learning rate: 8.479E-05 | global batch size: 256 | lm loss: 4.517192E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.688 | TFLOPs: 11.90 | 7: iteration 103150/ 173500 | consumed samples: 26406400 | consumed tokens: 54080307200 | elapsed time per iteration (s): 0.08 | learning rate: 8.477E-05 | global batch size: 256 | lm loss: 4.521709E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.702 | TFLOPs: 11.62 | 7: iteration 103160/ 173500 | consumed samples: 26408960 | consumed tokens: 54085550080 | elapsed time per iteration (s): 0.09 | learning rate: 8.476E-05 | global batch size: 256 | lm loss: 4.519823E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2891.957 | TFLOPs: 10.76 | 7: iteration 103170/ 173500 | consumed samples: 26411520 | consumed tokens: 54090792960 | elapsed time per iteration (s): 0.08 | learning rate: 8.474E-05 | global batch size: 256 | lm loss: 4.526046E+00 | grad norm: 0.376 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.948 | TFLOPs: 11.89 | 7: iteration 103180/ 173500 | consumed samples: 26414080 | consumed tokens: 54096035840 | elapsed time per iteration (s): 0.08 | learning rate: 8.473E-05 | global batch size: 256 | lm loss: 4.525464E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.673 | TFLOPs: 11.81 | 7: iteration 103190/ 173500 | consumed samples: 26416640 | consumed tokens: 54101278720 | elapsed time per iteration (s): 0.08 | learning rate: 8.471E-05 | global batch size: 256 | lm loss: 4.523708E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.611 | TFLOPs: 11.90 | 7: iteration 103200/ 173500 | consumed samples: 26419200 | consumed tokens: 54106521600 | elapsed time per iteration (s): 0.08 | learning rate: 8.470E-05 | global batch size: 256 | lm loss: 4.524301E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3069.271 | TFLOPs: 11.42 | 7: iteration 103210/ 173500 | consumed samples: 26421760 | consumed tokens: 54111764480 | elapsed time per iteration (s): 0.08 | learning rate: 8.468E-05 | global batch size: 256 | lm loss: 4.514880E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.946 | TFLOPs: 11.69 | 7: iteration 103220/ 173500 | consumed samples: 26424320 | consumed tokens: 54117007360 | elapsed time per iteration (s): 0.08 | learning rate: 8.466E-05 | global batch size: 256 | lm loss: 4.507298E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.057 | TFLOPs: 11.88 | 7: iteration 103230/ 173500 | consumed samples: 26426880 | consumed tokens: 54122250240 | elapsed time per iteration (s): 0.08 | 
learning rate: 8.465E-05 | global batch size: 256 | lm loss: 4.513810E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.643 | TFLOPs: 11.89 | 7: iteration 103240/ 173500 | consumed samples: 26429440 | consumed tokens: 54127493120 | elapsed time per iteration (s): 0.08 | learning rate: 8.463E-05 | global batch size: 256 | lm loss: 4.530927E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.084 | TFLOPs: 11.85 | 7: iteration 103250/ 173500 | consumed samples: 26432000 | consumed tokens: 54132736000 | elapsed time per iteration (s): 0.08 | learning rate: 8.462E-05 | global batch size: 256 | lm loss: 4.501482E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.049 | TFLOPs: 11.88 | 7: iteration 103260/ 173500 | consumed samples: 26434560 | consumed tokens: 54137978880 | elapsed time per iteration (s): 0.09 | learning rate: 8.460E-05 | global batch size: 256 | lm loss: 4.521259E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2736.734 | TFLOPs: 10.18 | 7: iteration 103270/ 173500 | consumed samples: 26437120 | consumed tokens: 54143221760 | elapsed time per iteration (s): 0.09 | learning rate: 8.459E-05 | global batch size: 256 | lm loss: 4.528614E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2909.327 | TFLOPs: 10.82 | 7: iteration 103280/ 173500 | consumed samples: 26439680 | consumed tokens: 54148464640 | elapsed time per iteration (s): 0.08 | learning rate: 8.457E-05 | global batch size: 256 | lm loss: 4.528411E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.727 | TFLOPs: 11.65 | 7: iteration 
103290/ 173500 | consumed samples: 26442240 | consumed tokens: 54153707520 | elapsed time per iteration (s): 0.08 | learning rate: 8.455E-05 | global batch size: 256 | lm loss: 4.518061E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.789 | TFLOPs: 12.06 | 7: iteration 103300/ 173500 | consumed samples: 26444800 | consumed tokens: 54158950400 | elapsed time per iteration (s): 0.08 | learning rate: 8.454E-05 | global batch size: 256 | lm loss: 4.520588E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.639 | TFLOPs: 11.50 | 7: iteration 103310/ 173500 | consumed samples: 26447360 | consumed tokens: 54164193280 | elapsed time per iteration (s): 0.08 | learning rate: 8.452E-05 | global batch size: 256 | lm loss: 4.524525E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.705 | TFLOPs: 11.76 | 7: iteration 103320/ 173500 | consumed samples: 26449920 | consumed tokens: 54169436160 | elapsed time per iteration (s): 0.08 | learning rate: 8.451E-05 | global batch size: 256 | lm loss: 4.523757E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.516 | TFLOPs: 11.22 | 7: iteration 103330/ 173500 | consumed samples: 26452480 | consumed tokens: 54174679040 | elapsed time per iteration (s): 0.10 | learning rate: 8.449E-05 | global batch size: 256 | lm loss: 4.516302E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2449.428 | TFLOPs: 9.11 | 7: iteration 103340/ 173500 | consumed samples: 26455040 | consumed tokens: 54179921920 | elapsed time per iteration (s): 0.10 | learning rate: 8.447E-05 | global batch size: 256 | lm loss: 4.519376E+00 | grad norm: 0.339 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2516.108 | TFLOPs: 9.36 | 7: iteration 103350/ 173500 | consumed samples: 26457600 | consumed tokens: 54185164800 | elapsed time per iteration (s): 0.11 | learning rate: 8.446E-05 | global batch size: 256 | lm loss: 4.520271E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2352.235 | TFLOPs: 8.75 | 7: iteration 103360/ 173500 | consumed samples: 26460160 | consumed tokens: 54190407680 | elapsed time per iteration (s): 0.10 | learning rate: 8.444E-05 | global batch size: 256 | lm loss: 4.518031E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2486.201 | TFLOPs: 9.25 | 7: iteration 103370/ 173500 | consumed samples: 26462720 | consumed tokens: 54195650560 | elapsed time per iteration (s): 0.10 | learning rate: 8.443E-05 | global batch size: 256 | lm loss: 4.507523E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2520.364 | TFLOPs: 9.37 | 7: iteration 103380/ 173500 | consumed samples: 26465280 | consumed tokens: 54200893440 | elapsed time per iteration (s): 0.10 | learning rate: 8.441E-05 | global batch size: 256 | lm loss: 4.518256E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2533.420 | TFLOPs: 9.42 | 7: iteration 103390/ 173500 | consumed samples: 26467840 | consumed tokens: 54206136320 | elapsed time per iteration (s): 0.10 | learning rate: 8.440E-05 | global batch size: 256 | lm loss: 4.506082E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2507.215 | TFLOPs: 9.33 | 7: iteration 103400/ 173500 | consumed samples: 26470400 | consumed tokens: 54211379200 | elapsed time per iteration (s): 0.11 | learning rate: 
8.438E-05 | global batch size: 256 | lm loss: 4.518689E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2418.470 | TFLOPs: 9.00 | 7: iteration 103410/ 173500 | consumed samples: 26472960 | consumed tokens: 54216622080 | elapsed time per iteration (s): 0.11 | learning rate: 8.436E-05 | global batch size: 256 | lm loss: 4.519577E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2432.814 | TFLOPs: 9.05 | 7: iteration 103420/ 173500 | consumed samples: 26475520 | consumed tokens: 54221864960 | elapsed time per iteration (s): 0.10 | learning rate: 8.435E-05 | global batch size: 256 | lm loss: 4.518678E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2524.481 | TFLOPs: 9.39 | 7: iteration 103430/ 173500 | consumed samples: 26478080 | consumed tokens: 54227107840 | elapsed time per iteration (s): 0.09 | learning rate: 8.433E-05 | global batch size: 256 | lm loss: 4.510423E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2746.150 | TFLOPs: 10.21 | 7: iteration 103440/ 173500 | consumed samples: 26480640 | consumed tokens: 54232350720 | elapsed time per iteration (s): 0.09 | learning rate: 8.432E-05 | global batch size: 256 | lm loss: 4.512824E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2982.036 | TFLOPs: 11.09 | 7: iteration 103450/ 173500 | consumed samples: 26483200 | consumed tokens: 54237593600 | elapsed time per iteration (s): 0.08 | learning rate: 8.430E-05 | global batch size: 256 | lm loss: 4.514958E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.760 | TFLOPs: 11.72 | 7: iteration 103460/ 173500 | 
consumed samples: 26485760 | consumed tokens: 54242836480 | elapsed time per iteration (s): 0.08 | learning rate: 8.429E-05 | global batch size: 256 | lm loss: 4.538371E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.179 | TFLOPs: 11.80 | 7: iteration 103470/ 173500 | consumed samples: 26488320 | consumed tokens: 54248079360 | elapsed time per iteration (s): 0.08 | learning rate: 8.427E-05 | global batch size: 256 | lm loss: 4.523344E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.398 | TFLOPs: 11.49 | 7: iteration 103480/ 173500 | consumed samples: 26490880 | consumed tokens: 54253322240 | elapsed time per iteration (s): 0.08 | learning rate: 8.425E-05 | global batch size: 256 | lm loss: 4.525689E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.266 | TFLOPs: 11.38 | 7: iteration 103490/ 173500 | consumed samples: 26493440 | consumed tokens: 54258565120 | elapsed time per iteration (s): 0.08 | learning rate: 8.424E-05 | global batch size: 256 | lm loss: 4.519046E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.569 | TFLOPs: 11.85 | 7: iteration 103500/ 173500 | consumed samples: 26496000 | consumed tokens: 54263808000 | elapsed time per iteration (s): 0.08 | learning rate: 8.422E-05 | global batch size: 256 | lm loss: 4.519361E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.224 | TFLOPs: 11.54 | 7: iteration 103510/ 173500 | consumed samples: 26498560 | consumed tokens: 54269050880 | elapsed time per iteration (s): 0.08 | learning rate: 8.421E-05 | global batch size: 256 | lm loss: 4.522226E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3171.443 | TFLOPs: 11.80 | 7: iteration 103520/ 173500 | consumed samples: 26501120 | consumed tokens: 54274293760 | elapsed time per iteration (s): 0.08 | learning rate: 8.419E-05 | global batch size: 256 | lm loss: 4.519450E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.611 | TFLOPs: 11.88 | 7: iteration 103530/ 173500 | consumed samples: 26503680 | consumed tokens: 54279536640 | elapsed time per iteration (s): 0.08 | learning rate: 8.418E-05 | global batch size: 256 | lm loss: 4.523471E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.245 | TFLOPs: 11.59 | 7: iteration 103540/ 173500 | consumed samples: 26506240 | consumed tokens: 54284779520 | elapsed time per iteration (s): 0.08 | learning rate: 8.416E-05 | global batch size: 256 | lm loss: 4.519886E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.593 | TFLOPs: 11.86 | 7: iteration 103550/ 173500 | consumed samples: 26508800 | consumed tokens: 54290022400 | elapsed time per iteration (s): 0.08 | learning rate: 8.414E-05 | global batch size: 256 | lm loss: 4.516543E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.621 | TFLOPs: 11.84 | 7: iteration 103560/ 173500 | consumed samples: 26511360 | consumed tokens: 54295265280 | elapsed time per iteration (s): 0.08 | learning rate: 8.413E-05 | global batch size: 256 | lm loss: 4.531742E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.869 | TFLOPs: 11.85 | 7: iteration 103570/ 173500 | consumed samples: 26513920 | consumed tokens: 54300508160 | elapsed time per iteration (s): 0.08 | learning rate: 
8.411E-05 | global batch size: 256 | lm loss: 4.526863E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.953 | TFLOPs: 11.61 | 7: iteration 103580/ 173500 | consumed samples: 26516480 | consumed tokens: 54305751040 | elapsed time per iteration (s): 0.08 | learning rate: 8.410E-05 | global batch size: 256 | lm loss: 4.506846E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.552 | TFLOPs: 11.63 | 7: iteration 103590/ 173500 | consumed samples: 26519040 | consumed tokens: 54310993920 | elapsed time per iteration (s): 0.08 | learning rate: 8.408E-05 | global batch size: 256 | lm loss: 4.513826E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.551 | TFLOPs: 11.86 | 7: iteration 103600/ 173500 | consumed samples: 26521600 | consumed tokens: 54316236800 | elapsed time per iteration (s): 0.08 | learning rate: 8.406E-05 | global batch size: 256 | lm loss: 4.527087E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.454 | TFLOPs: 11.89 | 7: iteration 103610/ 173500 | consumed samples: 26524160 | consumed tokens: 54321479680 | elapsed time per iteration (s): 0.08 | learning rate: 8.405E-05 | global batch size: 256 | lm loss: 4.522372E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.823 | TFLOPs: 11.81 | 7: iteration 103620/ 173500 | consumed samples: 26526720 | consumed tokens: 54326722560 | elapsed time per iteration (s): 0.08 | learning rate: 8.403E-05 | global batch size: 256 | lm loss: 4.515325E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.465 | TFLOPs: 11.83 | 7: iteration 103630/ 173500 | 
consumed samples: 26529280 | consumed tokens: 54331965440 | elapsed time per iteration (s): 0.08 | learning rate: 8.402E-05 | global batch size: 256 | lm loss: 4.509781E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.543 | TFLOPs: 11.82 | 7: iteration 103640/ 173500 | consumed samples: 26531840 | consumed tokens: 54337208320 | elapsed time per iteration (s): 0.08 | learning rate: 8.400E-05 | global batch size: 256 | lm loss: 4.518726E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.902 | TFLOPs: 11.76 | 7: iteration 103650/ 173500 | consumed samples: 26534400 | consumed tokens: 54342451200 | elapsed time per iteration (s): 0.08 | learning rate: 8.399E-05 | global batch size: 256 | lm loss: 4.509699E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.869 | TFLOPs: 11.86 | 7: iteration 103660/ 173500 | consumed samples: 26536960 | consumed tokens: 54347694080 | elapsed time per iteration (s): 0.08 | learning rate: 8.397E-05 | global batch size: 256 | lm loss: 4.512821E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.175 | TFLOPs: 11.72 | 7: iteration 103670/ 173500 | consumed samples: 26539520 | consumed tokens: 54352936960 | elapsed time per iteration (s): 0.08 | learning rate: 8.395E-05 | global batch size: 256 | lm loss: 4.513438E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.653 | TFLOPs: 11.82 | 7: iteration 103680/ 173500 | consumed samples: 26542080 | consumed tokens: 54358179840 | elapsed time per iteration (s): 0.08 | learning rate: 8.394E-05 | global batch size: 256 | lm loss: 4.510423E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3179.469 | TFLOPs: 11.83 | 7: iteration 103690/ 173500 | consumed samples: 26544640 | consumed tokens: 54363422720 | elapsed time per iteration (s): 0.08 | learning rate: 8.392E-05 | global batch size: 256 | lm loss: 4.530458E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.058 | TFLOPs: 11.85 | 7: iteration 103700/ 173500 | consumed samples: 26547200 | consumed tokens: 54368665600 | elapsed time per iteration (s): 0.08 | learning rate: 8.391E-05 | global batch size: 256 | lm loss: 4.518068E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.099 | TFLOPs: 11.82 | 7: iteration 103710/ 173500 | consumed samples: 26549760 | consumed tokens: 54373908480 | elapsed time per iteration (s): 0.08 | learning rate: 8.389E-05 | global batch size: 256 | lm loss: 4.503219E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.510 | TFLOPs: 11.81 | 7: iteration 103720/ 173500 | consumed samples: 26552320 | consumed tokens: 54379151360 | elapsed time per iteration (s): 0.08 | learning rate: 8.388E-05 | global batch size: 256 | lm loss: 4.520182E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.139 | TFLOPs: 11.82 | 7: iteration 103730/ 173500 | consumed samples: 26554880 | consumed tokens: 54384394240 | elapsed time per iteration (s): 0.08 | learning rate: 8.386E-05 | global batch size: 256 | lm loss: 4.517389E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.958 | TFLOPs: 11.79 | 7: iteration 103740/ 173500 | consumed samples: 26557440 | consumed tokens: 54389637120 | elapsed time per iteration (s): 0.08 | learning rate: 
8.384E-05 | global batch size: 256 | lm loss: 4.522523E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.019 | TFLOPs: 11.84 | 7: iteration 103750/ 173500 | consumed samples: 26560000 | consumed tokens: 54394880000 | elapsed time per iteration (s): 0.10 | learning rate: 8.383E-05 | global batch size: 256 | lm loss: 4.511436E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2638.321 | TFLOPs: 9.81 | 7: iteration 103760/ 173500 | consumed samples: 26562560 | consumed tokens: 54400122880 | elapsed time per iteration (s): 0.08 | learning rate: 8.381E-05 | global batch size: 256 | lm loss: 4.511488E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.098 | TFLOPs: 11.29 | 7: iteration 103770/ 173500 | consumed samples: 26565120 | consumed tokens: 54405365760 | elapsed time per iteration (s): 0.08 | learning rate: 8.380E-05 | global batch size: 256 | lm loss: 4.515729E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.013 | TFLOPs: 11.78 | 7: iteration 103780/ 173500 | consumed samples: 26567680 | consumed tokens: 54410608640 | elapsed time per iteration (s): 0.08 | learning rate: 8.378E-05 | global batch size: 256 | lm loss: 4.523245E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.360 | TFLOPs: 11.87 | 7: iteration 103790/ 173500 | consumed samples: 26570240 | consumed tokens: 54415851520 | elapsed time per iteration (s): 0.08 | learning rate: 8.377E-05 | global batch size: 256 | lm loss: 4.515646E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.131 | TFLOPs: 11.88 | 7: iteration 103800/ 173500 | 
consumed samples: 26572800 | consumed tokens: 54421094400 | elapsed time per iteration (s): 0.08 | learning rate: 8.375E-05 | global batch size: 256 | lm loss: 4.519822E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.989 | TFLOPs: 11.88 | 7: iteration 103810/ 173500 | consumed samples: 26575360 | consumed tokens: 54426337280 | elapsed time per iteration (s): 0.08 | learning rate: 8.373E-05 | global batch size: 256 | lm loss: 4.517225E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.925 | TFLOPs: 11.90 | 7: iteration 103820/ 173500 | consumed samples: 26577920 | consumed tokens: 54431580160 | elapsed time per iteration (s): 0.08 | learning rate: 8.372E-05 | global batch size: 256 | lm loss: 4.499647E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.978 | TFLOPs: 11.88 | 7: iteration 103830/ 173500 | consumed samples: 26580480 | consumed tokens: 54436823040 | elapsed time per iteration (s): 0.08 | learning rate: 8.370E-05 | global batch size: 256 | lm loss: 4.500055E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.909 | TFLOPs: 11.85 | 7: iteration 103840/ 173500 | consumed samples: 26583040 | consumed tokens: 54442065920 | elapsed time per iteration (s): 0.11 | learning rate: 8.369E-05 | global batch size: 256 | lm loss: 4.518588E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2347.087 | TFLOPs: 8.73 | 7: iteration 103850/ 173500 | consumed samples: 26585600 | consumed tokens: 54447308800 | elapsed time per iteration (s): 0.08 | learning rate: 8.367E-05 | global batch size: 256 | lm loss: 4.509534E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3189.758 | TFLOPs: 11.86 | 7: iteration 103860/ 173500 | consumed samples: 26588160 | consumed tokens: 54452551680 | elapsed time per iteration (s): 0.10 | learning rate: 8.366E-05 | global batch size: 256 | lm loss: 4.517110E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2628.921 | TFLOPs: 9.78 | 7: iteration 103870/ 173500 | consumed samples: 26590720 | consumed tokens: 54457794560 | elapsed time per iteration (s): 0.08 | learning rate: 8.364E-05 | global batch size: 256 | lm loss: 4.518529E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.253 | TFLOPs: 11.67 | 7: iteration 103880/ 173500 | consumed samples: 26593280 | consumed tokens: 54463037440 | elapsed time per iteration (s): 0.08 | learning rate: 8.362E-05 | global batch size: 256 | lm loss: 4.526593E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.103 | TFLOPs: 11.92 | 7: iteration 103890/ 173500 | consumed samples: 26595840 | consumed tokens: 54468280320 | elapsed time per iteration (s): 0.08 | learning rate: 8.361E-05 | global batch size: 256 | lm loss: 4.514258E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.108 | TFLOPs: 11.92 | 7: iteration 103900/ 173500 | consumed samples: 26598400 | consumed tokens: 54473523200 | elapsed time per iteration (s): 0.08 | learning rate: 8.359E-05 | global batch size: 256 | lm loss: 4.522063E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.924 | TFLOPs: 11.54 | 7: iteration 103910/ 173500 | consumed samples: 26600960 | consumed tokens: 54478766080 | elapsed time per iteration (s): 0.08 | learning rate: 8.358E-05 | 
global batch size: 256 | lm loss: 4.512115E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.449 | TFLOPs: 11.87 | 7: iteration 103920/ 173500 | consumed samples: 26603520 | consumed tokens: 54484008960 | elapsed time per iteration (s): 0.08 | learning rate: 8.356E-05 | global batch size: 256 | lm loss: 4.521187E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.715 | TFLOPs: 11.86 | 7: iteration 103930/ 173500 | consumed samples: 26606080 | consumed tokens: 54489251840 | elapsed time per iteration (s): 0.08 | learning rate: 8.355E-05 | global batch size: 256 | lm loss: 4.515640E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.826 | TFLOPs: 11.59 | 7: iteration 103940/ 173500 | consumed samples: 26608640 | consumed tokens: 54494494720 | elapsed time per iteration (s): 0.08 | learning rate: 8.353E-05 | global batch size: 256 | lm loss: 4.506121E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.236 | TFLOPs: 11.64 | 7: iteration 103950/ 173500 | consumed samples: 26611200 | consumed tokens: 54499737600 | elapsed time per iteration (s): 0.08 | learning rate: 8.351E-05 | global batch size: 256 | lm loss: 4.516267E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.730 | TFLOPs: 11.67 | 7: iteration 103960/ 173500 | consumed samples: 26613760 | consumed tokens: 54504980480 | elapsed time per iteration (s): 0.08 | learning rate: 8.350E-05 | global batch size: 256 | lm loss: 4.528303E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.544 | TFLOPs: 11.84 | 7: iteration 103970/ 173500 | consumed 
samples: 26616320 | consumed tokens: 54510223360 | elapsed time per iteration (s): 0.08 | learning rate: 8.348E-05 | global batch size: 256 | lm loss: 4.521717E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.337 | TFLOPs: 11.69 | 7: iteration 103980/ 173500 | consumed samples: 26618880 | consumed tokens: 54515466240 | elapsed time per iteration (s): 0.08 | learning rate: 8.347E-05 | global batch size: 256 | lm loss: 4.519865E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.782 | TFLOPs: 11.76 | 7: iteration 103990/ 173500 | consumed samples: 26621440 | consumed tokens: 54520709120 | elapsed time per iteration (s): 0.08 | learning rate: 8.345E-05 | global batch size: 256 | lm loss: 4.518764E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.081 | TFLOPs: 11.73 | 0: [2023-03-17 02:45:42,258] [INFO] [logging.py:68:log_dist] [Rank 0] step=104000, skipped=0, lr=[8.343492337309329e-05, 8.343492337309329e-05, 8.343492337309329e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 104000/ 173500 | consumed samples: 26624000 | consumed tokens: 54525952000 | elapsed time per iteration (s): 0.08 | learning rate: 8.343E-05 | global batch size: 256 | lm loss: 4.523867E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.031 | TFLOPs: 11.69 | 0: steps: 104000 loss: 4.5148 iter time (s): 0.089 samples/sec: 2878.862 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 104000 | lm loss value: 4.387806E+00 | lm loss PPL: 8.046372E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 104000 
to checkpoints_14m91b100m 0: [2023-03-17 02:45:42,314] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step104000 is begin to save! 0: [2023-03-17 02:45:42,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:45:42,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:45:42,342] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:45:42,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:45:42,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:45:42,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:45:42,348] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:45:42,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:45:42,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:45:42,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:45:42,354] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:45:42,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:45:42,355] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step104000/mp_rank_00_model_states.pt 0: [2023-03-17 02:45:42,355] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:45:42,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:45:42,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:45:42,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:45:42,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:45:42,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:45:42,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2023-03-17 02:45:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:45:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 6: [2023-03-17 02:45:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 5: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:45:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:45:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2023-03-17 02:45:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
4: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:45:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 02:45:42,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2023-03-17 02:45:42,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:45:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:45:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:45:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2023-03-17 02:45:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:45:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
0: [2023-03-17 02:45:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 3: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2023-03-17 02:45:42,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:45:42,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2023-03-17 02:45:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:45:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:45:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2023-03-17 02:45:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:45:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:45:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2023-03-17 02:45:42,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:45:42,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:45:42,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:45:42,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:45:42,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:45:42,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:45:42,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:45:42,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:45:42,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:45:42,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:45:42,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:45:42,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 02:45:42,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:45:42,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:45:42,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:45:42,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:45:42,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:45:42,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2023-03-17 02:45:42,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:45:42,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:45:42,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:45:42,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:45:42,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 02:45:42,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:45:42,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:45:42,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:45:42,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:45:42,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:45:42,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2023-03-17 02:45:42,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 0: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:45:42,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2023-03-17 02:45:42,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:45:42,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 5: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
3: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 7: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
2: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2023-03-17 02:45:42,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2023-03-17 02:45:42,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2023-03-17 02:45:42,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:45:42,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step104000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:45:42,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
0: successfully saved checkpoint at iteration 104000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 77.64 7: iteration 104010/ 173500 | consumed samples: 26626560 | consumed tokens: 54531194880 | elapsed time per iteration (s): 0.09 | learning rate: 8.342E-05 | global batch size: 256 | lm loss: 4.519633E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2776.501 | TFLOPs: 10.33 | 7: iteration 104020/ 173500 | consumed samples: 26629120 | consumed tokens: 54536437760 | elapsed time per iteration (s): 0.08 | learning rate: 8.340E-05 | global batch size: 256 | lm loss: 4.522632E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.017 | TFLOPs: 11.69 | 7: iteration 104030/ 173500 | consumed samples: 26631680 | consumed tokens: 54541680640 | elapsed time per iteration (s): 0.08 | learning rate: 8.339E-05 | global batch size: 256 | lm loss: 4.504241E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.767 | TFLOPs: 11.96 | 7: iteration 104040/ 173500 | consumed samples: 26634240 | consumed tokens: 54546923520 | elapsed time per iteration (s): 0.08 | learning rate: 8.337E-05 | global batch size: 256 | lm loss: 4.506368E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.374 | TFLOPs: 11.76 | 7: iteration 104050/ 173500 | consumed samples: 26636800 | consumed tokens: 54552166400 | elapsed time per iteration (s): 0.08 | learning rate: 8.336E-05 | global batch size: 256 | lm loss: 4.514340E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.242 | TFLOPs: 11.65 | 7: iteration 104060/ 173500 | consumed samples: 26639360 | consumed tokens: 54557409280 | elapsed time per iteration (s): 
0.08 | learning rate: 8.334E-05 | global batch size: 256 | lm loss: 4.511887E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.305 | TFLOPs: 11.44 | 7: iteration 104070/ 173500 | consumed samples: 26641920 | consumed tokens: 54562652160 | elapsed time per iteration (s): 0.08 | learning rate: 8.332E-05 | global batch size: 256 | lm loss: 4.520438E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.082 | TFLOPs: 11.71 | 7: iteration 104080/ 173500 | consumed samples: 26644480 | consumed tokens: 54567895040 | elapsed time per iteration (s): 0.08 | learning rate: 8.331E-05 | global batch size: 256 | lm loss: 4.523887E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.569 | TFLOPs: 11.98 | 7: iteration 104090/ 173500 | consumed samples: 26647040 | consumed tokens: 54573137920 | elapsed time per iteration (s): 0.08 | learning rate: 8.329E-05 | global batch size: 256 | lm loss: 4.516848E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.410 | TFLOPs: 12.00 | 7: iteration 104100/ 173500 | consumed samples: 26649600 | consumed tokens: 54578380800 | elapsed time per iteration (s): 0.08 | learning rate: 8.328E-05 | global batch size: 256 | lm loss: 4.515332E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.146 | TFLOPs: 11.84 | 7: iteration 104110/ 173500 | consumed samples: 26652160 | consumed tokens: 54583623680 | elapsed time per iteration (s): 0.08 | learning rate: 8.326E-05 | global batch size: 256 | lm loss: 4.508873E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.344 | TFLOPs: 12.01 | 7: 
iteration 104120/ 173500 | consumed samples: 26654720 | consumed tokens: 54588866560 | elapsed time per iteration (s): 0.08 | learning rate: 8.325E-05 | global batch size: 256 | lm loss: 4.518754E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.604 | TFLOPs: 11.98 | 7: iteration 104130/ 173500 | consumed samples: 26657280 | consumed tokens: 54594109440 | elapsed time per iteration (s): 0.08 | learning rate: 8.323E-05 | global batch size: 256 | lm loss: 4.521991E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.348 | TFLOPs: 11.66 | 7: iteration 104140/ 173500 | consumed samples: 26659840 | consumed tokens: 54599352320 | elapsed time per iteration (s): 0.08 | learning rate: 8.321E-05 | global batch size: 256 | lm loss: 4.522189E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.518 | TFLOPs: 11.75 | 7: iteration 104150/ 173500 | consumed samples: 26662400 | consumed tokens: 54604595200 | elapsed time per iteration (s): 0.08 | learning rate: 8.320E-05 | global batch size: 256 | lm loss: 4.513702E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.965 | TFLOPs: 11.46 | 7: iteration 104160/ 173500 | consumed samples: 26664960 | consumed tokens: 54609838080 | elapsed time per iteration (s): 0.08 | learning rate: 8.318E-05 | global batch size: 256 | lm loss: 4.522502E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.891 | TFLOPs: 11.47 | 7: iteration 104170/ 173500 | consumed samples: 26667520 | consumed tokens: 54615080960 | elapsed time per iteration (s): 0.08 | learning rate: 8.317E-05 | global batch size: 256 | lm loss: 4.522990E+00 | grad norm: 0.350 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.344 | TFLOPs: 11.69 | 7: iteration 104180/ 173500 | consumed samples: 26670080 | consumed tokens: 54620323840 | elapsed time per iteration (s): 0.08 | learning rate: 8.315E-05 | global batch size: 256 | lm loss: 4.538544E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.319 | TFLOPs: 12.02 | 7: iteration 104190/ 173500 | consumed samples: 26672640 | consumed tokens: 54625566720 | elapsed time per iteration (s): 0.08 | learning rate: 8.314E-05 | global batch size: 256 | lm loss: 4.514829E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.097 | TFLOPs: 11.63 | 7: iteration 104200/ 173500 | consumed samples: 26675200 | consumed tokens: 54630809600 | elapsed time per iteration (s): 0.08 | learning rate: 8.312E-05 | global batch size: 256 | lm loss: 4.510401E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.553 | TFLOPs: 11.72 | 7: iteration 104210/ 173500 | consumed samples: 26677760 | consumed tokens: 54636052480 | elapsed time per iteration (s): 0.08 | learning rate: 8.310E-05 | global batch size: 256 | lm loss: 4.511859E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.915 | TFLOPs: 11.99 | 7: iteration 104220/ 173500 | consumed samples: 26680320 | consumed tokens: 54641295360 | elapsed time per iteration (s): 0.08 | learning rate: 8.309E-05 | global batch size: 256 | lm loss: 4.509251E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.488 | TFLOPs: 12.01 | 7: iteration 104230/ 173500 | consumed samples: 26682880 | consumed tokens: 54646538240 | elapsed time per iteration (s): 0.08 | 
learning rate: 8.307E-05 | global batch size: 256 | lm loss: 4.516762E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.885 | TFLOPs: 11.99 | 7: iteration 104240/ 173500 | consumed samples: 26685440 | consumed tokens: 54651781120 | elapsed time per iteration (s): 0.08 | learning rate: 8.306E-05 | global batch size: 256 | lm loss: 4.511843E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.943 | TFLOPs: 11.95 | 7: iteration 104250/ 173500 | consumed samples: 26688000 | consumed tokens: 54657024000 | elapsed time per iteration (s): 0.08 | learning rate: 8.304E-05 | global batch size: 256 | lm loss: 4.515710E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.116 | TFLOPs: 11.96 | 7: iteration 104260/ 173500 | consumed samples: 26690560 | consumed tokens: 54662266880 | elapsed time per iteration (s): 0.08 | learning rate: 8.303E-05 | global batch size: 256 | lm loss: 4.520079E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.992 | TFLOPs: 11.71 | 7: iteration 104270/ 173500 | consumed samples: 26693120 | consumed tokens: 54667509760 | elapsed time per iteration (s): 0.08 | learning rate: 8.301E-05 | global batch size: 256 | lm loss: 4.515974E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.156 | TFLOPs: 11.83 | 7: iteration 104280/ 173500 | consumed samples: 26695680 | consumed tokens: 54672752640 | elapsed time per iteration (s): 0.08 | learning rate: 8.299E-05 | global batch size: 256 | lm loss: 4.509570E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.489 | TFLOPs: 11.88 | 7: iteration 
104290/ 173500 | consumed samples: 26698240 | consumed tokens: 54677995520 | elapsed time per iteration (s): 0.08 | learning rate: 8.298E-05 | global batch size: 256 | lm loss: 4.524529E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.500 | TFLOPs: 11.87 | 7: iteration 104300/ 173500 | consumed samples: 26700800 | consumed tokens: 54683238400 | elapsed time per iteration (s): 0.08 | learning rate: 8.296E-05 | global batch size: 256 | lm loss: 4.510002E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.877 | TFLOPs: 11.89 | 7: iteration 104310/ 173500 | consumed samples: 26703360 | consumed tokens: 54688481280 | elapsed time per iteration (s): 0.08 | learning rate: 8.295E-05 | global batch size: 256 | lm loss: 4.522322E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.521 | TFLOPs: 11.73 | 7: iteration 104320/ 173500 | consumed samples: 26705920 | consumed tokens: 54693724160 | elapsed time per iteration (s): 0.08 | learning rate: 8.293E-05 | global batch size: 256 | lm loss: 4.505215E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.117 | TFLOPs: 11.89 | 7: iteration 104330/ 173500 | consumed samples: 26708480 | consumed tokens: 54698967040 | elapsed time per iteration (s): 0.08 | learning rate: 8.292E-05 | global batch size: 256 | lm loss: 4.509931E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.249 | TFLOPs: 11.86 | 7: iteration 104340/ 173500 | consumed samples: 26711040 | consumed tokens: 54704209920 | elapsed time per iteration (s): 0.08 | learning rate: 8.290E-05 | global batch size: 256 | lm loss: 4.499875E+00 | grad norm: 0.371 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.501 | TFLOPs: 11.80 | 7: iteration 104350/ 173500 | consumed samples: 26713600 | consumed tokens: 54709452800 | elapsed time per iteration (s): 0.08 | learning rate: 8.289E-05 | global batch size: 256 | lm loss: 4.515705E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.558 | TFLOPs: 11.83 | 7: iteration 104360/ 173500 | consumed samples: 26716160 | consumed tokens: 54714695680 | elapsed time per iteration (s): 0.08 | learning rate: 8.287E-05 | global batch size: 256 | lm loss: 4.522387E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.781 | TFLOPs: 11.86 | 7: iteration 104370/ 173500 | consumed samples: 26718720 | consumed tokens: 54719938560 | elapsed time per iteration (s): 0.08 | learning rate: 8.285E-05 | global batch size: 256 | lm loss: 4.519271E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.698 | TFLOPs: 11.59 | 7: iteration 104380/ 173500 | consumed samples: 26721280 | consumed tokens: 54725181440 | elapsed time per iteration (s): 0.08 | learning rate: 8.284E-05 | global batch size: 256 | lm loss: 4.516543E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.882 | TFLOPs: 11.55 | 7: iteration 104390/ 173500 | consumed samples: 26723840 | consumed tokens: 54730424320 | elapsed time per iteration (s): 0.08 | learning rate: 8.282E-05 | global batch size: 256 | lm loss: 4.528453E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.945 | TFLOPs: 11.78 | 7: iteration 104400/ 173500 | consumed samples: 26726400 | consumed tokens: 54735667200 | elapsed time per iteration (s): 0.09 | learning 
rate: 8.281E-05 | global batch size: 256 | lm loss: 4.514011E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.075 | TFLOPs: 11.11 | 7: iteration 104410/ 173500 | consumed samples: 26728960 | consumed tokens: 54740910080 | elapsed time per iteration (s): 0.08 | learning rate: 8.279E-05 | global batch size: 256 | lm loss: 4.505033E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.338 | TFLOPs: 11.86 | 7: iteration 104420/ 173500 | consumed samples: 26731520 | consumed tokens: 54746152960 | elapsed time per iteration (s): 0.08 | learning rate: 8.278E-05 | global batch size: 256 | lm loss: 4.520695E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.783 | TFLOPs: 11.83 | 7: iteration 104430/ 173500 | consumed samples: 26734080 | consumed tokens: 54751395840 | elapsed time per iteration (s): 0.08 | learning rate: 8.276E-05 | global batch size: 256 | lm loss: 4.520493E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.379 | TFLOPs: 11.58 | 7: iteration 104440/ 173500 | consumed samples: 26736640 | consumed tokens: 54756638720 | elapsed time per iteration (s): 0.09 | learning rate: 8.274E-05 | global batch size: 256 | lm loss: 4.515966E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2970.078 | TFLOPs: 11.05 | 7: iteration 104450/ 173500 | consumed samples: 26739200 | consumed tokens: 54761881600 | elapsed time per iteration (s): 0.08 | learning rate: 8.273E-05 | global batch size: 256 | lm loss: 4.524722E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.308 | TFLOPs: 11.63 | 7: iteration 104460/ 
173500 | consumed samples: 26741760 | consumed tokens: 54767124480 | elapsed time per iteration (s): 0.08 | learning rate: 8.271E-05 | global batch size: 256 | lm loss: 4.526661E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.645 | TFLOPs: 11.26 | 7: iteration 104470/ 173500 | consumed samples: 26744320 | consumed tokens: 54772367360 | elapsed time per iteration (s): 0.11 | learning rate: 8.270E-05 | global batch size: 256 | lm loss: 4.515945E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2432.273 | TFLOPs: 9.05 | 7: iteration 104480/ 173500 | consumed samples: 26746880 | consumed tokens: 54777610240 | elapsed time per iteration (s): 0.13 | learning rate: 8.268E-05 | global batch size: 256 | lm loss: 4.514886E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1982.271 | TFLOPs: 7.37 | 7: iteration 104490/ 173500 | consumed samples: 26749440 | consumed tokens: 54782853120 | elapsed time per iteration (s): 0.10 | learning rate: 8.267E-05 | global batch size: 256 | lm loss: 4.523549E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2480.016 | TFLOPs: 9.22 | 7: iteration 104500/ 173500 | consumed samples: 26752000 | consumed tokens: 54788096000 | elapsed time per iteration (s): 0.08 | learning rate: 8.265E-05 | global batch size: 256 | lm loss: 4.523985E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.841 | TFLOPs: 11.34 | 7: iteration 104510/ 173500 | consumed samples: 26754560 | consumed tokens: 54793338880 | elapsed time per iteration (s): 0.08 | learning rate: 8.263E-05 | global batch size: 256 | lm loss: 4.525235E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3189.833 | TFLOPs: 11.86 | 7: iteration 104520/ 173500 | consumed samples: 26757120 | consumed tokens: 54798581760 | elapsed time per iteration (s): 0.08 | learning rate: 8.262E-05 | global batch size: 256 | lm loss: 4.529589E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.407 | TFLOPs: 11.86 | 7: iteration 104530/ 173500 | consumed samples: 26759680 | consumed tokens: 54803824640 | elapsed time per iteration (s): 0.08 | learning rate: 8.260E-05 | global batch size: 256 | lm loss: 4.520329E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.901 | TFLOPs: 11.85 | 7: iteration 104540/ 173500 | consumed samples: 26762240 | consumed tokens: 54809067520 | elapsed time per iteration (s): 0.08 | learning rate: 8.259E-05 | global batch size: 256 | lm loss: 4.513633E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.545 | TFLOPs: 11.79 | 7: iteration 104550/ 173500 | consumed samples: 26764800 | consumed tokens: 54814310400 | elapsed time per iteration (s): 0.08 | learning rate: 8.257E-05 | global batch size: 256 | lm loss: 4.516174E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.673 | TFLOPs: 11.89 | 7: iteration 104560/ 173500 | consumed samples: 26767360 | consumed tokens: 54819553280 | elapsed time per iteration (s): 0.08 | learning rate: 8.256E-05 | global batch size: 256 | lm loss: 4.526802E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.901 | TFLOPs: 11.90 | 7: iteration 104570/ 173500 | consumed samples: 26769920 | consumed tokens: 54824796160 | elapsed time per iteration (s): 0.08 | learning rate: 
8.254E-05 | global batch size: 256 | lm loss: 4.519090E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.302 | TFLOPs: 11.82 | 7: iteration 104580/ 173500 | consumed samples: 26772480 | consumed tokens: 54830039040 | elapsed time per iteration (s): 0.08 | learning rate: 8.252E-05 | global batch size: 256 | lm loss: 4.535888E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.670 | TFLOPs: 11.88 | 7: iteration 104590/ 173500 | consumed samples: 26775040 | consumed tokens: 54835281920 | elapsed time per iteration (s): 0.08 | learning rate: 8.251E-05 | global batch size: 256 | lm loss: 4.508687E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.724 | TFLOPs: 11.82 | 7: iteration 104600/ 173500 | consumed samples: 26777600 | consumed tokens: 54840524800 | elapsed time per iteration (s): 0.08 | learning rate: 8.249E-05 | global batch size: 256 | lm loss: 4.536958E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.243 | TFLOPs: 11.90 | 7: iteration 104610/ 173500 | consumed samples: 26780160 | consumed tokens: 54845767680 | elapsed time per iteration (s): 0.08 | learning rate: 8.248E-05 | global batch size: 256 | lm loss: 4.520148E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.290 | TFLOPs: 11.91 | 7: iteration 104620/ 173500 | consumed samples: 26782720 | consumed tokens: 54851010560 | elapsed time per iteration (s): 0.08 | learning rate: 8.246E-05 | global batch size: 256 | lm loss: 4.511879E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.723 | TFLOPs: 11.90 | 7: iteration 104630/ 173500 | 
consumed samples: 26785280 | consumed tokens: 54856253440 | elapsed time per iteration (s): 0.08 | learning rate: 8.245E-05 | global batch size: 256 | lm loss: 4.523970E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.121 | TFLOPs: 11.92 | 7: iteration 104640/ 173500 | consumed samples: 26787840 | consumed tokens: 54861496320 | elapsed time per iteration (s): 0.08 | learning rate: 8.243E-05 | global batch size: 256 | lm loss: 4.518566E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.610 | TFLOPs: 11.85 | 7: iteration 104650/ 173500 | consumed samples: 26790400 | consumed tokens: 54866739200 | elapsed time per iteration (s): 0.08 | learning rate: 8.241E-05 | global batch size: 256 | lm loss: 4.518250E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.350 | TFLOPs: 11.88 | 7: iteration 104660/ 173500 | consumed samples: 26792960 | consumed tokens: 54871982080 | elapsed time per iteration (s): 0.08 | learning rate: 8.240E-05 | global batch size: 256 | lm loss: 4.519576E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.207 | TFLOPs: 11.89 | 7: iteration 104670/ 173500 | consumed samples: 26795520 | consumed tokens: 54877224960 | elapsed time per iteration (s): 0.08 | learning rate: 8.238E-05 | global batch size: 256 | lm loss: 4.518878E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.312 | TFLOPs: 11.91 | 7: iteration 104680/ 173500 | consumed samples: 26798080 | consumed tokens: 54882467840 | elapsed time per iteration (s): 0.08 | learning rate: 8.237E-05 | global batch size: 256 | lm loss: 4.510455E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3192.413 | TFLOPs: 11.87 | 7: iteration 104690/ 173500 | consumed samples: 26800640 | consumed tokens: 54887710720 | elapsed time per iteration (s): 0.08 | learning rate: 8.235E-05 | global batch size: 256 | lm loss: 4.508599E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.274 | TFLOPs: 11.90 | 7: iteration 104700/ 173500 | consumed samples: 26803200 | consumed tokens: 54892953600 | elapsed time per iteration (s): 0.08 | learning rate: 8.234E-05 | global batch size: 256 | lm loss: 4.526276E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.116 | TFLOPs: 11.86 | 7: iteration 104710/ 173500 | consumed samples: 26805760 | consumed tokens: 54898196480 | elapsed time per iteration (s): 0.08 | learning rate: 8.232E-05 | global batch size: 256 | lm loss: 4.542915E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.149 | TFLOPs: 11.91 | 7: iteration 104720/ 173500 | consumed samples: 26808320 | consumed tokens: 54903439360 | elapsed time per iteration (s): 0.08 | learning rate: 8.230E-05 | global batch size: 256 | lm loss: 4.511314E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.407 | TFLOPs: 11.89 | 7: iteration 104730/ 173500 | consumed samples: 26810880 | consumed tokens: 54908682240 | elapsed time per iteration (s): 0.08 | learning rate: 8.229E-05 | global batch size: 256 | lm loss: 4.514104E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.313 | TFLOPs: 11.90 | 7: iteration 104740/ 173500 | consumed samples: 26813440 | consumed tokens: 54913925120 | elapsed time per iteration (s): 0.08 | learning rate: 
8.227E-05 | global batch size: 256 | lm loss: 4.515451E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.473 | TFLOPs: 11.71 | 7: iteration 104750/ 173500 | consumed samples: 26816000 | consumed tokens: 54919168000 | elapsed time per iteration (s): 0.08 | learning rate: 8.226E-05 | global batch size: 256 | lm loss: 4.512333E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.284 | TFLOPs: 11.92 | 7: iteration 104760/ 173500 | consumed samples: 26818560 | consumed tokens: 54924410880 | elapsed time per iteration (s): 0.08 | learning rate: 8.224E-05 | global batch size: 256 | lm loss: 4.529439E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.361 | TFLOPs: 11.83 | 7: iteration 104770/ 173500 | consumed samples: 26821120 | consumed tokens: 54929653760 | elapsed time per iteration (s): 0.08 | learning rate: 8.223E-05 | global batch size: 256 | lm loss: 4.509832E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.121 | TFLOPs: 11.81 | 7: iteration 104780/ 173500 | consumed samples: 26823680 | consumed tokens: 54934896640 | elapsed time per iteration (s): 0.08 | learning rate: 8.221E-05 | global batch size: 256 | lm loss: 4.531728E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.044 | TFLOPs: 11.80 | 7: iteration 104790/ 173500 | consumed samples: 26826240 | consumed tokens: 54940139520 | elapsed time per iteration (s): 0.08 | learning rate: 8.220E-05 | global batch size: 256 | lm loss: 4.528239E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.978 | TFLOPs: 11.81 | 7: iteration 104800/ 173500 | 
consumed samples: 26828800 | consumed tokens: 54945382400 | elapsed time per iteration (s): 0.08 | learning rate: 8.218E-05 | global batch size: 256 | lm loss: 4.515334E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.049 | TFLOPs: 11.81 | 7: iteration 104810/ 173500 | consumed samples: 26831360 | consumed tokens: 54950625280 | elapsed time per iteration (s): 0.08 | learning rate: 8.216E-05 | global batch size: 256 | lm loss: 4.531984E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.320 | TFLOPs: 11.81 | 7: iteration 104820/ 173500 | consumed samples: 26833920 | consumed tokens: 54955868160 | elapsed time per iteration (s): 0.08 | learning rate: 8.215E-05 | global batch size: 256 | lm loss: 4.523681E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.356 | TFLOPs: 11.72 | 7: iteration 104830/ 173500 | consumed samples: 26836480 | consumed tokens: 54961111040 | elapsed time per iteration (s): 0.08 | learning rate: 8.213E-05 | global batch size: 256 | lm loss: 4.533913E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.056 | TFLOPs: 11.71 | 7: iteration 104840/ 173500 | consumed samples: 26839040 | consumed tokens: 54966353920 | elapsed time per iteration (s): 0.08 | learning rate: 8.212E-05 | global batch size: 256 | lm loss: 4.525748E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.965 | TFLOPs: 11.83 | 7: iteration 104850/ 173500 | consumed samples: 26841600 | consumed tokens: 54971596800 | elapsed time per iteration (s): 0.08 | learning rate: 8.210E-05 | global batch size: 256 | lm loss: 4.521225E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3193.300 | TFLOPs: 11.88 | 7: iteration 104860/ 173500 | consumed samples: 26844160 | consumed tokens: 54976839680 | elapsed time per iteration (s): 0.08 | learning rate: 8.209E-05 | global batch size: 256 | lm loss: 4.516927E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.571 | TFLOPs: 11.86 | 7: iteration 104870/ 173500 | consumed samples: 26846720 | consumed tokens: 54982082560 | elapsed time per iteration (s): 0.08 | learning rate: 8.207E-05 | global batch size: 256 | lm loss: 4.509368E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.331 | TFLOPs: 11.89 | 7: iteration 104880/ 173500 | consumed samples: 26849280 | consumed tokens: 54987325440 | elapsed time per iteration (s): 0.08 | learning rate: 8.205E-05 | global batch size: 256 | lm loss: 4.525930E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.647 | TFLOPs: 11.86 | 7: iteration 104890/ 173500 | consumed samples: 26851840 | consumed tokens: 54992568320 | elapsed time per iteration (s): 0.08 | learning rate: 8.204E-05 | global batch size: 256 | lm loss: 4.525042E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.823 | TFLOPs: 11.75 | 7: iteration 104900/ 173500 | consumed samples: 26854400 | consumed tokens: 54997811200 | elapsed time per iteration (s): 0.08 | learning rate: 8.202E-05 | global batch size: 256 | lm loss: 4.531985E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.635 | TFLOPs: 11.86 | 7: iteration 104910/ 173500 | consumed samples: 26856960 | consumed tokens: 55003054080 | elapsed time per iteration (s): 0.08 | learning rate: 
8.201E-05 | global batch size: 256 | lm loss: 4.516118E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.479 | TFLOPs: 11.86 | 7: iteration 104920/ 173500 | consumed samples: 26859520 | consumed tokens: 55008296960 | elapsed time per iteration (s): 0.08 | learning rate: 8.199E-05 | global batch size: 256 | lm loss: 4.513689E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.425 | TFLOPs: 11.86 | 7: iteration 104930/ 173500 | consumed samples: 26862080 | consumed tokens: 55013539840 | elapsed time per iteration (s): 0.08 | learning rate: 8.198E-05 | global batch size: 256 | lm loss: 4.498905E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.630 | TFLOPs: 11.90 | 7: iteration 104940/ 173500 | consumed samples: 26864640 | consumed tokens: 55018782720 | elapsed time per iteration (s): 0.08 | learning rate: 8.196E-05 | global batch size: 256 | lm loss: 4.522002E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.811 | TFLOPs: 11.91 | 7: iteration 104950/ 173500 | consumed samples: 26867200 | consumed tokens: 55024025600 | elapsed time per iteration (s): 0.08 | learning rate: 8.194E-05 | global batch size: 256 | lm loss: 4.525398E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.921 | TFLOPs: 11.89 | 7: iteration 104960/ 173500 | consumed samples: 26869760 | consumed tokens: 55029268480 | elapsed time per iteration (s): 0.08 | learning rate: 8.193E-05 | global batch size: 256 | lm loss: 4.509109E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.798 | TFLOPs: 11.87 | 7: iteration 104970/ 173500 | 
consumed samples: 26872320 | consumed tokens: 55034511360 | elapsed time per iteration (s): 0.08 | learning rate: 8.191E-05 | global batch size: 256 | lm loss: 4.515003E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.237 | TFLOPs: 11.75 | 7: iteration 104980/ 173500 | consumed samples: 26874880 | consumed tokens: 55039754240 | elapsed time per iteration (s): 0.08 | learning rate: 8.190E-05 | global batch size: 256 | lm loss: 4.530001E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.923 | TFLOPs: 12.03 | 7: iteration 104990/ 173500 | consumed samples: 26877440 | consumed tokens: 55044997120 | elapsed time per iteration (s): 0.08 | learning rate: 8.188E-05 | global batch size: 256 | lm loss: 4.508595E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.481 | TFLOPs: 12.06 | 7: iteration 105000/ 173500 | consumed samples: 26880000 | consumed tokens: 55050240000 | elapsed time per iteration (s): 0.08 | learning rate: 8.187E-05 | global batch size: 256 | lm loss: 4.506666E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.217 | TFLOPs: 12.03 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 105000 | lm loss value: 4.396361E+00 | lm loss PPL: 8.115500E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 105000 to checkpoints_14m91b100m 0: [2023-03-17 02:47:04,049] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step105000 is begin to save! 
0: [2023-03-17 02:47:04,054] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:47:04,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:47:04,079] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:47:04,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:47:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:47:04,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:47:04,088] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:47:04,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:47:04,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:47:04,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:47:04,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:47:04,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:47:04,096] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step105000/mp_rank_00_model_states.pt 0: [2023-03-17 02:47:04,096] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:47:04,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:47:04,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:47:04,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:47:04,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:47:04,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2023-03-17 02:47:04,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:47:04,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:47:04,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: [2023-03-17 02:47:04,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:47:04,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:47:04,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:47:04,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
1: [2023-03-17 02:47:04,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:47:04,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:47:04,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 7: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:47:04,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:47:04,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 4: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2023-03-17 02:47:04,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
5: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:47:04,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: [2023-03-17 02:47:04,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:47:04,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2023-03-17 02:47:04,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:47:04,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 1: [2023-03-17 02:47:04,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 3: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:47:04,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 7: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:47:04,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:47:04,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 0: [2023-03-17 02:47:04,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:47:04,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:47:04,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:47:04,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2023-03-17 02:47:04,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:47:04,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
1: [2023-03-17 02:47:04,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
3: [2023-03-17 02:47:04,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:47:04,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt
3: [2023-03-17 02:47:04,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
7: [2023-03-17 02:47:04,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:47:04,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt
7: [2023-03-17 02:47:04,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
4: [2023-03-17 02:47:04,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:47:04,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt
4: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
0: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
6: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
6: [2023-03-17 02:47:04,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
5: [2023-03-17 02:47:04,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt
6: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
5: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
2: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:47:04,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt
2: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
1: [2023-03-17 02:47:04,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:47:04,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
0: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt
3: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
3: [2023-03-17 02:47:04,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt
3: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
7: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:47:04,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
7: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
6: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:47:04,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt
6: [2023-03-17 02:47:04,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
4: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
6: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
2: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:47:04,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt
2: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
5: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:47:04,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt
5: [2023-03-17 02:47:04,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
3: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:47:04,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt
3: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
1: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:47:04,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
1: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
0: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
2: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
2: [2023-03-17 02:47:04,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt
2: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
7: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:47:04,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt
7: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
5: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:47:04,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
4: [2023-03-17 02:47:04,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:47:04,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
4: [2023-03-17 02:47:04,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt
4: [2023-03-17 02:47:04,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
6: [2023-03-17 02:47:04,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:47:04,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt
6: [2023-03-17 02:47:04,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
1: [2023-03-17 02:47:04,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:47:04,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
1: [2023-03-17 02:47:04,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
3: [2023-03-17 02:47:04,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:47:04,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
3: [2023-03-17 02:47:04,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
7: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:47:04,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt
7: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
6: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:47:04,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
0: [2023-03-17 02:47:04,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
6: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
0: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
2: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt.
2: [2023-03-17 02:47:04,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt
2: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
4: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:47:04,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt
4: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
5: [2023-03-17 02:47:04,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt
5: [2023-03-17 02:47:04,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
3: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt
3: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
4: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt
7: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt
4: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
0: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt.
7: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
2: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
2: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
2: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
2: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt
2: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
2: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
4: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
5: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
4: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
5: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
5: [2023-03-17 02:47:04,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt
5: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
5: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
0: [2023-03-17 02:47:04,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
0: [2023-03-17 02:47:04,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
0: [2023-03-17 02:47:04,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
1: [2023-03-17 02:47:04,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:47:04,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
1: [2023-03-17 02:47:04,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
6: [2023-03-17 02:47:04,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt.
6: [2023-03-17 02:47:04,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt
6: [2023-03-17 02:47:04,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
1: [2023-03-17 02:47:04,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt.
1: [2023-03-17 02:47:04,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
1: [2023-03-17 02:47:04,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
3: [2023-03-17 02:47:04,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
3: [2023-03-17 02:47:04,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step105000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
3: [2023-03-17 02:47:04,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now!
0: successfully saved checkpoint at iteration 105000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 85.21
7: iteration 105010/ 173500 | consumed samples: 26882560 | consumed tokens: 55055482880 | elapsed time per iteration (s): 0.09 | learning rate: 8.185E-05 | global batch size: 256 | lm loss: 4.521194E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2705.970 | TFLOPs: 10.07 |
7: iteration 105020/ 173500 | consumed samples: 26885120 | consumed tokens: 55060725760 | elapsed time per iteration (s): 0.08 | learning rate: 8.184E-05 | global batch size: 256 | lm loss: 4.518020E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.285 | TFLOPs: 12.03 |
7: iteration 105030/ 173500 | consumed samples: 26887680 | consumed tokens: 55065968640 | elapsed time per iteration (s): 0.08 | learning rate: 8.182E-05 | global batch size: 256 | lm loss: 4.516676E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.618 | TFLOPs: 12.02 |
7: iteration 105040/ 173500 | consumed samples: 26890240 | consumed tokens: 55071211520 | elapsed time per iteration (s): 0.08 | learning rate: 8.180E-05 | global batch size: 256 | lm loss: 4.512928E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.544 | TFLOPs: 11.71 |
7: iteration 105050/ 173500 | consumed samples: 26892800 | consumed tokens: 55076454400 | elapsed time per iteration (s): 0.08 | learning rate: 8.179E-05 | global batch size: 256 | lm loss: 4.511271E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.591 | TFLOPs: 11.94 |
7: iteration 105060/ 173500 | consumed samples: 26895360 | consumed tokens: 55081697280 | elapsed time per iteration (s): 0.08 | learning rate: 8.177E-05 | global batch size: 256 | lm loss: 4.525758E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.664 | TFLOPs: 11.96 |
7: iteration 105070/ 173500 | consumed samples: 26897920 | consumed tokens: 55086940160 | elapsed time per iteration (s): 0.09 | learning rate: 8.176E-05 | global batch size: 256 | lm loss: 4.510199E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2958.366 | TFLOPs: 11.00 |
7: iteration 105080/ 173500 | consumed samples: 26900480 | consumed tokens: 55092183040 | elapsed time per iteration (s): 0.09 | learning rate: 8.174E-05 | global batch size: 256 | lm loss: 4.514091E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.828 | TFLOPs: 11.17 |
7: iteration 105090/ 173500 | consumed samples: 26903040 | consumed tokens: 55097425920 | elapsed time per iteration (s): 0.08 | learning rate: 8.173E-05 | global batch size: 256 | lm loss: 4.506319E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.692 | TFLOPs: 11.98 |
7: iteration 105100/ 173500 | consumed samples: 26905600 | consumed tokens: 55102668800 | elapsed time per iteration (s): 0.11 | learning rate: 8.171E-05 | global batch size: 256 | lm loss: 4.519366E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.863 | TFLOPs: 8.82 |
7: iteration 105110/ 173500 | consumed samples: 26908160 | consumed tokens: 55107911680 | elapsed time per iteration (s): 0.10 | learning rate: 8.169E-05 | global batch size: 256 | lm loss: 4.528439E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.674 | TFLOPs: 9.72 |
7: iteration 105120/ 173500 | consumed samples: 26910720 | consumed tokens: 55113154560 | elapsed time per iteration (s): 0.11 | learning rate: 8.168E-05 | global batch size: 256 | lm loss: 4.520854E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2243.757 | TFLOPs: 8.35 |
7: iteration 105130/ 173500 | consumed samples: 26913280 | consumed tokens: 55118397440 | elapsed time per iteration (s): 0.13 | learning rate: 8.166E-05 | global batch size: 256 | lm loss: 4.509623E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1934.941 | TFLOPs: 7.20 |
7: iteration 105140/ 173500 | consumed samples: 26915840 | consumed tokens: 55123640320 | elapsed time per iteration (s): 0.10 | learning rate: 8.165E-05 | global batch size: 256 | lm loss: 4.528461E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.195 | TFLOPs: 9.64 |
7: iteration 105150/ 173500 | consumed samples: 26918400 | consumed tokens: 55128883200 | elapsed time per iteration (s): 0.10 | learning rate: 8.163E-05 | global batch size: 256 | lm loss: 4.516995E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2498.759 | TFLOPs: 9.29 |
7: iteration 105160/ 173500 | consumed samples: 26920960 | consumed tokens: 55134126080 | elapsed time per iteration (s): 0.11 | learning rate: 8.162E-05 | global batch size: 256 | lm loss: 4.517625E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2371.578 | TFLOPs: 8.82 |
7: iteration 105170/ 173500 | consumed samples: 26923520 | consumed tokens: 55139368960 | elapsed time per iteration (s): 0.10 | learning rate: 8.160E-05 | global batch size: 256 | lm loss: 4.514131E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2593.030 | TFLOPs: 9.64 |
7: iteration 105180/ 173500 | consumed samples: 26926080 | consumed tokens: 55144611840 | elapsed time per iteration (s): 0.08 | learning rate: 8.159E-05 | global batch size: 256 | lm loss: 4.527351E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.033 | TFLOPs: 11.95 |
7: iteration 105190/ 173500 | consumed samples: 26928640 | consumed tokens: 55149854720 | elapsed time per iteration (s): 0.08 | learning rate: 8.157E-05 | global batch size: 256 | lm loss: 4.519805E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.678 | TFLOPs: 12.06 |
7: iteration 105200/ 173500 | consumed samples: 26931200 | consumed tokens: 55155097600 | elapsed time per iteration (s): 0.12 | learning rate: 8.155E-05 | global batch size: 256 | lm loss: 4.509850E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2151.122 | TFLOPs: 8.00 |
7: iteration 105210/ 173500 | consumed samples: 26933760 | consumed tokens: 55160340480 | elapsed time per iteration (s): 0.11 | learning rate: 8.154E-05 | global batch size: 256 | lm loss: 4.513583E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.965 | TFLOPs: 8.50 |
7: iteration 105220/ 173500 | consumed samples: 26936320 | consumed tokens: 55165583360 | elapsed time per iteration (s): 0.08 | learning rate: 8.152E-05 | global batch size: 256 | lm loss: 4.502908E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.830 | TFLOPs: 11.64 |
7: iteration 105230/ 173500 | consumed samples: 26938880 | consumed tokens: 55170826240 | elapsed time per iteration (s): 0.08 | learning rate: 8.151E-05 | global batch size: 256 | lm loss: 4.518761E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.205 | TFLOPs: 11.60 |
7: iteration 105240/ 173500 | consumed samples: 26941440 | consumed tokens: 55176069120 | elapsed time per iteration (s): 0.08 | learning rate: 8.149E-05 | global batch size: 256 | lm loss: 4.522082E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.060 | TFLOPs: 11.77 |
7: iteration 105250/ 173500 | consumed samples: 26944000 | consumed tokens: 55181312000 | elapsed time per iteration (s): 0.08 | learning rate: 8.148E-05 | global batch size: 256 | lm loss: 4.526532E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.431 | TFLOPs: 11.84 |
7: iteration 105260/ 173500 | consumed samples: 26946560 | consumed tokens: 55186554880 | elapsed time per iteration (s): 0.08 | learning rate: 8.146E-05 | global batch size: 256 | lm loss: 4.527197E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.740 | TFLOPs: 11.82 |
7: iteration 105270/ 173500 | consumed samples: 26949120 | consumed tokens: 55191797760 | elapsed time per iteration (s): 0.08 | learning rate: 8.144E-05 | global batch size: 256 | lm loss: 4.517710E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.628 | TFLOPs: 11.79 |
7: iteration 105280/ 173500 | consumed samples: 26951680 | consumed tokens: 55197040640 | elapsed time per iteration (s): 0.08 | learning rate: 8.143E-05 | global batch size: 256 | lm loss: 4.513618E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.216 | TFLOPs: 11.78 |
7: iteration 105290/ 173500 | consumed samples: 26954240 | consumed tokens: 55202283520 | elapsed time per iteration (s): 0.08 | learning rate: 8.141E-05 | global batch size: 256 | lm loss: 4.509896E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.439 | TFLOPs: 11.83 |
7: iteration 105300/ 173500 | consumed samples: 26956800 | consumed tokens: 55207526400 | elapsed time per iteration (s): 0.08 | learning rate: 8.140E-05 | global batch size: 256 | lm loss: 4.520397E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.124 | TFLOPs: 11.90 |
7: iteration 105310/ 173500 | consumed samples: 26959360 | consumed tokens: 55212769280 | elapsed time per iteration (s): 0.08 | learning rate: 8.138E-05 | global batch size: 256 | lm loss: 4.513956E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.207 | TFLOPs: 11.84 |
7: iteration 105320/ 173500 | consumed samples: 26961920 | consumed tokens: 55218012160 | elapsed time per iteration (s): 0.08 | learning rate: 8.137E-05 | global batch size: 256 | lm loss: 4.516555E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.628 | TFLOPs: 11.82 |
7: iteration 105330/ 173500 | consumed samples: 26964480 | consumed tokens: 55223255040 | elapsed time per iteration (s): 0.08 | learning rate: 8.135E-05 | global batch size: 256 | lm loss: 4.503869E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.403 | TFLOPs: 11.78 |
7: iteration 105340/ 173500 | consumed samples: 26967040 | consumed tokens: 55228497920 | elapsed time per iteration (s): 0.08 | learning rate: 8.134E-05 | global batch size: 256 | lm loss: 4.516928E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.575 | TFLOPs: 11.94 |
7: iteration 105350/ 173500 | consumed samples: 26969600 | consumed tokens: 55233740800 | elapsed time per iteration (s): 0.11 | learning rate: 8.132E-05 | global batch size: 256 | lm loss: 4.507687E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2237.895 | TFLOPs: 8.32 |
7: iteration 105360/ 173500 | consumed samples: 26972160 | consumed tokens: 55238983680 | elapsed time per iteration (s): 0.11 | learning rate: 8.130E-05 | global batch size: 256 | lm loss: 4.525869E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2236.681 | TFLOPs: 8.32 |
7: iteration 105370/ 173500 | consumed samples: 26974720 | consumed tokens: 55244226560 | elapsed time per iteration (s): 0.10 | learning rate: 8.129E-05 | global batch size: 256 | lm loss: 4.516642E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.423 | TFLOPs: 9.30 |
7: iteration 105380/ 173500 | consumed samples: 26977280 | consumed tokens: 55249469440 | elapsed time per iteration (s): 0.10 | learning rate: 8.127E-05 | global batch size: 256 | lm loss: 4.512552E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2654.529 | TFLOPs: 9.87 |
7: iteration 105390/ 173500 | consumed samples: 26979840 | consumed tokens: 55254712320 | elapsed time per iteration (s): 0.08 | learning rate: 8.126E-05 | global batch size: 256 | lm loss: 4.523170E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.831 | TFLOPs: 11.84 |
7: iteration 105400/ 173500 | consumed samples: 26982400 | consumed tokens: 55259955200 | elapsed time per iteration (s): 0.08 | learning rate: 8.124E-05 | global batch size: 256 | lm loss: 4.522852E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.142 | TFLOPs: 11.77 |
7: iteration 105410/ 173500 | consumed samples: 26984960 | consumed tokens: 55265198080 | elapsed time per iteration (s): 0.08 | learning rate: 8.123E-05 | global batch size: 256 | lm loss: 4.510208E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.020 | TFLOPs: 11.91 |
7: iteration 105420/ 173500 | consumed samples: 26987520 | consumed tokens: 55270440960 | elapsed time per iteration (s): 0.08 | learning rate: 8.121E-05 | global batch size: 256 | lm loss: 4.508723E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.468 | TFLOPs: 11.96 |
7: iteration 105430/ 173500 | consumed samples: 26990080 | consumed tokens: 55275683840 | elapsed time per iteration (s): 0.08 | learning rate: 8.120E-05 | global batch size: 256 | lm loss: 4.511466E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.034 | TFLOPs: 11.94 |
7: iteration 105440/ 173500 | consumed samples: 26992640 | consumed tokens: 55280926720 | elapsed time per iteration (s): 0.08 | learning rate: 8.118E-05 | global batch size: 256 | lm loss: 4.517317E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.388 | TFLOPs: 11.87 |
7: iteration 105450/ 173500 | consumed samples: 26995200 | consumed tokens: 55286169600 | elapsed time per iteration (s): 0.08 | learning rate: 8.116E-05 | global batch size: 256 | lm loss: 4.527633E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.455 | TFLOPs: 11.71 |
7: iteration 105460/ 173500 | consumed samples: 26997760 | consumed tokens: 55291412480 | elapsed time per iteration (s): 0.08 | learning rate: 8.115E-05 | global batch size: 256 | lm loss: 4.509211E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.953 | TFLOPs: 11.82 |
7: iteration 105470/ 173500 | consumed samples: 27000320 | consumed tokens: 55296655360 | elapsed time per iteration (s): 0.08 | learning rate: 8.113E-05 | global batch size: 256 | lm loss: 4.517440E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.167 | TFLOPs: 12.01 |
7: iteration 105480/ 173500 | consumed samples: 27002880 | consumed tokens: 55301898240 | elapsed time per iteration (s): 0.08 | learning rate: 8.112E-05 | global batch size: 256 | lm loss: 4.521589E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.350 | TFLOPs: 11.74 |
7: iteration 105490/ 173500 | consumed samples: 27005440 | consumed tokens: 55307141120 | elapsed time per iteration (s): 0.08 | learning rate: 8.110E-05 | global batch size: 256 | lm loss: 4.519180E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.115 | TFLOPs: 11.66 |
7: iteration 105500/ 173500 | consumed samples: 27008000 | consumed tokens: 55312384000 | elapsed time per iteration (s): 0.08 | learning rate: 8.109E-05 | global batch size: 256 | lm loss: 4.516750E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.255 | TFLOPs: 11.77 |
7: iteration 105510/ 173500 | consumed samples: 27010560 | consumed tokens: 55317626880 | elapsed time per iteration (s): 0.08 | learning rate: 8.107E-05 | global batch size: 256 | lm loss: 4.515491E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3190.966 | TFLOPs: 11.87 | 7: iteration 105520/ 173500 | consumed samples: 27013120 | consumed tokens: 55322869760 | elapsed time per iteration (s): 0.08 | learning rate: 8.105E-05 | global batch size: 256 | lm loss: 4.526903E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.903 | TFLOPs: 11.82 | 7: iteration 105530/ 173500 | consumed samples: 27015680 | consumed tokens: 55328112640 | elapsed time per iteration (s): 0.08 | learning rate: 8.104E-05 | global batch size: 256 | lm loss: 4.515344E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.786 | TFLOPs: 11.88 | 7: iteration 105540/ 173500 | consumed samples: 27018240 | consumed tokens: 55333355520 | elapsed time per iteration (s): 0.08 | learning rate: 8.102E-05 | global batch size: 256 | lm loss: 4.508877E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.182 | TFLOPs: 11.85 | 7: iteration 105550/ 173500 | consumed samples: 27020800 | consumed tokens: 55338598400 | elapsed time per iteration (s): 0.08 | learning rate: 8.101E-05 | global batch size: 256 | lm loss: 4.529200E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.355 | TFLOPs: 11.90 | 7: iteration 105560/ 173500 | consumed samples: 27023360 | consumed tokens: 55343841280 | elapsed time per iteration (s): 0.08 | learning rate: 8.099E-05 | global batch size: 256 | lm loss: 4.524753E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.979 | TFLOPs: 11.67 | 7: iteration 105570/ 173500 | consumed samples: 27025920 | consumed tokens: 55349084160 | elapsed time per iteration (s): 0.08 | learning rate: 
8.098E-05 | global batch size: 256 | lm loss: 4.531112E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.649 | TFLOPs: 11.61 | 7: iteration 105580/ 173500 | consumed samples: 27028480 | consumed tokens: 55354327040 | elapsed time per iteration (s): 0.12 | learning rate: 8.096E-05 | global batch size: 256 | lm loss: 4.514334E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2159.842 | TFLOPs: 8.03 | 7: iteration 105590/ 173500 | consumed samples: 27031040 | consumed tokens: 55359569920 | elapsed time per iteration (s): 0.10 | learning rate: 8.095E-05 | global batch size: 256 | lm loss: 4.516007E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2624.562 | TFLOPs: 9.76 | 7: iteration 105600/ 173500 | consumed samples: 27033600 | consumed tokens: 55364812800 | elapsed time per iteration (s): 0.08 | learning rate: 8.093E-05 | global batch size: 256 | lm loss: 4.530839E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.665 | TFLOPs: 11.92 | 7: iteration 105610/ 173500 | consumed samples: 27036160 | consumed tokens: 55370055680 | elapsed time per iteration (s): 0.08 | learning rate: 8.091E-05 | global batch size: 256 | lm loss: 4.525126E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.493 | TFLOPs: 11.87 | 7: iteration 105620/ 173500 | consumed samples: 27038720 | consumed tokens: 55375298560 | elapsed time per iteration (s): 0.08 | learning rate: 8.090E-05 | global batch size: 256 | lm loss: 4.513281E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.612 | TFLOPs: 11.92 | 7: iteration 105630/ 173500 | 
consumed samples: 27041280 | consumed tokens: 55380541440 | elapsed time per iteration (s): 0.08 | learning rate: 8.088E-05 | global batch size: 256 | lm loss: 4.519687E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.900 | TFLOPs: 11.62 | 7: iteration 105640/ 173500 | consumed samples: 27043840 | consumed tokens: 55385784320 | elapsed time per iteration (s): 0.08 | learning rate: 8.087E-05 | global batch size: 256 | lm loss: 4.529367E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.324 | TFLOPs: 11.91 | 7: iteration 105650/ 173500 | consumed samples: 27046400 | consumed tokens: 55391027200 | elapsed time per iteration (s): 0.08 | learning rate: 8.085E-05 | global batch size: 256 | lm loss: 4.522662E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.074 | TFLOPs: 11.87 | 7: iteration 105660/ 173500 | consumed samples: 27048960 | consumed tokens: 55396270080 | elapsed time per iteration (s): 0.08 | learning rate: 8.084E-05 | global batch size: 256 | lm loss: 4.514626E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.255 | TFLOPs: 11.90 | 7: iteration 105670/ 173500 | consumed samples: 27051520 | consumed tokens: 55401512960 | elapsed time per iteration (s): 0.08 | learning rate: 8.082E-05 | global batch size: 256 | lm loss: 4.501749E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.056 | TFLOPs: 11.92 | 7: iteration 105680/ 173500 | consumed samples: 27054080 | consumed tokens: 55406755840 | elapsed time per iteration (s): 0.08 | learning rate: 8.081E-05 | global batch size: 256 | lm loss: 4.523098E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3204.371 | TFLOPs: 11.92 | 7: iteration 105690/ 173500 | consumed samples: 27056640 | consumed tokens: 55411998720 | elapsed time per iteration (s): 0.08 | learning rate: 8.079E-05 | global batch size: 256 | lm loss: 4.510961E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.375 | TFLOPs: 11.87 | 7: iteration 105700/ 173500 | consumed samples: 27059200 | consumed tokens: 55417241600 | elapsed time per iteration (s): 0.08 | learning rate: 8.077E-05 | global batch size: 256 | lm loss: 4.518558E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.269 | TFLOPs: 11.83 | 7: iteration 105710/ 173500 | consumed samples: 27061760 | consumed tokens: 55422484480 | elapsed time per iteration (s): 0.08 | learning rate: 8.076E-05 | global batch size: 256 | lm loss: 4.522684E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.944 | TFLOPs: 11.83 | 7: iteration 105720/ 173500 | consumed samples: 27064320 | consumed tokens: 55427727360 | elapsed time per iteration (s): 0.08 | learning rate: 8.074E-05 | global batch size: 256 | lm loss: 4.508511E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.205 | TFLOPs: 11.80 | 7: iteration 105730/ 173500 | consumed samples: 27066880 | consumed tokens: 55432970240 | elapsed time per iteration (s): 0.08 | learning rate: 8.073E-05 | global batch size: 256 | lm loss: 4.522381E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.685 | TFLOPs: 11.79 | 7: iteration 105740/ 173500 | consumed samples: 27069440 | consumed tokens: 55438213120 | elapsed time per iteration (s): 0.08 | learning rate: 
8.071E-05 | global batch size: 256 | lm loss: 4.514836E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.668 | TFLOPs: 11.78 | 7: iteration 105750/ 173500 | consumed samples: 27072000 | consumed tokens: 55443456000 | elapsed time per iteration (s): 0.08 | learning rate: 8.070E-05 | global batch size: 256 | lm loss: 4.513430E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.902 | TFLOPs: 11.75 | 7: iteration 105760/ 173500 | consumed samples: 27074560 | consumed tokens: 55448698880 | elapsed time per iteration (s): 0.08 | learning rate: 8.068E-05 | global batch size: 256 | lm loss: 4.501762E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.743 | TFLOPs: 11.76 | 7: iteration 105770/ 173500 | consumed samples: 27077120 | consumed tokens: 55453941760 | elapsed time per iteration (s): 0.08 | learning rate: 8.067E-05 | global batch size: 256 | lm loss: 4.527308E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.966 | TFLOPs: 11.82 | 7: iteration 105780/ 173500 | consumed samples: 27079680 | consumed tokens: 55459184640 | elapsed time per iteration (s): 0.08 | learning rate: 8.065E-05 | global batch size: 256 | lm loss: 4.515833E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.781 | TFLOPs: 11.83 | 7: iteration 105790/ 173500 | consumed samples: 27082240 | consumed tokens: 55464427520 | elapsed time per iteration (s): 0.08 | learning rate: 8.063E-05 | global batch size: 256 | lm loss: 4.519461E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.674 | TFLOPs: 11.76 | 7: iteration 105800/ 173500 | 
consumed samples: 27084800 | consumed tokens: 55469670400 | elapsed time per iteration (s): 0.08 | learning rate: 8.062E-05 | global batch size: 256 | lm loss: 4.521210E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.454 | TFLOPs: 11.78 | 7: iteration 105810/ 173500 | consumed samples: 27087360 | consumed tokens: 55474913280 | elapsed time per iteration (s): 0.08 | learning rate: 8.060E-05 | global batch size: 256 | lm loss: 4.516686E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.690 | TFLOPs: 11.79 | 7: iteration 105820/ 173500 | consumed samples: 27089920 | consumed tokens: 55480156160 | elapsed time per iteration (s): 0.08 | learning rate: 8.059E-05 | global batch size: 256 | lm loss: 4.518254E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.876 | TFLOPs: 11.70 | 7: iteration 105830/ 173500 | consumed samples: 27092480 | consumed tokens: 55485399040 | elapsed time per iteration (s): 0.08 | learning rate: 8.057E-05 | global batch size: 256 | lm loss: 4.515082E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.398 | TFLOPs: 11.76 | 7: iteration 105840/ 173500 | consumed samples: 27095040 | consumed tokens: 55490641920 | elapsed time per iteration (s): 0.08 | learning rate: 8.056E-05 | global batch size: 256 | lm loss: 4.517686E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.256 | TFLOPs: 11.65 | 7: iteration 105850/ 173500 | consumed samples: 27097600 | consumed tokens: 55495884800 | elapsed time per iteration (s): 0.08 | learning rate: 8.054E-05 | global batch size: 256 | lm loss: 4.522264E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3167.327 | TFLOPs: 11.78 | 7: iteration 105860/ 173500 | consumed samples: 27100160 | consumed tokens: 55501127680 | elapsed time per iteration (s): 0.08 | learning rate: 8.053E-05 | global batch size: 256 | lm loss: 4.521354E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.180 | TFLOPs: 11.79 | 7: iteration 105870/ 173500 | consumed samples: 27102720 | consumed tokens: 55506370560 | elapsed time per iteration (s): 0.08 | learning rate: 8.051E-05 | global batch size: 256 | lm loss: 4.507643E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.824 | TFLOPs: 11.77 | 7: iteration 105880/ 173500 | consumed samples: 27105280 | consumed tokens: 55511613440 | elapsed time per iteration (s): 0.08 | learning rate: 8.049E-05 | global batch size: 256 | lm loss: 4.514501E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.545 | TFLOPs: 11.79 | 7: iteration 105890/ 173500 | consumed samples: 27107840 | consumed tokens: 55516856320 | elapsed time per iteration (s): 0.08 | learning rate: 8.048E-05 | global batch size: 256 | lm loss: 4.520317E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.858 | TFLOPs: 11.78 | 7: iteration 105900/ 173500 | consumed samples: 27110400 | consumed tokens: 55522099200 | elapsed time per iteration (s): 0.08 | learning rate: 8.046E-05 | global batch size: 256 | lm loss: 4.506253E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.245 | TFLOPs: 11.80 | 7: iteration 105910/ 173500 | consumed samples: 27112960 | consumed tokens: 55527342080 | elapsed time per iteration (s): 0.08 | learning rate: 
8.045E-05 | global batch size: 256 | lm loss: 4.504742E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.052 | TFLOPs: 11.76 | 7: iteration 105920/ 173500 | consumed samples: 27115520 | consumed tokens: 55532584960 | elapsed time per iteration (s): 0.08 | learning rate: 8.043E-05 | global batch size: 256 | lm loss: 4.525502E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.791 | TFLOPs: 11.79 | 7: iteration 105930/ 173500 | consumed samples: 27118080 | consumed tokens: 55537827840 | elapsed time per iteration (s): 0.08 | learning rate: 8.042E-05 | global batch size: 256 | lm loss: 4.511364E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.161 | TFLOPs: 11.78 | 7: iteration 105940/ 173500 | consumed samples: 27120640 | consumed tokens: 55543070720 | elapsed time per iteration (s): 0.08 | learning rate: 8.040E-05 | global batch size: 256 | lm loss: 4.521578E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.004 | TFLOPs: 11.79 | 7: iteration 105950/ 173500 | consumed samples: 27123200 | consumed tokens: 55548313600 | elapsed time per iteration (s): 0.08 | learning rate: 8.039E-05 | global batch size: 256 | lm loss: 4.525812E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.764 | TFLOPs: 11.41 | 7: iteration 105960/ 173500 | consumed samples: 27125760 | consumed tokens: 55553556480 | elapsed time per iteration (s): 0.09 | learning rate: 8.037E-05 | global batch size: 256 | lm loss: 4.512363E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3003.180 | TFLOPs: 11.17 | 7: iteration 105970/ 173500 | 
consumed samples: 27128320 | consumed tokens: 55558799360 | elapsed time per iteration (s): 0.08 | learning rate: 8.035E-05 | global batch size: 256 | lm loss: 4.523795E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.791 | TFLOPs: 11.80 | 7: iteration 105980/ 173500 | consumed samples: 27130880 | consumed tokens: 55564042240 | elapsed time per iteration (s): 0.08 | learning rate: 8.034E-05 | global batch size: 256 | lm loss: 4.516465E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.697 | TFLOPs: 11.74 | 7: iteration 105990/ 173500 | consumed samples: 27133440 | consumed tokens: 55569285120 | elapsed time per iteration (s): 0.08 | learning rate: 8.032E-05 | global batch size: 256 | lm loss: 4.513539E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.653 | TFLOPs: 11.75 | 0: [2023-03-17 02:48:29,357] [INFO] [logging.py:68:log_dist] [Rank 0] step=106000, skipped=0, lr=[8.030787777917086e-05, 8.030787777917086e-05, 8.030787777917086e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 106000/ 173500 | consumed samples: 27136000 | consumed tokens: 55574528000 | elapsed time per iteration (s): 0.08 | learning rate: 8.031E-05 | global batch size: 256 | lm loss: 4.499207E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.996 | TFLOPs: 11.78 | 0: steps: 106000 loss: 4.4856 iter time (s): 0.083 samples/sec: 3101.148 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 106000 | lm loss value: 4.396091E+00 | lm loss PPL: 8.113314E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at 
iteration 106000 to checkpoints_14m91b100m 0: [2023-03-17 02:48:29,415] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step106000 is begin to save! 0: [2023-03-17 02:48:29,418] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:48:29,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:48:29,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:48:29,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:48:29,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:48:29,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:48:29,452] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:48:29,455] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:48:29,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:48:29,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:48:29,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:48:29,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:48:29,459] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step106000/mp_rank_00_model_states.pt 0: [2023-03-17 02:48:29,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:48:29,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:48:29,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:48:29,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:48:29,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:48:29,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:48:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
3: [2023-03-17 02:48:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2023-03-17 02:48:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:48:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:48:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:48:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:48:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2023-03-17 02:48:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:48:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:48:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:48:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:48:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:48:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:48:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:48:29,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:48:29,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2023-03-17 02:48:29,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:48:29,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2023-03-17 02:48:29,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:48:29,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:48:29,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:48:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:48:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:48:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:48:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:48:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:48:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:48:29,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:48:29,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:48:29,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2023-03-17 02:48:29,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:48:29,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2023-03-17 02:48:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:48:29,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:48:29,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:48:29,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:48:29,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:48:29,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:48:29,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2023-03-17 02:48:29,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:48:29,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:48:29,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:48:29,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 02:48:29,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2023-03-17 02:48:29,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:48:29,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 02:48:29,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:48:29,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:48:29,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2023-03-17 02:48:29,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
3: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 4: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
4: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
6: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2023-03-17 02:48:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:48:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2023-03-17 02:48:29,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:48:29,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step106000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:48:29,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
0: successfully saved checkpoint at iteration 106000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.58 7: iteration 106010/ 173500 | consumed samples: 27138560 | consumed tokens: 55579770880 | elapsed time per iteration (s): 0.10 | learning rate: 8.029E-05 | global batch size: 256 | lm loss: 4.522526E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.861 | TFLOPs: 9.92 | 7: iteration 106020/ 173500 | consumed samples: 27141120 | consumed tokens: 55585013760 | elapsed time per iteration (s): 0.08 | learning rate: 8.028E-05 | global batch size: 256 | lm loss: 4.513125E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.041 | TFLOPs: 11.83 | 7: iteration 106030/ 173500 | consumed samples: 27143680 | consumed tokens: 55590256640 | elapsed time per iteration (s): 0.08 | learning rate: 8.026E-05 | global batch size: 256 | lm loss: 4.522975E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.432 | TFLOPs: 11.81 | 7: iteration 106040/ 173500 | consumed samples: 27146240 | consumed tokens: 55595499520 | elapsed time per iteration (s): 0.08 | learning rate: 8.025E-05 | global batch size: 256 | lm loss: 4.514295E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.212 | TFLOPs: 11.73 | 7: iteration 106050/ 173500 | consumed samples: 27148800 | consumed tokens: 55600742400 | elapsed time per iteration (s): 0.08 | learning rate: 8.023E-05 | global batch size: 256 | lm loss: 4.505547E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.861 | TFLOPs: 11.83 | 7: iteration 106060/ 173500 | consumed samples: 27151360 | consumed tokens: 55605985280 | elapsed time per iteration (s): 
0.08 | learning rate: 8.021E-05 | global batch size: 256 | lm loss: 4.519079E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.811 | TFLOPs: 11.80 | 7: iteration 106070/ 173500 | consumed samples: 27153920 | consumed tokens: 55611228160 | elapsed time per iteration (s): 0.08 | learning rate: 8.020E-05 | global batch size: 256 | lm loss: 4.525534E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.381 | TFLOPs: 11.83 | 7: iteration 106080/ 173500 | consumed samples: 27156480 | consumed tokens: 55616471040 | elapsed time per iteration (s): 0.08 | learning rate: 8.018E-05 | global batch size: 256 | lm loss: 4.502610E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.826 | TFLOPs: 11.73 | 7: iteration 106090/ 173500 | consumed samples: 27159040 | consumed tokens: 55621713920 | elapsed time per iteration (s): 0.08 | learning rate: 8.017E-05 | global batch size: 256 | lm loss: 4.534378E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.844 | TFLOPs: 11.81 | 7: iteration 106100/ 173500 | consumed samples: 27161600 | consumed tokens: 55626956800 | elapsed time per iteration (s): 0.08 | learning rate: 8.015E-05 | global batch size: 256 | lm loss: 4.523437E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.449 | TFLOPs: 11.80 | 7: iteration 106110/ 173500 | consumed samples: 27164160 | consumed tokens: 55632199680 | elapsed time per iteration (s): 0.08 | learning rate: 8.014E-05 | global batch size: 256 | lm loss: 4.505663E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.923 | TFLOPs: 11.78 | 7: 
iteration 106120/ 173500 | consumed samples: 27166720 | consumed tokens: 55637442560 | elapsed time per iteration (s): 0.08 | learning rate: 8.012E-05 | global batch size: 256 | lm loss: 4.527101E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.416 | TFLOPs: 11.77 | 7: iteration 106130/ 173500 | consumed samples: 27169280 | consumed tokens: 55642685440 | elapsed time per iteration (s): 0.08 | learning rate: 8.011E-05 | global batch size: 256 | lm loss: 4.510393E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.271 | TFLOPs: 11.77 | 7: iteration 106140/ 173500 | consumed samples: 27171840 | consumed tokens: 55647928320 | elapsed time per iteration (s): 0.08 | learning rate: 8.009E-05 | global batch size: 256 | lm loss: 4.519664E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.588 | TFLOPs: 11.81 | 7: iteration 106150/ 173500 | consumed samples: 27174400 | consumed tokens: 55653171200 | elapsed time per iteration (s): 0.08 | learning rate: 8.007E-05 | global batch size: 256 | lm loss: 4.517265E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.944 | TFLOPs: 11.81 | 7: iteration 106160/ 173500 | consumed samples: 27176960 | consumed tokens: 55658414080 | elapsed time per iteration (s): 0.08 | learning rate: 8.006E-05 | global batch size: 256 | lm loss: 4.525850E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.299 | TFLOPs: 11.77 | 7: iteration 106170/ 173500 | consumed samples: 27179520 | consumed tokens: 55663656960 | elapsed time per iteration (s): 0.08 | learning rate: 8.004E-05 | global batch size: 256 | lm loss: 4.512194E+00 | grad norm: 0.323 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.218 | TFLOPs: 11.74 | 7: iteration 106180/ 173500 | consumed samples: 27182080 | consumed tokens: 55668899840 | elapsed time per iteration (s): 0.08 | learning rate: 8.003E-05 | global batch size: 256 | lm loss: 4.527148E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.699 | TFLOPs: 11.78 | 7: iteration 106190/ 173500 | consumed samples: 27184640 | consumed tokens: 55674142720 | elapsed time per iteration (s): 0.09 | learning rate: 8.001E-05 | global batch size: 256 | lm loss: 4.508316E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2849.211 | TFLOPs: 10.60 | 7: iteration 106200/ 173500 | consumed samples: 27187200 | consumed tokens: 55679385600 | elapsed time per iteration (s): 0.08 | learning rate: 8.000E-05 | global batch size: 256 | lm loss: 4.517575E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.009 | TFLOPs: 11.81 | 7: iteration 106210/ 173500 | consumed samples: 27189760 | consumed tokens: 55684628480 | elapsed time per iteration (s): 0.10 | learning rate: 7.998E-05 | global batch size: 256 | lm loss: 4.521649E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2537.624 | TFLOPs: 9.44 | 7: iteration 106220/ 173500 | consumed samples: 27192320 | consumed tokens: 55689871360 | elapsed time per iteration (s): 0.09 | learning rate: 7.997E-05 | global batch size: 256 | lm loss: 4.515572E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2747.567 | TFLOPs: 10.22 | 7: iteration 106230/ 173500 | consumed samples: 27194880 | consumed tokens: 55695114240 | elapsed time per iteration (s): 0.08 | 
learning rate: 7.995E-05 | global batch size: 256 | lm loss: 4.515186E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.167 | TFLOPs: 11.83 | 7: iteration 106240/ 173500 | consumed samples: 27197440 | consumed tokens: 55700357120 | elapsed time per iteration (s): 0.08 | learning rate: 7.994E-05 | global batch size: 256 | lm loss: 4.526308E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.992 | TFLOPs: 11.84 | 7: iteration 106250/ 173500 | consumed samples: 27200000 | consumed tokens: 55705600000 | elapsed time per iteration (s): 0.08 | learning rate: 7.992E-05 | global batch size: 256 | lm loss: 4.511252E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.390 | TFLOPs: 11.57 | 7: iteration 106260/ 173500 | consumed samples: 27202560 | consumed tokens: 55710842880 | elapsed time per iteration (s): 0.08 | learning rate: 7.990E-05 | global batch size: 256 | lm loss: 4.517587E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.005 | TFLOPs: 11.83 | 7: iteration 106270/ 173500 | consumed samples: 27205120 | consumed tokens: 55716085760 | elapsed time per iteration (s): 0.08 | learning rate: 7.989E-05 | global batch size: 256 | lm loss: 4.516122E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.861 | TFLOPs: 11.86 | 7: iteration 106280/ 173500 | consumed samples: 27207680 | consumed tokens: 55721328640 | elapsed time per iteration (s): 0.08 | learning rate: 7.987E-05 | global batch size: 256 | lm loss: 4.515837E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.114 | TFLOPs: 11.81 | 7: iteration 
106290/ 173500 | consumed samples: 27210240 | consumed tokens: 55726571520 | elapsed time per iteration (s): 0.08 | learning rate: 7.986E-05 | global batch size: 256 | lm loss: 4.515129E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.700 | TFLOPs: 11.79 | 7: iteration 106300/ 173500 | consumed samples: 27212800 | consumed tokens: 55731814400 | elapsed time per iteration (s): 0.08 | learning rate: 7.984E-05 | global batch size: 256 | lm loss: 4.517548E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.822 | TFLOPs: 11.79 | 7: iteration 106310/ 173500 | consumed samples: 27215360 | consumed tokens: 55737057280 | elapsed time per iteration (s): 0.08 | learning rate: 7.983E-05 | global batch size: 256 | lm loss: 4.528001E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.027 | TFLOPs: 11.84 | 7: iteration 106320/ 173500 | consumed samples: 27217920 | consumed tokens: 55742300160 | elapsed time per iteration (s): 0.08 | learning rate: 7.981E-05 | global batch size: 256 | lm loss: 4.515846E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.777 | TFLOPs: 11.64 | 7: iteration 106330/ 173500 | consumed samples: 27220480 | consumed tokens: 55747543040 | elapsed time per iteration (s): 0.09 | learning rate: 7.980E-05 | global batch size: 256 | lm loss: 4.522217E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2853.006 | TFLOPs: 10.61 | 7: iteration 106340/ 173500 | consumed samples: 27223040 | consumed tokens: 55752785920 | elapsed time per iteration (s): 0.08 | learning rate: 7.978E-05 | global batch size: 256 | lm loss: 4.521555E+00 | grad norm: 0.380 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3012.220 | TFLOPs: 11.20 | 7: iteration 106350/ 173500 | consumed samples: 27225600 | consumed tokens: 55758028800 | elapsed time per iteration (s): 0.09 | learning rate: 7.976E-05 | global batch size: 256 | lm loss: 4.520176E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2944.865 | TFLOPs: 10.95 | 7: iteration 106360/ 173500 | consumed samples: 27228160 | consumed tokens: 55763271680 | elapsed time per iteration (s): 0.08 | learning rate: 7.975E-05 | global batch size: 256 | lm loss: 4.515878E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.706 | TFLOPs: 11.57 | 7: iteration 106370/ 173500 | consumed samples: 27230720 | consumed tokens: 55768514560 | elapsed time per iteration (s): 0.08 | learning rate: 7.973E-05 | global batch size: 256 | lm loss: 4.509147E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.670 | TFLOPs: 11.83 | 7: iteration 106380/ 173500 | consumed samples: 27233280 | consumed tokens: 55773757440 | elapsed time per iteration (s): 0.08 | learning rate: 7.972E-05 | global batch size: 256 | lm loss: 4.513883E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.876 | TFLOPs: 11.85 | 7: iteration 106390/ 173500 | consumed samples: 27235840 | consumed tokens: 55779000320 | elapsed time per iteration (s): 0.08 | learning rate: 7.970E-05 | global batch size: 256 | lm loss: 4.524758E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.170 | TFLOPs: 11.85 | 7: iteration 106400/ 173500 | consumed samples: 27238400 | consumed tokens: 55784243200 | elapsed time per iteration (s): 0.08 | learning 
rate: 7.969E-05 | global batch size: 256 | lm loss: 4.519210E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.767 | TFLOPs: 11.84 | 7: iteration 106410/ 173500 | consumed samples: 27240960 | consumed tokens: 55789486080 | elapsed time per iteration (s): 0.08 | learning rate: 7.967E-05 | global batch size: 256 | lm loss: 4.517718E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.116 | TFLOPs: 11.85 | 7: iteration 106420/ 173500 | consumed samples: 27243520 | consumed tokens: 55794728960 | elapsed time per iteration (s): 0.08 | learning rate: 7.966E-05 | global batch size: 256 | lm loss: 4.523429E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.390 | TFLOPs: 11.84 | 7: iteration 106430/ 173500 | consumed samples: 27246080 | consumed tokens: 55799971840 | elapsed time per iteration (s): 0.08 | learning rate: 7.964E-05 | global batch size: 256 | lm loss: 4.524491E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.621 | TFLOPs: 11.85 | 7: iteration 106440/ 173500 | consumed samples: 27248640 | consumed tokens: 55805214720 | elapsed time per iteration (s): 0.08 | learning rate: 7.963E-05 | global batch size: 256 | lm loss: 4.518151E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.110 | TFLOPs: 11.84 | 7: iteration 106450/ 173500 | consumed samples: 27251200 | consumed tokens: 55810457600 | elapsed time per iteration (s): 0.08 | learning rate: 7.961E-05 | global batch size: 256 | lm loss: 4.516207E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.040 | TFLOPs: 11.87 | 7: iteration 106460/ 
173500 | consumed samples: 27253760 | consumed tokens: 55815700480 | elapsed time per iteration (s): 0.08 | learning rate: 7.959E-05 | global batch size: 256 | lm loss: 4.525534E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.227 | TFLOPs: 11.82 | 7: iteration 106470/ 173500 | consumed samples: 27256320 | consumed tokens: 55820943360 | elapsed time per iteration (s): 0.08 | learning rate: 7.958E-05 | global batch size: 256 | lm loss: 4.514025E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.446 | TFLOPs: 11.84 | 7: iteration 106480/ 173500 | consumed samples: 27258880 | consumed tokens: 55826186240 | elapsed time per iteration (s): 0.08 | learning rate: 7.956E-05 | global batch size: 256 | lm loss: 4.517794E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.110 | TFLOPs: 11.83 | 7: iteration 106490/ 173500 | consumed samples: 27261440 | consumed tokens: 55831429120 | elapsed time per iteration (s): 0.08 | learning rate: 7.955E-05 | global batch size: 256 | lm loss: 4.514977E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.920 | TFLOPs: 11.79 | 7: iteration 106500/ 173500 | consumed samples: 27264000 | consumed tokens: 55836672000 | elapsed time per iteration (s): 0.08 | learning rate: 7.953E-05 | global batch size: 256 | lm loss: 4.517957E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.992 | TFLOPs: 11.76 | 7: iteration 106510/ 173500 | consumed samples: 27266560 | consumed tokens: 55841914880 | elapsed time per iteration (s): 0.08 | learning rate: 7.952E-05 | global batch size: 256 | lm loss: 4.524645E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3179.211 | TFLOPs: 11.83 | 7: iteration 106520/ 173500 | consumed samples: 27269120 | consumed tokens: 55847157760 | elapsed time per iteration (s): 0.08 | learning rate: 7.950E-05 | global batch size: 256 | lm loss: 4.516136E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.203 | TFLOPs: 11.84 | 7: iteration 106530/ 173500 | consumed samples: 27271680 | consumed tokens: 55852400640 | elapsed time per iteration (s): 0.08 | learning rate: 7.949E-05 | global batch size: 256 | lm loss: 4.519356E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.597 | TFLOPs: 11.77 | 7: iteration 106540/ 173500 | consumed samples: 27274240 | consumed tokens: 55857643520 | elapsed time per iteration (s): 0.08 | learning rate: 7.947E-05 | global batch size: 256 | lm loss: 4.514334E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.775 | TFLOPs: 11.79 | 7: iteration 106550/ 173500 | consumed samples: 27276800 | consumed tokens: 55862886400 | elapsed time per iteration (s): 0.08 | learning rate: 7.945E-05 | global batch size: 256 | lm loss: 4.521914E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3014.874 | TFLOPs: 11.21 | 7: iteration 106560/ 173500 | consumed samples: 27279360 | consumed tokens: 55868129280 | elapsed time per iteration (s): 0.09 | learning rate: 7.944E-05 | global batch size: 256 | lm loss: 4.521561E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2871.237 | TFLOPs: 10.68 | 7: iteration 106570/ 173500 | consumed samples: 27281920 | consumed tokens: 55873372160 | elapsed time per iteration (s): 0.09 | learning rate: 
7.942E-05 | global batch size: 256 | lm loss: 4.500041E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2771.848 | TFLOPs: 10.31 | 7: iteration 106580/ 173500 | consumed samples: 27284480 | consumed tokens: 55878615040 | elapsed time per iteration (s): 0.08 | learning rate: 7.941E-05 | global batch size: 256 | lm loss: 4.501278E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.671 | TFLOPs: 11.37 | 7: iteration 106590/ 173500 | consumed samples: 27287040 | consumed tokens: 55883857920 | elapsed time per iteration (s): 0.08 | learning rate: 7.939E-05 | global batch size: 256 | lm loss: 4.522078E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.539 | TFLOPs: 11.96 | 7: iteration 106600/ 173500 | consumed samples: 27289600 | consumed tokens: 55889100800 | elapsed time per iteration (s): 0.08 | learning rate: 7.938E-05 | global batch size: 256 | lm loss: 4.520009E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.006 | TFLOPs: 11.88 | 7: iteration 106610/ 173500 | consumed samples: 27292160 | consumed tokens: 55894343680 | elapsed time per iteration (s): 0.08 | learning rate: 7.936E-05 | global batch size: 256 | lm loss: 4.507120E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.135 | TFLOPs: 11.78 | 7: iteration 106620/ 173500 | consumed samples: 27294720 | consumed tokens: 55899586560 | elapsed time per iteration (s): 0.08 | learning rate: 7.935E-05 | global batch size: 256 | lm loss: 4.516774E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.982 | TFLOPs: 12.03 | 7: iteration 106630/ 173500 | 
consumed samples: 27297280 | consumed tokens: 55904829440 | elapsed time per iteration (s): 0.09 | learning rate: 7.933E-05 | global batch size: 256 | lm loss: 4.523367E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2945.176 | TFLOPs: 10.95 | 7: iteration 106640/ 173500 | consumed samples: 27299840 | consumed tokens: 55910072320 | elapsed time per iteration (s): 0.08 | learning rate: 7.932E-05 | global batch size: 256 | lm loss: 4.510550E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.665 | TFLOPs: 12.01 | 7: iteration 106650/ 173500 | consumed samples: 27302400 | consumed tokens: 55915315200 | elapsed time per iteration (s): 0.08 | learning rate: 7.930E-05 | global batch size: 256 | lm loss: 4.522398E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.401 | TFLOPs: 11.29 | 7: iteration 106660/ 173500 | consumed samples: 27304960 | consumed tokens: 55920558080 | elapsed time per iteration (s): 0.08 | learning rate: 7.928E-05 | global batch size: 256 | lm loss: 4.523284E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.642 | TFLOPs: 11.52 | 7: iteration 106670/ 173500 | consumed samples: 27307520 | consumed tokens: 55925800960 | elapsed time per iteration (s): 0.08 | learning rate: 7.927E-05 | global batch size: 256 | lm loss: 4.531680E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.501 | TFLOPs: 12.02 | 7: iteration 106680/ 173500 | consumed samples: 27310080 | consumed tokens: 55931043840 | elapsed time per iteration (s): 0.08 | learning rate: 7.925E-05 | global batch size: 256 | lm loss: 4.507545E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3149.556 | TFLOPs: 11.71 | 7: iteration 106690/ 173500 | consumed samples: 27312640 | consumed tokens: 55936286720 | elapsed time per iteration (s): 0.08 | learning rate: 7.924E-05 | global batch size: 256 | lm loss: 4.523543E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.501 | TFLOPs: 11.82 | 7: iteration 106700/ 173500 | consumed samples: 27315200 | consumed tokens: 55941529600 | elapsed time per iteration (s): 0.08 | learning rate: 7.922E-05 | global batch size: 256 | lm loss: 4.515091E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.755 | TFLOPs: 11.97 | 7: iteration 106710/ 173500 | consumed samples: 27317760 | consumed tokens: 55946772480 | elapsed time per iteration (s): 0.08 | learning rate: 7.921E-05 | global batch size: 256 | lm loss: 4.521108E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.383 | TFLOPs: 11.99 | 7: iteration 106720/ 173500 | consumed samples: 27320320 | consumed tokens: 55952015360 | elapsed time per iteration (s): 0.08 | learning rate: 7.919E-05 | global batch size: 256 | lm loss: 4.517268E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.057 | TFLOPs: 12.01 | 7: iteration 106730/ 173500 | consumed samples: 27322880 | consumed tokens: 55957258240 | elapsed time per iteration (s): 0.08 | learning rate: 7.918E-05 | global batch size: 256 | lm loss: 4.516125E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.451 | TFLOPs: 12.01 | 7: iteration 106740/ 173500 | consumed samples: 27325440 | consumed tokens: 55962501120 | elapsed time per iteration (s): 0.08 | learning rate: 
7.916E-05 | global batch size: 256 | lm loss: 4.517024E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.455 | TFLOPs: 12.00 | 7: iteration 106750/ 173500 | consumed samples: 27328000 | consumed tokens: 55967744000 | elapsed time per iteration (s): 0.08 | learning rate: 7.915E-05 | global batch size: 256 | lm loss: 4.498876E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.636 | TFLOPs: 11.47 | 7: iteration 106760/ 173500 | consumed samples: 27330560 | consumed tokens: 55972986880 | elapsed time per iteration (s): 0.08 | learning rate: 7.913E-05 | global batch size: 256 | lm loss: 4.528470E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.862 | TFLOPs: 11.98 | 7: iteration 106770/ 173500 | consumed samples: 27333120 | consumed tokens: 55978229760 | elapsed time per iteration (s): 0.08 | learning rate: 7.911E-05 | global batch size: 256 | lm loss: 4.522732E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.030 | TFLOPs: 11.63 | 7: iteration 106780/ 173500 | consumed samples: 27335680 | consumed tokens: 55983472640 | elapsed time per iteration (s): 0.08 | learning rate: 7.910E-05 | global batch size: 256 | lm loss: 4.492144E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.085 | TFLOPs: 11.88 | 7: iteration 106790/ 173500 | consumed samples: 27338240 | consumed tokens: 55988715520 | elapsed time per iteration (s): 0.08 | learning rate: 7.908E-05 | global batch size: 256 | lm loss: 4.525339E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.400 | TFLOPs: 11.97 | 7: iteration 106800/ 173500 | 
consumed samples: 27340800 | consumed tokens: 55993958400 | elapsed time per iteration (s): 0.08 | learning rate: 7.907E-05 | global batch size: 256 | lm loss: 4.523994E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.561 | TFLOPs: 11.82 |
7: iteration 106810/ 173500 | consumed samples: 27343360 | consumed tokens: 55999201280 | elapsed time per iteration (s): 0.09 | learning rate: 7.905E-05 | global batch size: 256 | lm loss: 4.520691E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2965.726 | TFLOPs: 11.03 |
7: iteration 106820/ 173500 | consumed samples: 27345920 | consumed tokens: 56004444160 | elapsed time per iteration (s): 0.09 | learning rate: 7.904E-05 | global batch size: 256 | lm loss: 4.506675E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.785 | TFLOPs: 10.38 |
7: iteration 106830/ 173500 | consumed samples: 27348480 | consumed tokens: 56009687040 | elapsed time per iteration (s): 0.08 | learning rate: 7.902E-05 | global batch size: 256 | lm loss: 4.513939E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.376 | TFLOPs: 11.86 |
7: iteration 106840/ 173500 | consumed samples: 27351040 | consumed tokens: 56014929920 | elapsed time per iteration (s): 0.08 | learning rate: 7.901E-05 | global batch size: 256 | lm loss: 4.518617E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.491 | TFLOPs: 11.60 |
7: iteration 106850/ 173500 | consumed samples: 27353600 | consumed tokens: 56020172800 | elapsed time per iteration (s): 0.09 | learning rate: 7.899E-05 | global batch size: 256 | lm loss: 4.524435E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.488 | TFLOPs: 11.20 |
7: iteration 106860/ 173500 | consumed samples: 27356160 | consumed tokens: 56025415680 | elapsed time per iteration (s): 0.08 | learning rate: 7.898E-05 | global batch size: 256 | lm loss: 4.518581E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.053 | TFLOPs: 11.79 |
7: iteration 106870/ 173500 | consumed samples: 27358720 | consumed tokens: 56030658560 | elapsed time per iteration (s): 0.08 | learning rate: 7.896E-05 | global batch size: 256 | lm loss: 4.523711E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3020.955 | TFLOPs: 11.24 |
7: iteration 106880/ 173500 | consumed samples: 27361280 | consumed tokens: 56035901440 | elapsed time per iteration (s): 0.09 | learning rate: 7.894E-05 | global batch size: 256 | lm loss: 4.522518E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2917.345 | TFLOPs: 10.85 |
7: iteration 106890/ 173500 | consumed samples: 27363840 | consumed tokens: 56041144320 | elapsed time per iteration (s): 0.09 | learning rate: 7.893E-05 | global batch size: 256 | lm loss: 4.518196E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2709.298 | TFLOPs: 10.08 |
7: iteration 106900/ 173500 | consumed samples: 27366400 | consumed tokens: 56046387200 | elapsed time per iteration (s): 0.10 | learning rate: 7.891E-05 | global batch size: 256 | lm loss: 4.507071E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2623.126 | TFLOPs: 9.76 |
7: iteration 106910/ 173500 | consumed samples: 27368960 | consumed tokens: 56051630080 | elapsed time per iteration (s): 0.10 | learning rate: 7.890E-05 | global batch size: 256 | lm loss: 4.518221E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2656.917 | TFLOPs: 9.88 |
7: iteration 106920/ 173500 | consumed samples: 27371520 | consumed tokens: 56056872960 | elapsed time per iteration (s): 0.09 | learning rate: 7.888E-05 | global batch size: 256 | lm loss: 4.528789E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2894.258 | TFLOPs: 10.77 |
7: iteration 106930/ 173500 | consumed samples: 27374080 | consumed tokens: 56062115840 | elapsed time per iteration (s): 0.08 | learning rate: 7.887E-05 | global batch size: 256 | lm loss: 4.522388E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.078 | TFLOPs: 11.85 |
7: iteration 106940/ 173500 | consumed samples: 27376640 | consumed tokens: 56067358720 | elapsed time per iteration (s): 0.08 | learning rate: 7.885E-05 | global batch size: 256 | lm loss: 4.519925E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.619 | TFLOPs: 11.54 |
7: iteration 106950/ 173500 | consumed samples: 27379200 | consumed tokens: 56072601600 | elapsed time per iteration (s): 0.08 | learning rate: 7.884E-05 | global batch size: 256 | lm loss: 4.515366E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.749 | TFLOPs: 11.84 |
7: iteration 106960/ 173500 | consumed samples: 27381760 | consumed tokens: 56077844480 | elapsed time per iteration (s): 0.08 | learning rate: 7.882E-05 | global batch size: 256 | lm loss: 4.517030E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.651 | TFLOPs: 11.86 |
7: iteration 106970/ 173500 | consumed samples: 27384320 | consumed tokens: 56083087360 | elapsed time per iteration (s): 0.08 | learning rate: 7.881E-05 | global batch size: 256 | lm loss: 4.529179E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.955 | TFLOPs: 11.87 |
7: iteration 106980/ 173500 | consumed samples: 27386880 | consumed tokens: 56088330240 | elapsed time per iteration (s): 0.08 | learning rate: 7.879E-05 | global batch size: 256 | lm loss: 4.516507E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.700 | TFLOPs: 11.84 |
7: iteration 106990/ 173500 | consumed samples: 27389440 | consumed tokens: 56093573120 | elapsed time per iteration (s): 0.08 | learning rate: 7.877E-05 | global batch size: 256 | lm loss: 4.522292E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.287 | TFLOPs: 11.87 |
7: iteration 107000/ 173500 | consumed samples: 27392000 | consumed tokens: 56098816000 | elapsed time per iteration (s): 0.09 | learning rate: 7.876E-05 | global batch size: 256 | lm loss: 4.504599E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.634 | TFLOPs: 10.40 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 107000 | lm loss value: 4.385661E+00 | lm loss PPL: 8.029129E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 107000 to checkpoints_14m91b100m
0: [2023-03-17 02:49:52,125] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step107000 is begin to save!
0: [2023-03-17 02:49:52,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:49:52,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:49:52,153] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:49:52,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:49:52,158] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:49:52,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:49:52,161] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:49:52,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:49:52,164] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:49:52,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:49:52,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:49:52,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 02:49:52,168] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step107000/mp_rank_00_model_states.pt 0: [2023-03-17 02:49:52,168] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:49:52,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:49:52,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:49:52,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:49:52,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:49:52,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2023-03-17 02:49:52,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:49:52,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:49:52,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2023-03-17 02:49:52,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
1: [2023-03-17 02:49:52,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:49:52,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2023-03-17 02:49:52,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 1: [2023-03-17 02:49:52,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:49:52,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2023-03-17 02:49:52,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2023-03-17 02:49:52,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2023-03-17 02:49:52,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:49:52,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:49:52,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2023-03-17 02:49:52,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:49:52,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:49:52,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:49:52,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:49:52,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
0: [2023-03-17 02:49:52,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 1: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:49:52,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:49:52,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:49:52,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
7: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:49:52,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:49:52,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 02:49:52,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 3: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2023-03-17 02:49:52,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:49:52,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
5: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:49:52,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:49:52,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
2: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:49:52,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:49:52,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:49:52,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:49:52,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2023-03-17 02:49:52,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
1: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:49:52,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:49:52,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 2: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:49:52,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:49:52,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:49:52,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:49:52,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2023-03-17 02:49:52,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:49:52,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 02:49:52,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:49:52,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:49:52,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
2: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2023-03-17 02:49:52,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:49:52,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2023-03-17 02:49:52,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:49:52,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:49:52,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2023-03-17 02:49:52,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:49:52,202] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step107000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:49:52,202] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
0: successfully saved checkpoint at iteration 107000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.23 7: iteration 107010/ 173500 | consumed samples: 27394560 | consumed tokens: 56104058880 | elapsed time per iteration (s): 0.10 | learning rate: 7.874E-05 | global batch size: 256 | lm loss: 4.523277E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2444.456 | TFLOPs: 9.09 | 7: iteration 107020/ 173500 | consumed samples: 27397120 | consumed tokens: 56109301760 | elapsed time per iteration (s): 0.09 | learning rate: 7.873E-05 | global batch size: 256 | lm loss: 4.513063E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2818.176 | TFLOPs: 10.48 | 7: iteration 107030/ 173500 | consumed samples: 27399680 | consumed tokens: 56114544640 | elapsed time per iteration (s): 0.10 | learning rate: 7.871E-05 | global batch size: 256 | lm loss: 4.511018E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.853 | TFLOPs: 9.37 | 7: iteration 107040/ 173500 | consumed samples: 27402240 | consumed tokens: 56119787520 | elapsed time per iteration (s): 0.09 | learning rate: 7.870E-05 | global batch size: 256 | lm loss: 4.515424E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.559 | TFLOPs: 11.19 | 7: iteration 107050/ 173500 | consumed samples: 27404800 | consumed tokens: 56125030400 | elapsed time per iteration (s): 0.08 | learning rate: 7.868E-05 | global batch size: 256 | lm loss: 4.520313E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.790 | TFLOPs: 11.85 | 7: iteration 107060/ 173500 | consumed samples: 27407360 | consumed tokens: 56130273280 | elapsed time per iteration (s): 
0.08 | learning rate: 7.867E-05 | global batch size: 256 | lm loss: 4.528394E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.458 | TFLOPs: 11.85 | 7: iteration 107070/ 173500 | consumed samples: 27409920 | consumed tokens: 56135516160 | elapsed time per iteration (s): 0.08 | learning rate: 7.865E-05 | global batch size: 256 | lm loss: 4.520847E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.069 | TFLOPs: 11.36 | 7: iteration 107080/ 173500 | consumed samples: 27412480 | consumed tokens: 56140759040 | elapsed time per iteration (s): 0.08 | learning rate: 7.864E-05 | global batch size: 256 | lm loss: 4.519509E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.911 | TFLOPs: 11.90 | 7: iteration 107090/ 173500 | consumed samples: 27415040 | consumed tokens: 56146001920 | elapsed time per iteration (s): 0.09 | learning rate: 7.862E-05 | global batch size: 256 | lm loss: 4.533667E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2809.793 | TFLOPs: 10.45 | 7: iteration 107100/ 173500 | consumed samples: 27417600 | consumed tokens: 56151244800 | elapsed time per iteration (s): 0.10 | learning rate: 7.860E-05 | global batch size: 256 | lm loss: 4.519357E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2642.390 | TFLOPs: 9.83 | 7: iteration 107110/ 173500 | consumed samples: 27420160 | consumed tokens: 56156487680 | elapsed time per iteration (s): 0.08 | learning rate: 7.859E-05 | global batch size: 256 | lm loss: 4.511256E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.426 | TFLOPs: 11.93 | 7: 
iteration 107120/ 173500 | consumed samples: 27422720 | consumed tokens: 56161730560 | elapsed time per iteration (s): 0.09 | learning rate: 7.857E-05 | global batch size: 256 | lm loss: 4.517904E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2905.411 | TFLOPs: 10.81 | 7: iteration 107130/ 173500 | consumed samples: 27425280 | consumed tokens: 56166973440 | elapsed time per iteration (s): 0.08 | learning rate: 7.856E-05 | global batch size: 256 | lm loss: 4.527007E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.984 | TFLOPs: 11.63 | 7: iteration 107140/ 173500 | consumed samples: 27427840 | consumed tokens: 56172216320 | elapsed time per iteration (s): 0.09 | learning rate: 7.854E-05 | global batch size: 256 | lm loss: 4.512733E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2839.583 | TFLOPs: 10.56 | 7: iteration 107150/ 173500 | consumed samples: 27430400 | consumed tokens: 56177459200 | elapsed time per iteration (s): 0.08 | learning rate: 7.853E-05 | global batch size: 256 | lm loss: 4.495688E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.769 | TFLOPs: 11.76 | 7: iteration 107160/ 173500 | consumed samples: 27432960 | consumed tokens: 56182702080 | elapsed time per iteration (s): 0.09 | learning rate: 7.851E-05 | global batch size: 256 | lm loss: 4.515782E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.042 | TFLOPs: 10.51 | 7: iteration 107170/ 173500 | consumed samples: 27435520 | consumed tokens: 56187944960 | elapsed time per iteration (s): 0.10 | learning rate: 7.850E-05 | global batch size: 256 | lm loss: 4.510825E+00 | grad norm: 0.367 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2654.124 | TFLOPs: 9.87 | 7: iteration 107180/ 173500 | consumed samples: 27438080 | consumed tokens: 56193187840 | elapsed time per iteration (s): 0.08 | learning rate: 7.848E-05 | global batch size: 256 | lm loss: 4.522608E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.061 | TFLOPs: 11.87 | 7: iteration 107190/ 173500 | consumed samples: 27440640 | consumed tokens: 56198430720 | elapsed time per iteration (s): 0.09 | learning rate: 7.847E-05 | global batch size: 256 | lm loss: 4.519455E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2874.028 | TFLOPs: 10.69 | 7: iteration 107200/ 173500 | consumed samples: 27443200 | consumed tokens: 56203673600 | elapsed time per iteration (s): 0.09 | learning rate: 7.845E-05 | global batch size: 256 | lm loss: 4.507690E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.387 | TFLOPs: 11.09 | 7: iteration 107210/ 173500 | consumed samples: 27445760 | consumed tokens: 56208916480 | elapsed time per iteration (s): 0.09 | learning rate: 7.844E-05 | global batch size: 256 | lm loss: 4.507172E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2807.337 | TFLOPs: 10.44 | 7: iteration 107220/ 173500 | consumed samples: 27448320 | consumed tokens: 56214159360 | elapsed time per iteration (s): 0.08 | learning rate: 7.842E-05 | global batch size: 256 | lm loss: 4.508760E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.891 | TFLOPs: 11.82 | 7: iteration 107230/ 173500 | consumed samples: 27450880 | consumed tokens: 56219402240 | elapsed time per iteration (s): 0.08 | 
learning rate: 7.840E-05 | global batch size: 256 | lm loss: 4.510152E+00 | grad norm: 0.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.612 | TFLOPs: 11.34 | 7: iteration 107240/ 173500 | consumed samples: 27453440 | consumed tokens: 56224645120 | elapsed time per iteration (s): 0.08 | learning rate: 7.839E-05 | global batch size: 256 | lm loss: 4.528071E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.263 | TFLOPs: 11.83 | 7: iteration 107250/ 173500 | consumed samples: 27456000 | consumed tokens: 56229888000 | elapsed time per iteration (s): 0.08 | learning rate: 7.837E-05 | global batch size: 256 | lm loss: 4.526015E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.394 | TFLOPs: 11.86 | 7: iteration 107260/ 173500 | consumed samples: 27458560 | consumed tokens: 56235130880 | elapsed time per iteration (s): 0.08 | learning rate: 7.836E-05 | global batch size: 256 | lm loss: 4.524949E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.860 | TFLOPs: 11.36 | 7: iteration 107270/ 173500 | consumed samples: 27461120 | consumed tokens: 56240373760 | elapsed time per iteration (s): 0.08 | learning rate: 7.834E-05 | global batch size: 256 | lm loss: 4.522181E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3065.940 | TFLOPs: 11.40 | 7: iteration 107280/ 173500 | consumed samples: 27463680 | consumed tokens: 56245616640 | elapsed time per iteration (s): 0.09 | learning rate: 7.833E-05 | global batch size: 256 | lm loss: 4.520654E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2859.845 | TFLOPs: 10.64 | 7: iteration 
107290/ 173500 | consumed samples: 27466240 | consumed tokens: 56250859520 | elapsed time per iteration (s): 0.10 | learning rate: 7.831E-05 | global batch size: 256 | lm loss: 4.513761E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2647.247 | TFLOPs: 9.85 | 7: iteration 107300/ 173500 | consumed samples: 27468800 | consumed tokens: 56256102400 | elapsed time per iteration (s): 0.08 | learning rate: 7.830E-05 | global batch size: 256 | lm loss: 4.520728E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.814 | TFLOPs: 11.71 | 7: iteration 107310/ 173500 | consumed samples: 27471360 | consumed tokens: 56261345280 | elapsed time per iteration (s): 0.09 | learning rate: 7.828E-05 | global batch size: 256 | lm loss: 4.507035E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2970.734 | TFLOPs: 11.05 | 7: iteration 107320/ 173500 | consumed samples: 27473920 | consumed tokens: 56266588160 | elapsed time per iteration (s): 0.08 | learning rate: 7.827E-05 | global batch size: 256 | lm loss: 4.518684E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.279 | TFLOPs: 11.81 | 7: iteration 107330/ 173500 | consumed samples: 27476480 | consumed tokens: 56271831040 | elapsed time per iteration (s): 0.08 | learning rate: 7.825E-05 | global batch size: 256 | lm loss: 4.518884E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.254 | TFLOPs: 11.83 | 7: iteration 107340/ 173500 | consumed samples: 27479040 | consumed tokens: 56277073920 | elapsed time per iteration (s): 0.08 | learning rate: 7.823E-05 | global batch size: 256 | lm loss: 4.507314E+00 | grad norm: 0.365 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.506 | TFLOPs: 11.82 | 7: iteration 107350/ 173500 | consumed samples: 27481600 | consumed tokens: 56282316800 | elapsed time per iteration (s): 0.08 | learning rate: 7.822E-05 | global batch size: 256 | lm loss: 4.517805E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.086 | TFLOPs: 11.71 | 7: iteration 107360/ 173500 | consumed samples: 27484160 | consumed tokens: 56287559680 | elapsed time per iteration (s): 0.09 | learning rate: 7.820E-05 | global batch size: 256 | lm loss: 4.512323E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2912.715 | TFLOPs: 10.83 | 7: iteration 107370/ 173500 | consumed samples: 27486720 | consumed tokens: 56292802560 | elapsed time per iteration (s): 0.08 | learning rate: 7.819E-05 | global batch size: 256 | lm loss: 4.523208E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.099 | TFLOPs: 11.87 | 7: iteration 107380/ 173500 | consumed samples: 27489280 | consumed tokens: 56298045440 | elapsed time per iteration (s): 0.08 | learning rate: 7.817E-05 | global batch size: 256 | lm loss: 4.520510E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.018 | TFLOPs: 11.79 | 7: iteration 107390/ 173500 | consumed samples: 27491840 | consumed tokens: 56303288320 | elapsed time per iteration (s): 0.08 | learning rate: 7.816E-05 | global batch size: 256 | lm loss: 4.515986E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.192 | TFLOPs: 11.59 | 7: iteration 107400/ 173500 | consumed samples: 27494400 | consumed tokens: 56308531200 | elapsed time per iteration (s): 0.08 | learning 
rate: 7.814E-05 | global batch size: 256 | lm loss: 4.512309E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.887 | TFLOPs: 11.80 | 7: iteration 107410/ 173500 | consumed samples: 27496960 | consumed tokens: 56313774080 | elapsed time per iteration (s): 0.08 | learning rate: 7.813E-05 | global batch size: 256 | lm loss: 4.520434E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.546 | TFLOPs: 11.83 | 7: iteration 107420/ 173500 | consumed samples: 27499520 | consumed tokens: 56319016960 | elapsed time per iteration (s): 0.08 | learning rate: 7.811E-05 | global batch size: 256 | lm loss: 4.510425E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.532 | TFLOPs: 11.85 | 7: iteration 107430/ 173500 | consumed samples: 27502080 | consumed tokens: 56324259840 | elapsed time per iteration (s): 0.08 | learning rate: 7.810E-05 | global batch size: 256 | lm loss: 4.514019E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.513 | TFLOPs: 11.83 | 7: iteration 107440/ 173500 | consumed samples: 27504640 | consumed tokens: 56329502720 | elapsed time per iteration (s): 0.09 | learning rate: 7.808E-05 | global batch size: 256 | lm loss: 4.521676E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2865.726 | TFLOPs: 10.66 | 7: iteration 107450/ 173500 | consumed samples: 27507200 | consumed tokens: 56334745600 | elapsed time per iteration (s): 0.09 | learning rate: 7.807E-05 | global batch size: 256 | lm loss: 4.524821E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2944.559 | TFLOPs: 10.95 | 7: iteration 107460/ 
173500 | consumed samples: 27509760 | consumed tokens: 56339988480 | elapsed time per iteration (s): 0.10 | learning rate: 7.805E-05 | global batch size: 256 | lm loss: 4.521825E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.590 | TFLOPs: 9.16 | 7: iteration 107470/ 173500 | consumed samples: 27512320 | consumed tokens: 56345231360 | elapsed time per iteration (s): 0.09 | learning rate: 7.803E-05 | global batch size: 256 | lm loss: 4.503693E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2923.505 | TFLOPs: 10.87 | 7: iteration 107480/ 173500 | consumed samples: 27514880 | consumed tokens: 56350474240 | elapsed time per iteration (s): 0.09 | learning rate: 7.802E-05 | global batch size: 256 | lm loss: 4.514206E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.475 | TFLOPs: 10.39 | 7: iteration 107490/ 173500 | consumed samples: 27517440 | consumed tokens: 56355717120 | elapsed time per iteration (s): 0.09 | learning rate: 7.800E-05 | global batch size: 256 | lm loss: 4.506910E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.516 | TFLOPs: 10.09 | 7: iteration 107500/ 173500 | consumed samples: 27520000 | consumed tokens: 56360960000 | elapsed time per iteration (s): 0.10 | learning rate: 7.799E-05 | global batch size: 256 | lm loss: 4.506924E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.978 | TFLOPs: 9.41 | 7: iteration 107510/ 173500 | consumed samples: 27522560 | consumed tokens: 56366202880 | elapsed time per iteration (s): 0.09 | learning rate: 7.797E-05 | global batch size: 256 | lm loss: 4.529309E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2780.465 | TFLOPs: 10.34 | 7: iteration 107520/ 173500 | consumed samples: 27525120 | consumed tokens: 56371445760 | elapsed time per iteration (s): 0.08 | learning rate: 7.796E-05 | global batch size: 256 | lm loss: 4.513139E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.174 | TFLOPs: 11.64 | 7: iteration 107530/ 173500 | consumed samples: 27527680 | consumed tokens: 56376688640 | elapsed time per iteration (s): 0.12 | learning rate: 7.794E-05 | global batch size: 256 | lm loss: 4.517159E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2095.232 | TFLOPs: 7.79 | 7: iteration 107540/ 173500 | consumed samples: 27530240 | consumed tokens: 56381931520 | elapsed time per iteration (s): 0.12 | learning rate: 7.793E-05 | global batch size: 256 | lm loss: 4.523842E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2102.372 | TFLOPs: 7.82 | 7: iteration 107550/ 173500 | consumed samples: 27532800 | consumed tokens: 56387174400 | elapsed time per iteration (s): 0.13 | learning rate: 7.791E-05 | global batch size: 256 | lm loss: 4.527422E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1942.140 | TFLOPs: 7.22 | 7: iteration 107560/ 173500 | consumed samples: 27535360 | consumed tokens: 56392417280 | elapsed time per iteration (s): 0.13 | learning rate: 7.790E-05 | global batch size: 256 | lm loss: 4.505120E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1984.292 | TFLOPs: 7.38 | 7: iteration 107570/ 173500 | consumed samples: 27537920 | consumed tokens: 56397660160 | elapsed time per iteration (s): 0.12 | learning rate: 
7.788E-05 | global batch size: 256 | lm loss: 4.512215E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2078.315 | TFLOPs: 7.73 | 7: iteration 107580/ 173500 | consumed samples: 27540480 | consumed tokens: 56402903040 | elapsed time per iteration (s): 0.12 | learning rate: 7.787E-05 | global batch size: 256 | lm loss: 4.505849E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.304 | TFLOPs: 7.63 | 7: iteration 107590/ 173500 | consumed samples: 27543040 | consumed tokens: 56408145920 | elapsed time per iteration (s): 0.11 | learning rate: 7.785E-05 | global batch size: 256 | lm loss: 4.521519E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2405.993 | TFLOPs: 8.95 | 7: iteration 107600/ 173500 | consumed samples: 27545600 | consumed tokens: 56413388800 | elapsed time per iteration (s): 0.13 | learning rate: 7.783E-05 | global batch size: 256 | lm loss: 4.515139E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2018.818 | TFLOPs: 7.51 | 7: iteration 107610/ 173500 | consumed samples: 27548160 | consumed tokens: 56418631680 | elapsed time per iteration (s): 0.11 | learning rate: 7.782E-05 | global batch size: 256 | lm loss: 4.518906E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2319.104 | TFLOPs: 8.63 | 7: iteration 107620/ 173500 | consumed samples: 27550720 | consumed tokens: 56423874560 | elapsed time per iteration (s): 0.12 | learning rate: 7.780E-05 | global batch size: 256 | lm loss: 4.517079E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2070.831 | TFLOPs: 7.70 | 7: iteration 107630/ 173500 | 
consumed samples: 27553280 | consumed tokens: 56429117440 | elapsed time per iteration (s): 0.12 | learning rate: 7.779E-05 | global batch size: 256 | lm loss: 4.522980E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2155.316 | TFLOPs: 8.02 | 7: iteration 107640/ 173500 | consumed samples: 27555840 | consumed tokens: 56434360320 | elapsed time per iteration (s): 0.12 | learning rate: 7.777E-05 | global batch size: 256 | lm loss: 4.519567E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.297 | TFLOPs: 8.04 | 7: iteration 107650/ 173500 | consumed samples: 27558400 | consumed tokens: 56439603200 | elapsed time per iteration (s): 0.12 | learning rate: 7.776E-05 | global batch size: 256 | lm loss: 4.519402E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2206.791 | TFLOPs: 8.21 | 7: iteration 107660/ 173500 | consumed samples: 27560960 | consumed tokens: 56444846080 | elapsed time per iteration (s): 0.12 | learning rate: 7.774E-05 | global batch size: 256 | lm loss: 4.507271E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2105.816 | TFLOPs: 7.83 | 7: iteration 107670/ 173500 | consumed samples: 27563520 | consumed tokens: 56450088960 | elapsed time per iteration (s): 0.14 | learning rate: 7.773E-05 | global batch size: 256 | lm loss: 4.520371E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1859.826 | TFLOPs: 6.92 | 7: iteration 107680/ 173500 | consumed samples: 27566080 | consumed tokens: 56455331840 | elapsed time per iteration (s): 0.12 | learning rate: 7.771E-05 | global batch size: 256 | lm loss: 4.512279E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2098.448 | TFLOPs: 7.81 | 7: iteration 107690/ 173500 | consumed samples: 27568640 | consumed tokens: 56460574720 | elapsed time per iteration (s): 0.12 | learning rate: 7.770E-05 | global batch size: 256 | lm loss: 4.511568E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.653 | TFLOPs: 7.63 | 7: iteration 107700/ 173500 | consumed samples: 27571200 | consumed tokens: 56465817600 | elapsed time per iteration (s): 0.13 | learning rate: 7.768E-05 | global batch size: 256 | lm loss: 4.520755E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.364 | TFLOPs: 7.53 | 7: iteration 107710/ 173500 | consumed samples: 27573760 | consumed tokens: 56471060480 | elapsed time per iteration (s): 0.14 | learning rate: 7.767E-05 | global batch size: 256 | lm loss: 4.519445E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1889.483 | TFLOPs: 7.03 | 7: iteration 107720/ 173500 | consumed samples: 27576320 | consumed tokens: 56476303360 | elapsed time per iteration (s): 0.13 | learning rate: 7.765E-05 | global batch size: 256 | lm loss: 4.518647E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2004.622 | TFLOPs: 7.46 | 7: iteration 107730/ 173500 | consumed samples: 27578880 | consumed tokens: 56481546240 | elapsed time per iteration (s): 0.12 | learning rate: 7.763E-05 | global batch size: 256 | lm loss: 4.522262E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.347 | TFLOPs: 7.63 | 7: iteration 107740/ 173500 | consumed samples: 27581440 | consumed tokens: 56486789120 | elapsed time per iteration (s): 0.14 | learning rate: 7.762E-05 | global batch 
size: 256 | lm loss: 4.512452E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1868.942 | TFLOPs: 6.95 | 7: iteration 107750/ 173500 | consumed samples: 27584000 | consumed tokens: 56492032000 | elapsed time per iteration (s): 0.13 | learning rate: 7.760E-05 | global batch size: 256 | lm loss: 4.511493E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1907.173 | TFLOPs: 7.09 | 7: iteration 107760/ 173500 | consumed samples: 27586560 | consumed tokens: 56497274880 | elapsed time per iteration (s): 0.13 | learning rate: 7.759E-05 | global batch size: 256 | lm loss: 4.533366E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.439 | TFLOPs: 7.37 | 7: iteration 107770/ 173500 | consumed samples: 27589120 | consumed tokens: 56502517760 | elapsed time per iteration (s): 0.14 | learning rate: 7.757E-05 | global batch size: 256 | lm loss: 4.526780E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1882.244 | TFLOPs: 7.00 | 7: iteration 107780/ 173500 | consumed samples: 27591680 | consumed tokens: 56507760640 | elapsed time per iteration (s): 0.12 | learning rate: 7.756E-05 | global batch size: 256 | lm loss: 4.515446E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.585 | TFLOPs: 7.63 | 7: iteration 107790/ 173500 | consumed samples: 27594240 | consumed tokens: 56513003520 | elapsed time per iteration (s): 0.12 | learning rate: 7.754E-05 | global batch size: 256 | lm loss: 4.509694E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2168.715 | TFLOPs: 8.07 | 7: iteration 107800/ 173500 | consumed samples: 27596800 | 
consumed tokens: 56518246400 | elapsed time per iteration (s): 0.11 | learning rate: 7.753E-05 | global batch size: 256 | lm loss: 4.509483E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2303.159 | TFLOPs: 8.57 | 7: iteration 107810/ 173500 | consumed samples: 27599360 | consumed tokens: 56523489280 | elapsed time per iteration (s): 0.12 | learning rate: 7.751E-05 | global batch size: 256 | lm loss: 4.508488E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.176 | TFLOPs: 8.18 | 7: iteration 107820/ 173500 | consumed samples: 27601920 | consumed tokens: 56528732160 | elapsed time per iteration (s): 0.12 | learning rate: 7.750E-05 | global batch size: 256 | lm loss: 4.513264E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.927 | TFLOPs: 8.05 | 7: iteration 107830/ 173500 | consumed samples: 27604480 | consumed tokens: 56533975040 | elapsed time per iteration (s): 0.11 | learning rate: 7.748E-05 | global batch size: 256 | lm loss: 4.526124E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2313.288 | TFLOPs: 8.60 | 7: iteration 107840/ 173500 | consumed samples: 27607040 | consumed tokens: 56539217920 | elapsed time per iteration (s): 0.12 | learning rate: 7.747E-05 | global batch size: 256 | lm loss: 4.525498E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2081.916 | TFLOPs: 7.74 | 7: iteration 107850/ 173500 | consumed samples: 27609600 | consumed tokens: 56544460800 | elapsed time per iteration (s): 0.12 | learning rate: 7.745E-05 | global batch size: 256 | lm loss: 4.522232E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 2105.578 | TFLOPs: 7.83 | 7: iteration 107860/ 173500 | consumed samples: 27612160 | consumed tokens: 56549703680 | elapsed time per iteration (s): 0.13 | learning rate: 7.744E-05 | global batch size: 256 | lm loss: 4.522345E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.418 | TFLOPs: 7.26 | 7: iteration 107870/ 173500 | consumed samples: 27614720 | consumed tokens: 56554946560 | elapsed time per iteration (s): 0.12 | learning rate: 7.742E-05 | global batch size: 256 | lm loss: 4.497193E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2112.296 | TFLOPs: 7.86 | 7: iteration 107880/ 173500 | consumed samples: 27617280 | consumed tokens: 56560189440 | elapsed time per iteration (s): 0.13 | learning rate: 7.740E-05 | global batch size: 256 | lm loss: 4.520025E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.817 | TFLOPs: 7.42 | 7: iteration 107890/ 173500 | consumed samples: 27619840 | consumed tokens: 56565432320 | elapsed time per iteration (s): 0.13 | learning rate: 7.739E-05 | global batch size: 256 | lm loss: 4.515795E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.781 | TFLOPs: 7.42 | 7: iteration 107900/ 173500 | consumed samples: 27622400 | consumed tokens: 56570675200 | elapsed time per iteration (s): 0.13 | learning rate: 7.737E-05 | global batch size: 256 | lm loss: 4.517941E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.287 | TFLOPs: 7.46 | 7: iteration 107910/ 173500 | consumed samples: 27624960 | consumed tokens: 56575918080 | elapsed time per iteration (s): 0.13 | learning rate: 7.736E-05 | global batch size: 256 | lm loss: 
4.515047E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.169 | TFLOPs: 7.58 | 7: iteration 107920/ 173500 | consumed samples: 27627520 | consumed tokens: 56581160960 | elapsed time per iteration (s): 0.13 | learning rate: 7.734E-05 | global batch size: 256 | lm loss: 4.516534E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2019.158 | TFLOPs: 7.51 | 7: iteration 107930/ 173500 | consumed samples: 27630080 | consumed tokens: 56586403840 | elapsed time per iteration (s): 0.09 | learning rate: 7.733E-05 | global batch size: 256 | lm loss: 4.518584E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2742.385 | TFLOPs: 10.20 | 7: iteration 107940/ 173500 | consumed samples: 27632640 | consumed tokens: 56591646720 | elapsed time per iteration (s): 0.08 | learning rate: 7.731E-05 | global batch size: 256 | lm loss: 4.517635E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.979 | TFLOPs: 11.85 | 7: iteration 107950/ 173500 | consumed samples: 27635200 | consumed tokens: 56596889600 | elapsed time per iteration (s): 0.08 | learning rate: 7.730E-05 | global batch size: 256 | lm loss: 4.520706E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.759 | TFLOPs: 11.60 | 7: iteration 107960/ 173500 | consumed samples: 27637760 | consumed tokens: 56602132480 | elapsed time per iteration (s): 0.10 | learning rate: 7.728E-05 | global batch size: 256 | lm loss: 4.518203E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.279 | TFLOPs: 9.41 | 7: iteration 107970/ 173500 | consumed samples: 27640320 | consumed tokens: 
56607375360 | elapsed time per iteration (s): 0.08 | learning rate: 7.727E-05 | global batch size: 256 | lm loss: 4.524213E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.957 | TFLOPs: 11.25 | 7: iteration 107980/ 173500 | consumed samples: 27642880 | consumed tokens: 56612618240 | elapsed time per iteration (s): 0.08 | learning rate: 7.725E-05 | global batch size: 256 | lm loss: 4.515020E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.654 | TFLOPs: 11.88 | 7: iteration 107990/ 173500 | consumed samples: 27645440 | consumed tokens: 56617861120 | elapsed time per iteration (s): 0.08 | learning rate: 7.724E-05 | global batch size: 256 | lm loss: 4.521947E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.611 | TFLOPs: 11.28 | 0: [2023-03-17 02:51:33,730] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=0, lr=[7.722055869362951e-05, 7.722055869362951e-05, 7.722055869362951e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 108000/ 173500 | consumed samples: 27648000 | consumed tokens: 56623104000 | elapsed time per iteration (s): 0.09 | learning rate: 7.722E-05 | global batch size: 256 | lm loss: 4.522732E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2710.963 | TFLOPs: 10.08 | 0: steps: 108000 loss: 4.4962 iter time (s): 0.091 samples/sec: 2798.774 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 108000 | lm loss value: 4.451663E+00 | lm loss PPL: 8.576950E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 108000 to checkpoints_14m91b100m 0: 
[2023-03-17 02:51:33,787] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step108000 is begin to save! 0: [2023-03-17 02:51:33,790] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:51:33,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:51:33,814] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:51:33,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:51:33,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:51:33,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:51:33,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:51:33,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:51:33,826] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:51:33,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:51:33,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:51:33,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:51:33,830] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step108000/mp_rank_00_model_states.pt 0: [2023-03-17 02:51:33,830] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:51:33,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:51:33,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:51:33,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:51:33,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:51:33,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:51:33,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2023-03-17 02:51:33,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:51:33,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:51:33,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
1: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:51:33,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:51:33,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:51:33,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:51:33,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 02:51:33,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:51:33,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
4: [2023-03-17 02:51:33,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:51:33,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2023-03-17 02:51:33,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:51:33,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2023-03-17 02:51:33,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:51:33,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:51:33,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 0: [2023-03-17 02:51:33,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2023-03-17 02:51:33,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:51:33,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:51:33,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:51:33,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:51:33,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
3: [2023-03-17 02:51:33,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 02:51:33,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:51:33,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:51:33,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:51:33,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:51:33,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:51:33,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:51:33,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2023-03-17 02:51:33,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:51:33,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:51:33,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:51:33,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:51:33,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:51:33,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 02:51:33,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:51:33,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2023-03-17 02:51:33,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:51:33,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:51:33,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:51:33,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:51:33,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2023-03-17 02:51:33,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:51:33,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:51:33,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:51:33,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2023-03-17 02:51:33,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:51:33,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:51:33,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:51:33,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:51:33,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 0: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:51:33,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:51:33,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:51:33,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 02:51:33,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2023-03-17 02:51:33,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
4: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:51:33,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2023-03-17 02:51:33,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2023-03-17 02:51:33,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:51:33,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step108000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:51:33,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: successfully saved checkpoint at iteration 108000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.25 7: iteration 108010/ 173500 | consumed samples: 27650560 | consumed tokens: 56628346880 | elapsed time per iteration (s): 0.09 | learning rate: 7.721E-05 | global batch size: 256 | lm loss: 4.518756E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2703.737 | TFLOPs: 10.06 | 7: iteration 108020/ 173500 | consumed samples: 27653120 | consumed tokens: 56633589760 | elapsed time per iteration (s): 0.08 | learning rate: 7.719E-05 | global batch size: 256 | lm loss: 4.523850E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.414 | TFLOPs: 11.67 | 7: iteration 108030/ 173500 | consumed samples: 27655680 | consumed tokens: 56638832640 | elapsed time per iteration (s): 0.08 | learning rate: 7.717E-05 | global batch size: 256 | lm loss: 4.505349E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.058 | TFLOPs: 11.98 | 7: iteration 108040/ 173500 | consumed samples: 27658240 | consumed tokens: 56644075520 | elapsed time per iteration (s): 0.08 | learning rate: 7.716E-05 | global batch size: 256 | lm loss: 4.531109E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.540 | TFLOPs: 11.97 | 7: iteration 108050/ 173500 | consumed samples: 27660800 | consumed tokens: 56649318400 | elapsed time per iteration (s): 0.08 | learning rate: 7.714E-05 | 
global batch size: 256 | lm loss: 4.505537E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3054.886 | TFLOPs: 11.36 | 7: iteration 108060/ 173500 | consumed samples: 27663360 | consumed tokens: 56654561280 | elapsed time per iteration (s): 0.08 | learning rate: 7.713E-05 | global batch size: 256 | lm loss: 4.507684E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.394 | TFLOPs: 11.88 | 7: iteration 108070/ 173500 | consumed samples: 27665920 | consumed tokens: 56659804160 | elapsed time per iteration (s): 0.08 | learning rate: 7.711E-05 | global batch size: 256 | lm loss: 4.522351E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.630 | TFLOPs: 11.85 | 7: iteration 108080/ 173500 | consumed samples: 27668480 | consumed tokens: 56665047040 | elapsed time per iteration (s): 0.08 | learning rate: 7.710E-05 | global batch size: 256 | lm loss: 4.518419E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.568 | TFLOPs: 11.44 | 7: iteration 108090/ 173500 | consumed samples: 27671040 | consumed tokens: 56670289920 | elapsed time per iteration (s): 0.08 | learning rate: 7.708E-05 | global batch size: 256 | lm loss: 4.510258E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.395 | TFLOPs: 11.54 | 7: iteration 108100/ 173500 | consumed samples: 27673600 | consumed tokens: 56675532800 | elapsed time per iteration (s): 0.09 | learning rate: 7.707E-05 | global batch size: 256 | lm loss: 4.512976E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2834.792 | TFLOPs: 10.54 | 7: iteration 108110/ 173500 | consumed 
samples: 27676160 | consumed tokens: 56680775680 | elapsed time per iteration (s): 0.08 | learning rate: 7.705E-05 | global batch size: 256 | lm loss: 4.526516E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.968 | TFLOPs: 11.55 | 7: iteration 108120/ 173500 | consumed samples: 27678720 | consumed tokens: 56686018560 | elapsed time per iteration (s): 0.08 | learning rate: 7.704E-05 | global batch size: 256 | lm loss: 4.514410E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.404 | TFLOPs: 11.65 | 7: iteration 108130/ 173500 | consumed samples: 27681280 | consumed tokens: 56691261440 | elapsed time per iteration (s): 0.10 | learning rate: 7.702E-05 | global batch size: 256 | lm loss: 4.510016E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2518.198 | TFLOPs: 9.37 | 7: iteration 108140/ 173500 | consumed samples: 27683840 | consumed tokens: 56696504320 | elapsed time per iteration (s): 0.12 | learning rate: 7.701E-05 | global batch size: 256 | lm loss: 4.527692E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2178.405 | TFLOPs: 8.10 | 7: iteration 108150/ 173500 | consumed samples: 27686400 | consumed tokens: 56701747200 | elapsed time per iteration (s): 0.16 | learning rate: 7.699E-05 | global batch size: 256 | lm loss: 4.515310E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1593.657 | TFLOPs: 5.93 | 7: iteration 108160/ 173500 | consumed samples: 27688960 | consumed tokens: 56706990080 | elapsed time per iteration (s): 0.10 | learning rate: 7.698E-05 | global batch size: 256 | lm loss: 4.509393E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 2557.898 | TFLOPs: 9.51 | 7: iteration 108170/ 173500 | consumed samples: 27691520 | consumed tokens: 56712232960 | elapsed time per iteration (s): 0.08 | learning rate: 7.696E-05 | global batch size: 256 | lm loss: 4.524063E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.341 | TFLOPs: 12.04 | 7: iteration 108180/ 173500 | consumed samples: 27694080 | consumed tokens: 56717475840 | elapsed time per iteration (s): 0.08 | learning rate: 7.694E-05 | global batch size: 256 | lm loss: 4.518622E+00 | grad norm: 0.471 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.822 | TFLOPs: 11.78 | 7: iteration 108190/ 173500 | consumed samples: 27696640 | consumed tokens: 56722718720 | elapsed time per iteration (s): 0.08 | learning rate: 7.693E-05 | global batch size: 256 | lm loss: 4.516359E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3067.322 | TFLOPs: 11.41 | 7: iteration 108200/ 173500 | consumed samples: 27699200 | consumed tokens: 56727961600 | elapsed time per iteration (s): 0.08 | learning rate: 7.691E-05 | global batch size: 256 | lm loss: 4.504803E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.352 | TFLOPs: 11.89 | 7: iteration 108210/ 173500 | consumed samples: 27701760 | consumed tokens: 56733204480 | elapsed time per iteration (s): 0.08 | learning rate: 7.690E-05 | global batch size: 256 | lm loss: 4.521473E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.329 | TFLOPs: 11.85 | 7: iteration 108220/ 173500 | consumed samples: 27704320 | consumed tokens: 56738447360 | elapsed time per iteration (s): 0.08 | learning rate: 7.688E-05 | global batch 
size: 256 | lm loss: 4.498070E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.401 | TFLOPs: 11.88 | 7: iteration 108230/ 173500 | consumed samples: 27706880 | consumed tokens: 56743690240 | elapsed time per iteration (s): 0.09 | learning rate: 7.687E-05 | global batch size: 256 | lm loss: 4.522126E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2950.498 | TFLOPs: 10.97 | 7: iteration 108240/ 173500 | consumed samples: 27709440 | consumed tokens: 56748933120 | elapsed time per iteration (s): 0.08 | learning rate: 7.685E-05 | global batch size: 256 | lm loss: 4.527687E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.035 | TFLOPs: 11.74 | 7: iteration 108250/ 173500 | consumed samples: 27712000 | consumed tokens: 56754176000 | elapsed time per iteration (s): 0.09 | learning rate: 7.684E-05 | global batch size: 256 | lm loss: 4.518278E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2713.314 | TFLOPs: 10.09 | 7: iteration 108260/ 173500 | consumed samples: 27714560 | consumed tokens: 56759418880 | elapsed time per iteration (s): 0.08 | learning rate: 7.682E-05 | global batch size: 256 | lm loss: 4.527916E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.527 | TFLOPs: 11.83 | 7: iteration 108270/ 173500 | consumed samples: 27717120 | consumed tokens: 56764661760 | elapsed time per iteration (s): 0.08 | learning rate: 7.681E-05 | global batch size: 256 | lm loss: 4.533160E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.681 | TFLOPs: 12.00 | 7: iteration 108280/ 173500 | consumed samples: 27719680 
| consumed tokens: 56769904640 | elapsed time per iteration (s): 0.09 | learning rate: 7.679E-05 | global batch size: 256 | lm loss: 4.509331E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.403 | TFLOPs: 10.82 | 7: iteration 108290/ 173500 | consumed samples: 27722240 | consumed tokens: 56775147520 | elapsed time per iteration (s): 0.09 | learning rate: 7.678E-05 | global batch size: 256 | lm loss: 4.525840E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2754.971 | TFLOPs: 10.25 | 7: iteration 108300/ 173500 | consumed samples: 27724800 | consumed tokens: 56780390400 | elapsed time per iteration (s): 0.09 | learning rate: 7.676E-05 | global batch size: 256 | lm loss: 4.518279E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2958.172 | TFLOPs: 11.00 | 7: iteration 108310/ 173500 | consumed samples: 27727360 | consumed tokens: 56785633280 | elapsed time per iteration (s): 0.08 | learning rate: 7.675E-05 | global batch size: 256 | lm loss: 4.513741E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.742 | TFLOPs: 11.97 | 7: iteration 108320/ 173500 | consumed samples: 27729920 | consumed tokens: 56790876160 | elapsed time per iteration (s): 0.08 | learning rate: 7.673E-05 | global batch size: 256 | lm loss: 4.500619E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.532 | TFLOPs: 11.80 | 7: iteration 108330/ 173500 | consumed samples: 27732480 | consumed tokens: 56796119040 | elapsed time per iteration (s): 0.09 | learning rate: 7.672E-05 | global batch size: 256 | lm loss: 4.521646E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2909.878 | TFLOPs: 10.82 | 7: iteration 108340/ 173500 | consumed samples: 27735040 | consumed tokens: 56801361920 | elapsed time per iteration (s): 0.10 | learning rate: 7.670E-05 | global batch size: 256 | lm loss: 4.530469E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.865 | TFLOPs: 9.56 | 7: iteration 108350/ 173500 | consumed samples: 27737600 | consumed tokens: 56806604800 | elapsed time per iteration (s): 0.09 | learning rate: 7.668E-05 | global batch size: 256 | lm loss: 4.529494E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2728.776 | TFLOPs: 10.15 | 7: iteration 108360/ 173500 | consumed samples: 27740160 | consumed tokens: 56811847680 | elapsed time per iteration (s): 0.08 | learning rate: 7.667E-05 | global batch size: 256 | lm loss: 4.517900E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.874 | TFLOPs: 11.94 | 7: iteration 108370/ 173500 | consumed samples: 27742720 | consumed tokens: 56817090560 | elapsed time per iteration (s): 0.08 | learning rate: 7.665E-05 | global batch size: 256 | lm loss: 4.514154E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.519 | TFLOPs: 11.69 | 7: iteration 108380/ 173500 | consumed samples: 27745280 | consumed tokens: 56822333440 | elapsed time per iteration (s): 0.08 | learning rate: 7.664E-05 | global batch size: 256 | lm loss: 4.512627E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.351 | TFLOPs: 11.90 | 7: iteration 108390/ 173500 | consumed samples: 27747840 | consumed tokens: 56827576320 | elapsed time per iteration (s): 0.09 | learning rate: 7.662E-05 | global batch size: 
256 | lm loss: 4.511955E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2740.461 | TFLOPs: 10.19 | 7: iteration 108400/ 173500 | consumed samples: 27750400 | consumed tokens: 56832819200 | elapsed time per iteration (s): 0.09 | learning rate: 7.661E-05 | global batch size: 256 | lm loss: 4.519061E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.810 | TFLOPs: 11.17 | 7: iteration 108410/ 173500 | consumed samples: 27752960 | consumed tokens: 56838062080 | elapsed time per iteration (s): 0.12 | learning rate: 7.659E-05 | global batch size: 256 | lm loss: 4.523905E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2050.534 | TFLOPs: 7.63 | 7: iteration 108420/ 173500 | consumed samples: 27755520 | consumed tokens: 56843304960 | elapsed time per iteration (s): 0.08 | learning rate: 7.658E-05 | global batch size: 256 | lm loss: 4.506922E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.525 | TFLOPs: 11.85 | 7: iteration 108430/ 173500 | consumed samples: 27758080 | consumed tokens: 56848547840 | elapsed time per iteration (s): 0.08 | learning rate: 7.656E-05 | global batch size: 256 | lm loss: 4.509230E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3067.149 | TFLOPs: 11.41 | 7: iteration 108440/ 173500 | consumed samples: 27760640 | consumed tokens: 56853790720 | elapsed time per iteration (s): 0.08 | learning rate: 7.655E-05 | global batch size: 256 | lm loss: 4.505498E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.383 | TFLOPs: 11.99 | 7: iteration 108450/ 173500 | consumed samples: 27763200 | 
consumed tokens: 56859033600 | elapsed time per iteration (s): 0.08 | learning rate: 7.653E-05 | global batch size: 256 | lm loss: 4.522734E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.301 | TFLOPs: 11.86 | 7: iteration 108460/ 173500 | consumed samples: 27765760 | consumed tokens: 56864276480 | elapsed time per iteration (s): 0.08 | learning rate: 7.652E-05 | global batch size: 256 | lm loss: 4.524285E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.255 | TFLOPs: 11.88 | 7: iteration 108470/ 173500 | consumed samples: 27768320 | consumed tokens: 56869519360 | elapsed time per iteration (s): 0.08 | learning rate: 7.650E-05 | global batch size: 256 | lm loss: 4.512074E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.330 | TFLOPs: 11.84 | 7: iteration 108480/ 173500 | consumed samples: 27770880 | consumed tokens: 56874762240 | elapsed time per iteration (s): 0.09 | learning rate: 7.649E-05 | global batch size: 256 | lm loss: 4.495831E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.949 | TFLOPs: 10.07 | 7: iteration 108490/ 173500 | consumed samples: 27773440 | consumed tokens: 56880005120 | elapsed time per iteration (s): 0.08 | learning rate: 7.647E-05 | global batch size: 256 | lm loss: 4.511060E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.925 | TFLOPs: 11.81 | 7: iteration 108500/ 173500 | consumed samples: 27776000 | consumed tokens: 56885248000 | elapsed time per iteration (s): 0.08 | learning rate: 7.646E-05 | global batch size: 256 | lm loss: 4.516388E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3117.347 | TFLOPs: 11.60 | 7: iteration 108510/ 173500 | consumed samples: 27778560 | consumed tokens: 56890490880 | elapsed time per iteration (s): 0.10 | learning rate: 7.644E-05 | global batch size: 256 | lm loss: 4.515973E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2691.766 | TFLOPs: 10.01 | 7: iteration 108520/ 173500 | consumed samples: 27781120 | consumed tokens: 56895733760 | elapsed time per iteration (s): 0.08 | learning rate: 7.642E-05 | global batch size: 256 | lm loss: 4.516715E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.860 | TFLOPs: 11.77 | 7: iteration 108530/ 173500 | consumed samples: 27783680 | consumed tokens: 56900976640 | elapsed time per iteration (s): 0.08 | learning rate: 7.641E-05 | global batch size: 256 | lm loss: 4.513420E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.890 | TFLOPs: 11.66 | 7: iteration 108540/ 173500 | consumed samples: 27786240 | consumed tokens: 56906219520 | elapsed time per iteration (s): 0.08 | learning rate: 7.639E-05 | global batch size: 256 | lm loss: 4.516106E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.310 | TFLOPs: 11.80 | 7: iteration 108550/ 173500 | consumed samples: 27788800 | consumed tokens: 56911462400 | elapsed time per iteration (s): 0.09 | learning rate: 7.638E-05 | global batch size: 256 | lm loss: 4.511587E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2932.105 | TFLOPs: 10.91 | 7: iteration 108560/ 173500 | consumed samples: 27791360 | consumed tokens: 56916705280 | elapsed time per iteration (s): 0.08 | learning rate: 7.636E-05 | global batch size: 
256 | lm loss: 4.500963E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.424 | TFLOPs: 11.54 | 7: iteration 108570/ 173500 | consumed samples: 27793920 | consumed tokens: 56921948160 | elapsed time per iteration (s): 0.08 | learning rate: 7.635E-05 | global batch size: 256 | lm loss: 4.516624E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.013 | TFLOPs: 11.94 | 7: iteration 108580/ 173500 | consumed samples: 27796480 | consumed tokens: 56927191040 | elapsed time per iteration (s): 0.09 | learning rate: 7.633E-05 | global batch size: 256 | lm loss: 4.517819E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2747.084 | TFLOPs: 10.22 | 7: iteration 108590/ 173500 | consumed samples: 27799040 | consumed tokens: 56932433920 | elapsed time per iteration (s): 0.09 | learning rate: 7.632E-05 | global batch size: 256 | lm loss: 4.533713E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.413 | TFLOPs: 10.40 | 7: iteration 108600/ 173500 | consumed samples: 27801600 | consumed tokens: 56937676800 | elapsed time per iteration (s): 0.10 | learning rate: 7.630E-05 | global batch size: 256 | lm loss: 4.527886E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.078 | TFLOPs: 9.19 | 7: iteration 108610/ 173500 | consumed samples: 27804160 | consumed tokens: 56942919680 | elapsed time per iteration (s): 0.11 | learning rate: 7.629E-05 | global batch size: 256 | lm loss: 4.520708E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2345.235 | TFLOPs: 8.72 | 7: iteration 108620/ 173500 | consumed samples: 27806720 | 
consumed tokens: 56948162560 | elapsed time per iteration (s): 0.10 | learning rate: 7.627E-05 | global batch size: 256 | lm loss: 4.518312E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.737 | TFLOPs: 9.16 | 7: iteration 108630/ 173500 | consumed samples: 27809280 | consumed tokens: 56953405440 | elapsed time per iteration (s): 0.09 | learning rate: 7.626E-05 | global batch size: 256 | lm loss: 4.511484E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2814.550 | TFLOPs: 10.47 | 7: iteration 108640/ 173500 | consumed samples: 27811840 | consumed tokens: 56958648320 | elapsed time per iteration (s): 0.08 | learning rate: 7.624E-05 | global batch size: 256 | lm loss: 4.509647E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.881 | TFLOPs: 11.28 | 7: iteration 108650/ 173500 | consumed samples: 27814400 | consumed tokens: 56963891200 | elapsed time per iteration (s): 0.08 | learning rate: 7.623E-05 | global batch size: 256 | lm loss: 4.518355E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.188 | TFLOPs: 11.94 | 7: iteration 108660/ 173500 | consumed samples: 27816960 | consumed tokens: 56969134080 | elapsed time per iteration (s): 0.08 | learning rate: 7.621E-05 | global batch size: 256 | lm loss: 4.517955E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.151 | TFLOPs: 11.47 | 7: iteration 108670/ 173500 | consumed samples: 27819520 | consumed tokens: 56974376960 | elapsed time per iteration (s): 0.08 | learning rate: 7.620E-05 | global batch size: 256 | lm loss: 4.524931E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3182.402 | TFLOPs: 11.84 | 7: iteration 108680/ 173500 | consumed samples: 27822080 | consumed tokens: 56979619840 | elapsed time per iteration (s): 0.08 | learning rate: 7.618E-05 | global batch size: 256 | lm loss: 4.520948E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.749 | TFLOPs: 11.93 | 7: iteration 108690/ 173500 | consumed samples: 27824640 | consumed tokens: 56984862720 | elapsed time per iteration (s): 0.10 | learning rate: 7.617E-05 | global batch size: 256 | lm loss: 4.521214E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2673.405 | TFLOPs: 9.94 | 7: iteration 108700/ 173500 | consumed samples: 27827200 | consumed tokens: 56990105600 | elapsed time per iteration (s): 0.08 | learning rate: 7.615E-05 | global batch size: 256 | lm loss: 4.518361E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.307 | TFLOPs: 11.94 | 7: iteration 108710/ 173500 | consumed samples: 27829760 | consumed tokens: 56995348480 | elapsed time per iteration (s): 0.08 | learning rate: 7.613E-05 | global batch size: 256 | lm loss: 4.509435E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.024 | TFLOPs: 11.93 | 7: iteration 108720/ 173500 | consumed samples: 27832320 | consumed tokens: 57000591360 | elapsed time per iteration (s): 0.08 | learning rate: 7.612E-05 | global batch size: 256 | lm loss: 4.508102E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.989 | TFLOPs: 11.91 | 7: iteration 108730/ 173500 | consumed samples: 27834880 | consumed tokens: 57005834240 | elapsed time per iteration (s): 0.08 | learning rate: 7.610E-05 | global batch size: 
256 | lm loss: 4.512493E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.619 | TFLOPs: 11.96 | 7: iteration 108740/ 173500 | consumed samples: 27837440 | consumed tokens: 57011077120 | elapsed time per iteration (s): 0.09 | learning rate: 7.609E-05 | global batch size: 256 | lm loss: 4.517076E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.525 | TFLOPs: 10.61 | 7: iteration 108750/ 173500 | consumed samples: 27840000 | consumed tokens: 57016320000 | elapsed time per iteration (s): 0.08 | learning rate: 7.607E-05 | global batch size: 256 | lm loss: 4.518150E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.312 | TFLOPs: 11.95 | 7: iteration 108760/ 173500 | consumed samples: 27842560 | consumed tokens: 57021562880 | elapsed time per iteration (s): 0.09 | learning rate: 7.606E-05 | global batch size: 256 | lm loss: 4.518778E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.381 | TFLOPs: 10.99 | 7: iteration 108770/ 173500 | consumed samples: 27845120 | consumed tokens: 57026805760 | elapsed time per iteration (s): 0.08 | learning rate: 7.604E-05 | global batch size: 256 | lm loss: 4.511671E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.956 | TFLOPs: 12.05 | 7: iteration 108780/ 173500 | consumed samples: 27847680 | consumed tokens: 57032048640 | elapsed time per iteration (s): 0.08 | learning rate: 7.603E-05 | global batch size: 256 | lm loss: 4.505449E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.309 | TFLOPs: 11.23 | 7: iteration 108790/ 173500 | consumed samples: 27850240 | 
consumed tokens: 57037291520 | elapsed time per iteration (s): 0.08 | learning rate: 7.601E-05 | global batch size: 256 | lm loss: 4.512296E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.853 | TFLOPs: 12.02 | 7: iteration 108800/ 173500 | consumed samples: 27852800 | consumed tokens: 57042534400 | elapsed time per iteration (s): 0.10 | learning rate: 7.600E-05 | global batch size: 256 | lm loss: 4.512371E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2547.824 | TFLOPs: 9.48 | 7: iteration 108810/ 173500 | consumed samples: 27855360 | consumed tokens: 57047777280 | elapsed time per iteration (s): 0.08 | learning rate: 7.598E-05 | global batch size: 256 | lm loss: 4.519164E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.772 | TFLOPs: 11.91 | 7: iteration 108820/ 173500 | consumed samples: 27857920 | consumed tokens: 57053020160 | elapsed time per iteration (s): 0.09 | learning rate: 7.597E-05 | global batch size: 256 | lm loss: 4.525665E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2733.779 | TFLOPs: 10.17 | 7: iteration 108830/ 173500 | consumed samples: 27860480 | consumed tokens: 57058263040 | elapsed time per iteration (s): 0.10 | learning rate: 7.595E-05 | global batch size: 256 | lm loss: 4.509470E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2590.265 | TFLOPs: 9.63 | 7: iteration 108840/ 173500 | consumed samples: 27863040 | consumed tokens: 57063505920 | elapsed time per iteration (s): 0.08 | learning rate: 7.594E-05 | global batch size: 256 | lm loss: 4.520148E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 3192.166 | TFLOPs: 11.87 | 7: iteration 108850/ 173500 | consumed samples: 27865600 | consumed tokens: 57068748800 | elapsed time per iteration (s): 0.10 | learning rate: 7.592E-05 | global batch size: 256 | lm loss: 4.536850E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2639.562 | TFLOPs: 9.82 | 7: iteration 108860/ 173500 | consumed samples: 27868160 | consumed tokens: 57073991680 | elapsed time per iteration (s): 0.09 | learning rate: 7.591E-05 | global batch size: 256 | lm loss: 4.506838E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2965.260 | TFLOPs: 11.03 | 7: iteration 108870/ 173500 | consumed samples: 27870720 | consumed tokens: 57079234560 | elapsed time per iteration (s): 0.08 | learning rate: 7.589E-05 | global batch size: 256 | lm loss: 4.533692E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.081 | TFLOPs: 11.89 | 7: iteration 108880/ 173500 | consumed samples: 27873280 | consumed tokens: 57084477440 | elapsed time per iteration (s): 0.09 | learning rate: 7.588E-05 | global batch size: 256 | lm loss: 4.517146E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.720 | TFLOPs: 11.20 | 7: iteration 108890/ 173500 | consumed samples: 27875840 | consumed tokens: 57089720320 | elapsed time per iteration (s): 0.10 | learning rate: 7.586E-05 | global batch size: 256 | lm loss: 4.509531E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2538.596 | TFLOPs: 9.44 | 7: iteration 108900/ 173500 | consumed samples: 27878400 | consumed tokens: 57094963200 | elapsed time per iteration (s): 0.10 | learning rate: 7.585E-05 | global batch size: 256 | lm loss: 
4.510207E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2668.214 | TFLOPs: 9.92 | 7: iteration 108910/ 173500 | consumed samples: 27880960 | consumed tokens: 57100206080 | elapsed time per iteration (s): 0.08 | learning rate: 7.583E-05 | global batch size: 256 | lm loss: 4.520900E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.256 | TFLOPs: 11.89 | 7: iteration 108920/ 173500 | consumed samples: 27883520 | consumed tokens: 57105448960 | elapsed time per iteration (s): 0.09 | learning rate: 7.581E-05 | global batch size: 256 | lm loss: 4.523764E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.538 | TFLOPs: 10.78 | 7: iteration 108930/ 173500 | consumed samples: 27886080 | consumed tokens: 57110691840 | elapsed time per iteration (s): 0.08 | learning rate: 7.580E-05 | global batch size: 256 | lm loss: 4.528362E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.302 | TFLOPs: 11.89 | 7: iteration 108940/ 173500 | consumed samples: 27888640 | consumed tokens: 57115934720 | elapsed time per iteration (s): 0.08 | learning rate: 7.578E-05 | global batch size: 256 | lm loss: 4.525646E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.541 | TFLOPs: 11.90 | 7: iteration 108950/ 173500 | consumed samples: 27891200 | consumed tokens: 57121177600 | elapsed time per iteration (s): 0.10 | learning rate: 7.577E-05 | global batch size: 256 | lm loss: 4.516819E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2653.458 | TFLOPs: 9.87 | 7: iteration 108960/ 173500 | consumed samples: 27893760 | consumed tokens: 
57126420480 | elapsed time per iteration (s): 0.25 | learning rate: 7.575E-05 | global batch size: 256 | lm loss: 4.521918E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1034.232 | TFLOPs: 3.85 | 7: iteration 108970/ 173500 | consumed samples: 27896320 | consumed tokens: 57131663360 | elapsed time per iteration (s): 0.09 | learning rate: 7.574E-05 | global batch size: 256 | lm loss: 4.522523E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.819 | TFLOPs: 11.20 | 7: iteration 108980/ 173500 | consumed samples: 27898880 | consumed tokens: 57136906240 | elapsed time per iteration (s): 0.08 | learning rate: 7.572E-05 | global batch size: 256 | lm loss: 4.527328E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.466 | TFLOPs: 11.58 | 7: iteration 108990/ 173500 | consumed samples: 27901440 | consumed tokens: 57142149120 | elapsed time per iteration (s): 0.10 | learning rate: 7.571E-05 | global batch size: 256 | lm loss: 4.513810E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.582 | TFLOPs: 10.00 | 7: iteration 109000/ 173500 | consumed samples: 27904000 | consumed tokens: 57147392000 | elapsed time per iteration (s): 0.10 | learning rate: 7.569E-05 | global batch size: 256 | lm loss: 4.512930E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.343 | TFLOPs: 9.84 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 109000 | lm loss value: 4.388772E+00 | lm loss PPL: 8.054145E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving 
checkpoint at iteration 109000 to checkpoints_14m91b100m 0: [2023-03-17 02:53:02,901] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step109000 is begin to save! 0: [2023-03-17 02:53:02,904] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:53:02,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:53:02,928] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:53:02,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:53:02,934] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:53:02,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:53:02,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:53:02,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:53:02,940] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:53:02,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:53:02,943] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:53:02,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:53:02,944] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step109000/mp_rank_00_model_states.pt 0: [2023-03-17 02:53:02,944] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:53:02,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:53:02,963] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:53:02,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:53:02,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2023-03-17 02:53:02,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:53:02,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:53:02,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2023-03-17 02:53:02,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:53:02,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2023-03-17 02:53:02,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
2: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:53:02,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:53:02,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2023-03-17 02:53:02,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:53:02,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
1: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:53:02,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2023-03-17 02:53:02,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:53:02,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2023-03-17 02:53:02,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:53:02,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2023-03-17 02:53:02,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:53:02,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:53:02,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 02:53:02,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:53:02,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:53:02,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2023-03-17 02:53:02,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:53:02,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:53:02,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:53:02,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:53:02,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:53:02,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:53:02,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:53:02,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2023-03-17 02:53:02,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:53:02,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:53:02,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:53:02,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:53:02,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:53:02,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 0: [2023-03-17 02:53:02,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:53:02,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:53:02,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:53:02,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:53:02,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
3: [2023-03-17 02:53:02,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2023-03-17 02:53:02,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:53:02,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:53:02,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:53:02,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:53:02,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:53:02,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:53:02,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:53:02,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 02:53:02,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:53:02,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:53:02,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:53:02,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
1: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
4: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 3: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
6: [2023-03-17 02:53:02,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2023-03-17 02:53:02,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2023-03-17 02:53:02,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:53:02,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:53:02,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2023-03-17 02:53:02,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:53:02,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:53:02,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2023-03-17 02:53:02,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:53:02,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:53:02,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2023-03-17 02:53:02,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:53:02,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step109000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:53:02,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: successfully saved checkpoint at iteration 109000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.26 7: iteration 109010/ 173500 | consumed samples: 27906560 | consumed tokens: 57152634880 | elapsed time per iteration (s): 0.11 | learning rate: 7.568E-05 | global batch size: 256 | lm loss: 4.510627E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2350.274 | TFLOPs: 8.74 | 7: iteration 109020/ 173500 | consumed samples: 27909120 | consumed tokens: 57157877760 | elapsed time per iteration (s): 0.08 | learning rate: 7.566E-05 | global batch size: 256 | lm loss: 4.518274E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.192 | TFLOPs: 11.82 | 7: iteration 109030/ 173500 | consumed samples: 27911680 | consumed tokens: 57163120640 | elapsed time per iteration (s): 0.09 | learning rate: 7.565E-05 | global batch size: 256 | lm loss: 4.505685E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.993 | TFLOPs: 10.99 | 7: iteration 
109040/ 173500 | consumed samples: 27914240 | consumed tokens: 57168363520 | elapsed time per iteration (s): 0.08 | learning rate: 7.563E-05 | global batch size: 256 | lm loss: 4.527193E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.655 | TFLOPs: 11.87 | 7: iteration 109050/ 173500 | consumed samples: 27916800 | consumed tokens: 57173606400 | elapsed time per iteration (s): 0.09 | learning rate: 7.562E-05 | global batch size: 256 | lm loss: 4.521529E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2931.710 | TFLOPs: 10.90 | 7: iteration 109060/ 173500 | consumed samples: 27919360 | consumed tokens: 57178849280 | elapsed time per iteration (s): 0.08 | learning rate: 7.560E-05 | global batch size: 256 | lm loss: 4.510495E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.404 | TFLOPs: 11.53 | 7: iteration 109070/ 173500 | consumed samples: 27921920 | consumed tokens: 57184092160 | elapsed time per iteration (s): 0.08 | learning rate: 7.559E-05 | global batch size: 256 | lm loss: 4.508514E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.797 | TFLOPs: 11.85 | 7: iteration 109080/ 173500 | consumed samples: 27924480 | consumed tokens: 57189335040 | elapsed time per iteration (s): 0.09 | learning rate: 7.557E-05 | global batch size: 256 | lm loss: 4.512434E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.300 | TFLOPs: 10.10 | 7: iteration 109090/ 173500 | consumed samples: 27927040 | consumed tokens: 57194577920 | elapsed time per iteration (s): 0.09 | learning rate: 7.556E-05 | global batch size: 256 | lm loss: 4.504807E+00 | grad norm: 0.360 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3005.843 | TFLOPs: 11.18 | 7: iteration 109100/ 173500 | consumed samples: 27929600 | consumed tokens: 57199820800 | elapsed time per iteration (s): 0.08 | learning rate: 7.554E-05 | global batch size: 256 | lm loss: 4.519862E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.043 | TFLOPs: 11.47 | 7: iteration 109110/ 173500 | consumed samples: 27932160 | consumed tokens: 57205063680 | elapsed time per iteration (s): 0.09 | learning rate: 7.553E-05 | global batch size: 256 | lm loss: 4.519183E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2710.679 | TFLOPs: 10.08 | 7: iteration 109120/ 173500 | consumed samples: 27934720 | consumed tokens: 57210306560 | elapsed time per iteration (s): 0.08 | learning rate: 7.551E-05 | global batch size: 256 | lm loss: 4.492304E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.897 | TFLOPs: 11.87 | 7: iteration 109130/ 173500 | consumed samples: 27937280 | consumed tokens: 57215549440 | elapsed time per iteration (s): 0.08 | learning rate: 7.550E-05 | global batch size: 256 | lm loss: 4.516941E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.281 | TFLOPs: 11.79 | 7: iteration 109140/ 173500 | consumed samples: 27939840 | consumed tokens: 57220792320 | elapsed time per iteration (s): 0.08 | learning rate: 7.548E-05 | global batch size: 256 | lm loss: 4.519291E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.296 | TFLOPs: 11.49 | 7: iteration 109150/ 173500 | consumed samples: 27942400 | consumed tokens: 57226035200 | elapsed time per iteration (s): 0.08 | learning 
rate: 7.546E-05 | global batch size: 256 | lm loss: 4.520737E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.518 | TFLOPs: 11.89 | 7: iteration 109160/ 173500 | consumed samples: 27944960 | consumed tokens: 57231278080 | elapsed time per iteration (s): 0.10 | learning rate: 7.545E-05 | global batch size: 256 | lm loss: 4.523887E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2562.457 | TFLOPs: 9.53 | 7: iteration 109170/ 173500 | consumed samples: 27947520 | consumed tokens: 57236520960 | elapsed time per iteration (s): 0.10 | learning rate: 7.543E-05 | global batch size: 256 | lm loss: 4.511406E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2504.439 | TFLOPs: 9.32 | 7: iteration 109180/ 173500 | consumed samples: 27950080 | consumed tokens: 57241763840 | elapsed time per iteration (s): 0.08 | learning rate: 7.542E-05 | global batch size: 256 | lm loss: 4.508204E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.611 | TFLOPs: 12.03 | 7: iteration 109190/ 173500 | consumed samples: 27952640 | consumed tokens: 57247006720 | elapsed time per iteration (s): 0.08 | learning rate: 7.540E-05 | global batch size: 256 | lm loss: 4.522166E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.771 | TFLOPs: 11.71 | 7: iteration 109200/ 173500 | consumed samples: 27955200 | consumed tokens: 57252249600 | elapsed time per iteration (s): 0.08 | learning rate: 7.539E-05 | global batch size: 256 | lm loss: 4.516082E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.199 | TFLOPs: 11.80 | 7: iteration 109210/ 173500 
| consumed samples: 27957760 | consumed tokens: 57257492480 | elapsed time per iteration (s): 0.08 | learning rate: 7.537E-05 | global batch size: 256 | lm loss: 4.510518E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.040 | TFLOPs: 11.95 | 7: iteration 109220/ 173500 | consumed samples: 27960320 | consumed tokens: 57262735360 | elapsed time per iteration (s): 0.08 | learning rate: 7.536E-05 | global batch size: 256 | lm loss: 4.511658E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.784 | TFLOPs: 11.92 | 7: iteration 109230/ 173500 | consumed samples: 27962880 | consumed tokens: 57267978240 | elapsed time per iteration (s): 0.08 | learning rate: 7.534E-05 | global batch size: 256 | lm loss: 4.500645E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.790 | TFLOPs: 11.93 | 7: iteration 109240/ 173500 | consumed samples: 27965440 | consumed tokens: 57273221120 | elapsed time per iteration (s): 0.09 | learning rate: 7.533E-05 | global batch size: 256 | lm loss: 4.519804E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2895.600 | TFLOPs: 10.77 | 7: iteration 109250/ 173500 | consumed samples: 27968000 | consumed tokens: 57278464000 | elapsed time per iteration (s): 0.09 | learning rate: 7.531E-05 | global batch size: 256 | lm loss: 4.518023E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.499 | TFLOPs: 10.54 | 7: iteration 109260/ 173500 | consumed samples: 27970560 | consumed tokens: 57283706880 | elapsed time per iteration (s): 0.08 | learning rate: 7.530E-05 | global batch size: 256 | lm loss: 4.515870E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3212.679 | TFLOPs: 11.95 | 7: iteration 109270/ 173500 | consumed samples: 27973120 | consumed tokens: 57288949760 | elapsed time per iteration (s): 0.09 | learning rate: 7.528E-05 | global batch size: 256 | lm loss: 4.504433E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.943 | TFLOPs: 11.08 | 7: iteration 109280/ 173500 | consumed samples: 27975680 | consumed tokens: 57294192640 | elapsed time per iteration (s): 0.10 | learning rate: 7.527E-05 | global batch size: 256 | lm loss: 4.504324E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2694.634 | TFLOPs: 10.02 | 7: iteration 109290/ 173500 | consumed samples: 27978240 | consumed tokens: 57299435520 | elapsed time per iteration (s): 0.08 | learning rate: 7.525E-05 | global batch size: 256 | lm loss: 4.517573E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.079 | TFLOPs: 11.86 | 7: iteration 109300/ 173500 | consumed samples: 27980800 | consumed tokens: 57304678400 | elapsed time per iteration (s): 0.09 | learning rate: 7.524E-05 | global batch size: 256 | lm loss: 4.530221E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2830.965 | TFLOPs: 10.53 | 7: iteration 109310/ 173500 | consumed samples: 27983360 | consumed tokens: 57309921280 | elapsed time per iteration (s): 0.10 | learning rate: 7.522E-05 | global batch size: 256 | lm loss: 4.525546E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.618 | TFLOPs: 9.34 | 7: iteration 109320/ 173500 | consumed samples: 27985920 | consumed tokens: 57315164160 | elapsed time per iteration (s): 0.10 | learning rate: 
7.521E-05 | global batch size: 256 | lm loss: 4.528919E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.057 | TFLOPs: 9.44 | 7: iteration 109330/ 173500 | consumed samples: 27988480 | consumed tokens: 57320407040 | elapsed time per iteration (s): 0.08 | learning rate: 7.519E-05 | global batch size: 256 | lm loss: 4.518365E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.444 | TFLOPs: 11.85 | 7: iteration 109340/ 173500 | consumed samples: 27991040 | consumed tokens: 57325649920 | elapsed time per iteration (s): 0.08 | learning rate: 7.518E-05 | global batch size: 256 | lm loss: 4.518862E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.206 | TFLOPs: 11.84 | 7: iteration 109350/ 173500 | consumed samples: 27993600 | consumed tokens: 57330892800 | elapsed time per iteration (s): 0.09 | learning rate: 7.516E-05 | global batch size: 256 | lm loss: 4.519725E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.838 | TFLOPs: 11.11 | 7: iteration 109360/ 173500 | consumed samples: 27996160 | consumed tokens: 57336135680 | elapsed time per iteration (s): 0.09 | learning rate: 7.515E-05 | global batch size: 256 | lm loss: 4.517324E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2815.137 | TFLOPs: 10.47 | 7: iteration 109370/ 173500 | consumed samples: 27998720 | consumed tokens: 57341378560 | elapsed time per iteration (s): 0.08 | learning rate: 7.513E-05 | global batch size: 256 | lm loss: 4.528475E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.743 | TFLOPs: 12.00 | 7: iteration 109380/ 173500 | 
consumed samples: 28001280 | consumed tokens: 57346621440 | elapsed time per iteration (s): 0.08 | learning rate: 7.512E-05 | global batch size: 256 | lm loss: 4.513078E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.330 | TFLOPs: 12.02 | 7: iteration 109390/ 173500 | consumed samples: 28003840 | consumed tokens: 57351864320 | elapsed time per iteration (s): 0.09 | learning rate: 7.510E-05 | global batch size: 256 | lm loss: 4.513408E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3004.999 | TFLOPs: 11.18 | 7: iteration 109400/ 173500 | consumed samples: 28006400 | consumed tokens: 57357107200 | elapsed time per iteration (s): 0.08 | learning rate: 7.509E-05 | global batch size: 256 | lm loss: 4.512959E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.849 | TFLOPs: 11.87 | 7: iteration 109410/ 173500 | consumed samples: 28008960 | consumed tokens: 57362350080 | elapsed time per iteration (s): 0.09 | learning rate: 7.507E-05 | global batch size: 256 | lm loss: 4.509512E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2988.044 | TFLOPs: 11.11 | 7: iteration 109420/ 173500 | consumed samples: 28011520 | consumed tokens: 57367592960 | elapsed time per iteration (s): 0.08 | learning rate: 7.505E-05 | global batch size: 256 | lm loss: 4.512011E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.803 | TFLOPs: 11.95 | 7: iteration 109430/ 173500 | consumed samples: 28014080 | consumed tokens: 57372835840 | elapsed time per iteration (s): 0.08 | learning rate: 7.504E-05 | global batch size: 256 | lm loss: 4.529646E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3198.237 | TFLOPs: 11.90 | 7: iteration 109440/ 173500 | consumed samples: 28016640 | consumed tokens: 57378078720 | elapsed time per iteration (s): 0.08 | learning rate: 7.502E-05 | global batch size: 256 | lm loss: 4.521871E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.523 | TFLOPs: 11.96 | 7: iteration 109450/ 173500 | consumed samples: 28019200 | consumed tokens: 57383321600 | elapsed time per iteration (s): 0.08 | learning rate: 7.501E-05 | global batch size: 256 | lm loss: 4.527274E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.309 | TFLOPs: 11.91 | 7: iteration 109460/ 173500 | consumed samples: 28021760 | consumed tokens: 57388564480 | elapsed time per iteration (s): 0.08 | learning rate: 7.499E-05 | global batch size: 256 | lm loss: 4.514788E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.568 | TFLOPs: 11.96 | 7: iteration 109470/ 173500 | consumed samples: 28024320 | consumed tokens: 57393807360 | elapsed time per iteration (s): 0.08 | learning rate: 7.498E-05 | global batch size: 256 | lm loss: 4.518937E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.784 | TFLOPs: 11.84 | 7: iteration 109480/ 173500 | consumed samples: 28026880 | consumed tokens: 57399050240 | elapsed time per iteration (s): 0.08 | learning rate: 7.496E-05 | global batch size: 256 | lm loss: 4.510606E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.563 | TFLOPs: 11.26 | 7: iteration 109490/ 173500 | consumed samples: 28029440 | consumed tokens: 57404293120 | elapsed time per iteration (s): 0.08 | learning rate: 
7.495E-05 | global batch size: 256 | lm loss: 4.503024E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.852 | TFLOPs: 11.88 | 7: iteration 109500/ 173500 | consumed samples: 28032000 | consumed tokens: 57409536000 | elapsed time per iteration (s): 0.08 | learning rate: 7.493E-05 | global batch size: 256 | lm loss: 4.510907E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.626 | TFLOPs: 11.76 | 7: iteration 109510/ 173500 | consumed samples: 28034560 | consumed tokens: 57414778880 | elapsed time per iteration (s): 0.08 | learning rate: 7.492E-05 | global batch size: 256 | lm loss: 4.522934E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.416 | TFLOPs: 11.86 | 7: iteration 109520/ 173500 | consumed samples: 28037120 | consumed tokens: 57420021760 | elapsed time per iteration (s): 0.10 | learning rate: 7.490E-05 | global batch size: 256 | lm loss: 4.521078E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2556.592 | TFLOPs: 9.51 | 7: iteration 109530/ 173500 | consumed samples: 28039680 | consumed tokens: 57425264640 | elapsed time per iteration (s): 0.10 | learning rate: 7.489E-05 | global batch size: 256 | lm loss: 4.516536E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2674.707 | TFLOPs: 9.95 | 7: iteration 109540/ 173500 | consumed samples: 28042240 | consumed tokens: 57430507520 | elapsed time per iteration (s): 0.08 | learning rate: 7.487E-05 | global batch size: 256 | lm loss: 4.525400E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.195 | TFLOPs: 11.81 | 7: iteration 109550/ 173500 | 
consumed samples: 28044800 | consumed tokens: 57435750400 | elapsed time per iteration (s): 0.08 | learning rate: 7.486E-05 | global batch size: 256 | lm loss: 4.515513E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.937 | TFLOPs: 11.90 | 7: iteration 109560/ 173500 | consumed samples: 28047360 | consumed tokens: 57440993280 | elapsed time per iteration (s): 0.08 | learning rate: 7.484E-05 | global batch size: 256 | lm loss: 4.522488E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.998 | TFLOPs: 11.92 | 7: iteration 109570/ 173500 | consumed samples: 28049920 | consumed tokens: 57446236160 | elapsed time per iteration (s): 0.08 | learning rate: 7.483E-05 | global batch size: 256 | lm loss: 4.498012E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.713 | TFLOPs: 11.96 | 7: iteration 109580/ 173500 | consumed samples: 28052480 | consumed tokens: 57451479040 | elapsed time per iteration (s): 0.08 | learning rate: 7.481E-05 | global batch size: 256 | lm loss: 4.512279E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3025.938 | TFLOPs: 11.26 | 7: iteration 109590/ 173500 | consumed samples: 28055040 | consumed tokens: 57456721920 | elapsed time per iteration (s): 0.09 | learning rate: 7.480E-05 | global batch size: 256 | lm loss: 4.534755E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.836 | TFLOPs: 10.82 | 7: iteration 109600/ 173500 | consumed samples: 28057600 | consumed tokens: 57461964800 | elapsed time per iteration (s): 0.09 | learning rate: 7.478E-05 | global batch size: 256 | lm loss: 4.507919E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2769.418 | TFLOPs: 10.30 | 7: iteration 109610/ 173500 | consumed samples: 28060160 | consumed tokens: 57467207680 | elapsed time per iteration (s): 0.08 | learning rate: 7.477E-05 | global batch size: 256 | lm loss: 4.519145E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.078 | TFLOPs: 11.84 | 7: iteration 109620/ 173500 | consumed samples: 28062720 | consumed tokens: 57472450560 | elapsed time per iteration (s): 0.09 | learning rate: 7.475E-05 | global batch size: 256 | lm loss: 4.524058E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.940 | TFLOPs: 10.61 | 7: iteration 109630/ 173500 | consumed samples: 28065280 | consumed tokens: 57477693440 | elapsed time per iteration (s): 0.10 | learning rate: 7.474E-05 | global batch size: 256 | lm loss: 4.515019E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2662.012 | TFLOPs: 9.90 | 7: iteration 109640/ 173500 | consumed samples: 28067840 | consumed tokens: 57482936320 | elapsed time per iteration (s): 0.09 | learning rate: 7.472E-05 | global batch size: 256 | lm loss: 4.520335E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2942.586 | TFLOPs: 10.95 | 7: iteration 109650/ 173500 | consumed samples: 28070400 | consumed tokens: 57488179200 | elapsed time per iteration (s): 0.08 | learning rate: 7.471E-05 | global batch size: 256 | lm loss: 4.525083E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.182 | TFLOPs: 11.58 | 7: iteration 109660/ 173500 | consumed samples: 28072960 | consumed tokens: 57493422080 | elapsed time per iteration (s): 0.10 | learning rate: 
7.469E-05 | global batch size: 256 | lm loss: 4.524733E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2592.343 | TFLOPs: 9.64 | 7: iteration 109670/ 173500 | consumed samples: 28075520 | consumed tokens: 57498664960 | elapsed time per iteration (s): 0.08 | learning rate: 7.468E-05 | global batch size: 256 | lm loss: 4.523207E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.939 | TFLOPs: 11.91 | 7: iteration 109680/ 173500 | consumed samples: 28078080 | consumed tokens: 57503907840 | elapsed time per iteration (s): 0.08 | learning rate: 7.466E-05 | global batch size: 256 | lm loss: 4.521286E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.620 | TFLOPs: 11.27 | 7: iteration 109690/ 173500 | consumed samples: 28080640 | consumed tokens: 57509150720 | elapsed time per iteration (s): 0.10 | learning rate: 7.465E-05 | global batch size: 256 | lm loss: 4.519174E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.289 | TFLOPs: 9.26 | 7: iteration 109700/ 173500 | consumed samples: 28083200 | consumed tokens: 57514393600 | elapsed time per iteration (s): 0.09 | learning rate: 7.463E-05 | global batch size: 256 | lm loss: 4.522310E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2810.869 | TFLOPs: 10.46 | 7: iteration 109710/ 173500 | consumed samples: 28085760 | consumed tokens: 57519636480 | elapsed time per iteration (s): 0.08 | learning rate: 7.462E-05 | global batch size: 256 | lm loss: 4.498728E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.989 | TFLOPs: 11.53 | 7: iteration 109720/ 173500 | 
consumed samples: 28088320 | consumed tokens: 57524879360 | elapsed time per iteration (s): 0.08 | learning rate: 7.460E-05 | global batch size: 256 | lm loss: 4.517632E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.297 | TFLOPs: 11.87 | 7: iteration 109730/ 173500 | consumed samples: 28090880 | consumed tokens: 57530122240 | elapsed time per iteration (s): 0.11 | learning rate: 7.459E-05 | global batch size: 256 | lm loss: 4.517851E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.927 | TFLOPs: 8.95 | 7: iteration 109740/ 173500 | consumed samples: 28093440 | consumed tokens: 57535365120 | elapsed time per iteration (s): 0.09 | learning rate: 7.457E-05 | global batch size: 256 | lm loss: 4.522028E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2983.292 | TFLOPs: 11.10 | 7: iteration 109750/ 173500 | consumed samples: 28096000 | consumed tokens: 57540608000 | elapsed time per iteration (s): 0.10 | learning rate: 7.455E-05 | global batch size: 256 | lm loss: 4.508640E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2566.388 | TFLOPs: 9.55 | 7: iteration 109760/ 173500 | consumed samples: 28098560 | consumed tokens: 57545850880 | elapsed time per iteration (s): 0.10 | learning rate: 7.454E-05 | global batch size: 256 | lm loss: 4.519740E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2551.702 | TFLOPs: 9.49 | 7: iteration 109770/ 173500 | consumed samples: 28101120 | consumed tokens: 57551093760 | elapsed time per iteration (s): 0.08 | learning rate: 7.452E-05 | global batch size: 256 | lm loss: 4.518170E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 3130.585 | TFLOPs: 11.64 | 7: iteration 109780/ 173500 | consumed samples: 28103680 | consumed tokens: 57556336640 | elapsed time per iteration (s): 0.08 | learning rate: 7.451E-05 | global batch size: 256 | lm loss: 4.501609E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.668 | TFLOPs: 11.95 | 7: iteration 109790/ 173500 | consumed samples: 28106240 | consumed tokens: 57561579520 | elapsed time per iteration (s): 0.08 | learning rate: 7.449E-05 | global batch size: 256 | lm loss: 4.521439E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.357 | TFLOPs: 11.62 | 7: iteration 109800/ 173500 | consumed samples: 28108800 | consumed tokens: 57566822400 | elapsed time per iteration (s): 0.08 | learning rate: 7.448E-05 | global batch size: 256 | lm loss: 4.501466E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.297 | TFLOPs: 11.91 | 7: iteration 109810/ 173500 | consumed samples: 28111360 | consumed tokens: 57572065280 | elapsed time per iteration (s): 0.08 | learning rate: 7.446E-05 | global batch size: 256 | lm loss: 4.526207E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.554 | TFLOPs: 11.72 | 7: iteration 109820/ 173500 | consumed samples: 28113920 | consumed tokens: 57577308160 | elapsed time per iteration (s): 0.08 | learning rate: 7.445E-05 | global batch size: 256 | lm loss: 4.514491E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.771 | TFLOPs: 11.94 | 7: iteration 109830/ 173500 | consumed samples: 28116480 | consumed tokens: 57582551040 | elapsed time per iteration (s): 0.09 | learning rate: 7.443E-05 | 
global batch size: 256 | lm loss: 4.506733E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2885.127 | TFLOPs: 10.73 | 7: iteration 109840/ 173500 | consumed samples: 28119040 | consumed tokens: 57587793920 | elapsed time per iteration (s): 0.08 | learning rate: 7.442E-05 | global batch size: 256 | lm loss: 4.515267E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.232 | TFLOPs: 11.94 | 7: iteration 109850/ 173500 | consumed samples: 28121600 | consumed tokens: 57593036800 | elapsed time per iteration (s): 0.09 | learning rate: 7.440E-05 | global batch size: 256 | lm loss: 4.520089E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2813.175 | TFLOPs: 10.46 | 7: iteration 109860/ 173500 | consumed samples: 28124160 | consumed tokens: 57598279680 | elapsed time per iteration (s): 0.10 | learning rate: 7.439E-05 | global batch size: 256 | lm loss: 4.511326E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2648.559 | TFLOPs: 9.85 | 7: iteration 109870/ 173500 | consumed samples: 28126720 | consumed tokens: 57603522560 | elapsed time per iteration (s): 0.08 | learning rate: 7.437E-05 | global batch size: 256 | lm loss: 4.509304E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.369 | TFLOPs: 11.81 | 7: iteration 109880/ 173500 | consumed samples: 28129280 | consumed tokens: 57608765440 | elapsed time per iteration (s): 0.08 | learning rate: 7.436E-05 | global batch size: 256 | lm loss: 4.525011E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.711 | TFLOPs: 11.91 | 7: iteration 109890/ 173500 | consumed 
samples: 28131840 | consumed tokens: 57614008320 | elapsed time per iteration (s): 0.08 | learning rate: 7.434E-05 | global batch size: 256 | lm loss: 4.504153E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.088 | TFLOPs: 11.88 | 7: iteration 109900/ 173500 | consumed samples: 28134400 | consumed tokens: 57619251200 | elapsed time per iteration (s): 0.09 | learning rate: 7.433E-05 | global batch size: 256 | lm loss: 4.505639E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2829.405 | TFLOPs: 10.52 | 7: iteration 109910/ 173500 | consumed samples: 28136960 | consumed tokens: 57624494080 | elapsed time per iteration (s): 0.08 | learning rate: 7.431E-05 | global batch size: 256 | lm loss: 4.516380E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.093 | TFLOPs: 11.91 | 7: iteration 109920/ 173500 | consumed samples: 28139520 | consumed tokens: 57629736960 | elapsed time per iteration (s): 0.09 | learning rate: 7.430E-05 | global batch size: 256 | lm loss: 4.508801E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2767.032 | TFLOPs: 10.29 | 7: iteration 109930/ 173500 | consumed samples: 28142080 | consumed tokens: 57634979840 | elapsed time per iteration (s): 0.09 | learning rate: 7.428E-05 | global batch size: 256 | lm loss: 4.510836E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.377 | TFLOPs: 11.06 | 7: iteration 109940/ 173500 | consumed samples: 28144640 | consumed tokens: 57640222720 | elapsed time per iteration (s): 0.08 | learning rate: 7.427E-05 | global batch size: 256 | lm loss: 4.521404E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3080.006 | TFLOPs: 11.46 | 7: iteration 109950/ 173500 | consumed samples: 28147200 | consumed tokens: 57645465600 | elapsed time per iteration (s): 0.08 | learning rate: 7.425E-05 | global batch size: 256 | lm loss: 4.520980E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.274 | TFLOPs: 11.91 | 7: iteration 109960/ 173500 | consumed samples: 28149760 | consumed tokens: 57650708480 | elapsed time per iteration (s): 0.10 | learning rate: 7.424E-05 | global batch size: 256 | lm loss: 4.511951E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2537.098 | TFLOPs: 9.44 | 7: iteration 109970/ 173500 | consumed samples: 28152320 | consumed tokens: 57655951360 | elapsed time per iteration (s): 0.09 | learning rate: 7.422E-05 | global batch size: 256 | lm loss: 4.508125E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.903 | TFLOPs: 10.40 | 7: iteration 109980/ 173500 | consumed samples: 28154880 | consumed tokens: 57661194240 | elapsed time per iteration (s): 0.09 | learning rate: 7.421E-05 | global batch size: 256 | lm loss: 4.517344E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.147 | TFLOPs: 10.20 | 7: iteration 109990/ 173500 | consumed samples: 28157440 | consumed tokens: 57666437120 | elapsed time per iteration (s): 0.08 | learning rate: 7.419E-05 | global batch size: 256 | lm loss: 4.519682E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.851 | TFLOPs: 11.41 | 0: [2023-03-17 02:54:28,948] [INFO] [logging.py:68:log_dist] [Rank 0] step=110000, skipped=0, lr=[7.417709678812063e-05, 7.417709678812063e-05, 
7.417709678812063e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 110000/ 173500 | consumed samples: 28160000 | consumed tokens: 57671680000 | elapsed time per iteration (s): 0.08 | learning rate: 7.418E-05 | global batch size: 256 | lm loss: 4.508189E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.516 | TFLOPs: 11.44 | 0: steps: 110000 loss: 4.4886 iter time (s): 0.087 samples/sec: 2946.289 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 110000 | lm loss value: 4.373056E+00 | lm loss PPL: 7.928559E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 110000 to checkpoints_14m91b100m 0: [2023-03-17 02:54:29,019] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step110000 is begin to save! 0: [2023-03-17 02:54:29,022] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:54:29,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:54:29,048] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:54:29,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:54:29,052] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:54:29,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 02:54:29,054] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:54:29,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:54:29,057] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:54:29,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:54:29,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:54:29,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:54:29,061] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step110000/mp_rank_00_model_states.pt 0: [2023-03-17 02:54:29,061] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:54:29,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:54:29,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:54:29,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,085] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:54:29,085] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
5: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 1: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 3: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 0: [2023-03-17 02:54:29,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 02:54:29,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,087] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,087] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 02:54:29,087] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,087] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 02:54:29,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:54:29,087] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:54:29,087] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 02:54:29,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:54:29,087] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,087] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:54:29,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:54:29,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 02:54:29,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:54:29,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:54:29,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:54:29,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 02:54:29,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:54:29,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 02:54:29,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:54:29,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 02:54:29,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:54:29,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 02:54:29,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:54:29,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 02:54:29,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:54:29,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 02:54:29,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:54:29,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:54:29,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 02:54:29,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:54:29,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 02:54:29,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:54:29,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:54:29,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
3: [2023-03-17 02:54:29,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:54:29,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:54:29,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 02:54:29,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 02:54:29,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:54:29,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:54:29,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:54:29,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2023-03-17 02:54:29,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2023-03-17 02:54:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:54:29,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:54:29,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:54:29,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 02:54:29,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 02:54:29,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:54:29,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 2: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
0: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
1: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
1: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now!
4: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
4: [2023-03-17 02:54:29,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
4: [2023-03-17 02:54:29,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now!
0: successfully saved checkpoint at iteration 110000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 79.02
7: iteration 110010/ 173500 | consumed samples: 28162560 | consumed tokens: 57676922880 | elapsed time per iteration (s): 0.12 | learning rate: 7.416E-05 | global batch size: 256 | lm loss: 4.508707E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2192.054 | TFLOPs: 8.15 |
7: iteration 110020/ 173500 | consumed samples: 28165120 | consumed tokens: 57682165760 | elapsed time per iteration (s): 0.10 | learning rate: 7.415E-05 | global batch size: 256 | lm loss: 4.512082E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2651.316 | TFLOPs: 9.86 |
7: iteration 110030/ 173500 | consumed samples: 28167680 | consumed tokens: 57687408640 | elapsed time per iteration (s): 0.08 | learning rate: 7.413E-05 | global batch size: 256 | lm loss: 4.517305E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.217 | TFLOPs: 11.92 |
7: iteration 110040/ 173500 | consumed samples: 28170240 | consumed tokens: 57692651520 | elapsed time per iteration (s): 0.09 | learning rate: 7.412E-05 | global batch size: 256 | lm loss: 4.514990E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2962.069 | TFLOPs: 11.02 |
7: iteration 110050/ 173500 | consumed samples: 28172800 | consumed tokens: 57697894400 | elapsed time per iteration (s): 0.10 | learning rate: 7.410E-05 | global batch size: 256 | lm loss: 4.513932E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2668.651 | TFLOPs: 9.93 |
7: iteration 110060/ 173500 | consumed samples: 28175360 | consumed tokens: 57703137280 | elapsed time per iteration (s): 0.08 | learning rate: 7.409E-05 | global batch size: 256 | lm loss: 4.510214E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3027.951 | TFLOPs: 11.26 |
7: iteration 110070/ 173500 | consumed samples: 28177920 | consumed tokens: 57708380160 | elapsed time per iteration (s): 0.09 | learning rate: 7.407E-05 | global batch size: 256 | lm loss: 4.529284E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2858.461 | TFLOPs: 10.63 |
7: iteration 110080/ 173500 | consumed samples: 28180480 | consumed tokens: 57713623040 | elapsed time per iteration (s): 0.08 | learning rate: 7.406E-05 | global batch size: 256 | lm loss: 4.528635E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.702 | TFLOPs: 11.95 |
7: iteration 110090/ 173500 | consumed samples: 28183040 | consumed tokens: 57718865920 | elapsed time per iteration (s): 0.08 | learning rate: 7.404E-05 | global batch size: 256 | lm loss: 4.518561E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.930 | TFLOPs: 11.93 |
7: iteration 110100/ 173500 | consumed samples: 28185600 | consumed tokens: 57724108800 | elapsed time per iteration (s): 0.08 | learning rate: 7.403E-05 | global batch size: 256 | lm loss: 4.517685E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.658 | TFLOPs: 11.87 |
7: iteration 110110/ 173500 | consumed samples: 28188160 | consumed tokens: 57729351680 | elapsed time per iteration (s): 0.08 | learning rate: 7.401E-05 | global batch size: 256 | lm loss: 4.515721E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.824 | TFLOPs: 11.33 |
7: iteration 110120/ 173500 | consumed samples: 28190720 | consumed tokens: 57734594560 | elapsed time per iteration (s): 0.08 | learning rate: 7.400E-05 | global batch size: 256 | lm loss: 4.518526E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.668 | TFLOPs: 11.47 |
7: iteration 110130/ 173500 | consumed samples: 28193280 | consumed tokens: 57739837440 | elapsed time per iteration (s): 0.08 | learning rate: 7.398E-05 | global batch size: 256 | lm loss: 4.521384E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.553 | TFLOPs: 11.25 |
7: iteration 110140/ 173500 | consumed samples: 28195840 | consumed tokens: 57745080320 | elapsed time per iteration (s): 0.08 | learning rate: 7.397E-05 | global batch size: 256 | lm loss: 4.522103E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.439 | TFLOPs: 11.50 |
7: iteration 110150/ 173500 | consumed samples: 28198400 | consumed tokens: 57750323200 | elapsed time per iteration (s): 0.08 | learning
rate: 7.395E-05 | global batch size: 256 | lm loss: 4.513899E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.245 | TFLOPs: 11.76 | 7: iteration 110160/ 173500 | consumed samples: 28200960 | consumed tokens: 57755566080 | elapsed time per iteration (s): 0.09 | learning rate: 7.394E-05 | global batch size: 256 | lm loss: 4.520061E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2848.819 | TFLOPs: 10.60 | 7: iteration 110170/ 173500 | consumed samples: 28203520 | consumed tokens: 57760808960 | elapsed time per iteration (s): 0.10 | learning rate: 7.392E-05 | global batch size: 256 | lm loss: 4.522276E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.059 | TFLOPs: 9.30 | 7: iteration 110180/ 173500 | consumed samples: 28206080 | consumed tokens: 57766051840 | elapsed time per iteration (s): 0.09 | learning rate: 7.391E-05 | global batch size: 256 | lm loss: 4.509295E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2716.843 | TFLOPs: 10.11 | 7: iteration 110190/ 173500 | consumed samples: 28208640 | consumed tokens: 57771294720 | elapsed time per iteration (s): 0.17 | learning rate: 7.389E-05 | global batch size: 256 | lm loss: 4.522513E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1512.609 | TFLOPs: 5.63 | 7: iteration 110200/ 173500 | consumed samples: 28211200 | consumed tokens: 57776537600 | elapsed time per iteration (s): 0.13 | learning rate: 7.388E-05 | global batch size: 256 | lm loss: 4.520095E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1968.882 | TFLOPs: 7.32 | 7: iteration 110210/ 173500 
| consumed samples: 28213760 | consumed tokens: 57781780480 | elapsed time per iteration (s): 0.12 | learning rate: 7.386E-05 | global batch size: 256 | lm loss: 4.507016E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2154.769 | TFLOPs: 8.01 | 7: iteration 110220/ 173500 | consumed samples: 28216320 | consumed tokens: 57787023360 | elapsed time per iteration (s): 0.09 | learning rate: 7.385E-05 | global batch size: 256 | lm loss: 4.519925E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.672 | TFLOPs: 10.99 | 7: iteration 110230/ 173500 | consumed samples: 28218880 | consumed tokens: 57792266240 | elapsed time per iteration (s): 0.08 | learning rate: 7.383E-05 | global batch size: 256 | lm loss: 4.536493E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.554 | TFLOPs: 11.83 | 7: iteration 110240/ 173500 | consumed samples: 28221440 | consumed tokens: 57797509120 | elapsed time per iteration (s): 0.09 | learning rate: 7.382E-05 | global batch size: 256 | lm loss: 4.517891E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2947.406 | TFLOPs: 10.96 | 7: iteration 110250/ 173500 | consumed samples: 28224000 | consumed tokens: 57802752000 | elapsed time per iteration (s): 0.09 | learning rate: 7.380E-05 | global batch size: 256 | lm loss: 4.512576E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.635 | TFLOPs: 10.33 | 7: iteration 110260/ 173500 | consumed samples: 28226560 | consumed tokens: 57807994880 | elapsed time per iteration (s): 0.08 | learning rate: 7.378E-05 | global batch size: 256 | lm loss: 4.517009E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3194.528 | TFLOPs: 11.88 | 7: iteration 110270/ 173500 | consumed samples: 28229120 | consumed tokens: 57813237760 | elapsed time per iteration (s): 0.08 | learning rate: 7.377E-05 | global batch size: 256 | lm loss: 4.519828E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.136 | TFLOPs: 11.86 | 7: iteration 110280/ 173500 | consumed samples: 28231680 | consumed tokens: 57818480640 | elapsed time per iteration (s): 0.08 | learning rate: 7.375E-05 | global batch size: 256 | lm loss: 4.506607E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.366 | TFLOPs: 11.82 | 7: iteration 110290/ 173500 | consumed samples: 28234240 | consumed tokens: 57823723520 | elapsed time per iteration (s): 0.08 | learning rate: 7.374E-05 | global batch size: 256 | lm loss: 4.514268E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.988 | TFLOPs: 11.77 | 7: iteration 110300/ 173500 | consumed samples: 28236800 | consumed tokens: 57828966400 | elapsed time per iteration (s): 0.08 | learning rate: 7.372E-05 | global batch size: 256 | lm loss: 4.520940E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.851 | TFLOPs: 11.63 | 7: iteration 110310/ 173500 | consumed samples: 28239360 | consumed tokens: 57834209280 | elapsed time per iteration (s): 0.08 | learning rate: 7.371E-05 | global batch size: 256 | lm loss: 4.514825E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.581 | TFLOPs: 11.92 | 7: iteration 110320/ 173500 | consumed samples: 28241920 | consumed tokens: 57839452160 | elapsed time per iteration (s): 0.08 | learning rate: 
7.369E-05 | global batch size: 256 | lm loss: 4.513927E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.553 | TFLOPs: 11.89 | 7: iteration 110330/ 173500 | consumed samples: 28244480 | consumed tokens: 57844695040 | elapsed time per iteration (s): 0.08 | learning rate: 7.368E-05 | global batch size: 256 | lm loss: 4.534103E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.263 | TFLOPs: 11.87 | 7: iteration 110340/ 173500 | consumed samples: 28247040 | consumed tokens: 57849937920 | elapsed time per iteration (s): 0.08 | learning rate: 7.366E-05 | global batch size: 256 | lm loss: 4.512566E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.921 | TFLOPs: 11.89 | 7: iteration 110350/ 173500 | consumed samples: 28249600 | consumed tokens: 57855180800 | elapsed time per iteration (s): 0.09 | learning rate: 7.365E-05 | global batch size: 256 | lm loss: 4.505372E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.732 | TFLOPs: 11.19 | 7: iteration 110360/ 173500 | consumed samples: 28252160 | consumed tokens: 57860423680 | elapsed time per iteration (s): 0.08 | learning rate: 7.363E-05 | global batch size: 256 | lm loss: 4.519928E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.536 | TFLOPs: 11.23 | 7: iteration 110370/ 173500 | consumed samples: 28254720 | consumed tokens: 57865666560 | elapsed time per iteration (s): 0.08 | learning rate: 7.362E-05 | global batch size: 256 | lm loss: 4.529588E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.230 | TFLOPs: 11.62 | 7: iteration 110380/ 173500 | 
consumed samples: 28257280 | consumed tokens: 57870909440 | elapsed time per iteration (s): 0.08 | learning rate: 7.360E-05 | global batch size: 256 | lm loss: 4.522542E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.769 | TFLOPs: 11.92 | 7: iteration 110390/ 173500 | consumed samples: 28259840 | consumed tokens: 57876152320 | elapsed time per iteration (s): 0.08 | learning rate: 7.359E-05 | global batch size: 256 | lm loss: 4.522269E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.337 | TFLOPs: 11.88 | 7: iteration 110400/ 173500 | consumed samples: 28262400 | consumed tokens: 57881395200 | elapsed time per iteration (s): 0.08 | learning rate: 7.357E-05 | global batch size: 256 | lm loss: 4.514923E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.185 | TFLOPs: 11.93 | 7: iteration 110410/ 173500 | consumed samples: 28264960 | consumed tokens: 57886638080 | elapsed time per iteration (s): 0.08 | learning rate: 7.356E-05 | global batch size: 256 | lm loss: 4.524788E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.635 | TFLOPs: 11.96 | 7: iteration 110420/ 173500 | consumed samples: 28267520 | consumed tokens: 57891880960 | elapsed time per iteration (s): 0.08 | learning rate: 7.354E-05 | global batch size: 256 | lm loss: 4.511337E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3080.903 | TFLOPs: 11.46 | 7: iteration 110430/ 173500 | consumed samples: 28270080 | consumed tokens: 57897123840 | elapsed time per iteration (s): 0.08 | learning rate: 7.353E-05 | global batch size: 256 | lm loss: 4.505137E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3212.999 | TFLOPs: 11.95 | 7: iteration 110440/ 173500 | consumed samples: 28272640 | consumed tokens: 57902366720 | elapsed time per iteration (s): 0.09 | learning rate: 7.351E-05 | global batch size: 256 | lm loss: 4.501995E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2954.036 | TFLOPs: 10.99 | 7: iteration 110450/ 173500 | consumed samples: 28275200 | consumed tokens: 57907609600 | elapsed time per iteration (s): 0.10 | learning rate: 7.350E-05 | global batch size: 256 | lm loss: 4.518540E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.534 | TFLOPs: 9.84 | 7: iteration 110460/ 173500 | consumed samples: 28277760 | consumed tokens: 57912852480 | elapsed time per iteration (s): 0.09 | learning rate: 7.348E-05 | global batch size: 256 | lm loss: 4.520044E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.408 | TFLOPs: 10.04 | 7: iteration 110470/ 173500 | consumed samples: 28280320 | consumed tokens: 57918095360 | elapsed time per iteration (s): 0.09 | learning rate: 7.347E-05 | global batch size: 256 | lm loss: 4.511681E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2934.748 | TFLOPs: 10.92 | 7: iteration 110480/ 173500 | consumed samples: 28282880 | consumed tokens: 57923338240 | elapsed time per iteration (s): 0.09 | learning rate: 7.345E-05 | global batch size: 256 | lm loss: 4.508933E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.471 | TFLOPs: 11.15 | 7: iteration 110490/ 173500 | consumed samples: 28285440 | consumed tokens: 57928581120 | elapsed time per iteration (s): 0.09 | learning rate: 
7.344E-05 | global batch size: 256 | lm loss: 4.532012E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2702.903 | TFLOPs: 10.05 | 7: iteration 110500/ 173500 | consumed samples: 28288000 | consumed tokens: 57933824000 | elapsed time per iteration (s): 0.09 | learning rate: 7.342E-05 | global batch size: 256 | lm loss: 4.511496E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2705.099 | TFLOPs: 10.06 | 7: iteration 110510/ 173500 | consumed samples: 28290560 | consumed tokens: 57939066880 | elapsed time per iteration (s): 0.10 | learning rate: 7.341E-05 | global batch size: 256 | lm loss: 4.520312E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2486.071 | TFLOPs: 9.25 | 7: iteration 110520/ 173500 | consumed samples: 28293120 | consumed tokens: 57944309760 | elapsed time per iteration (s): 0.08 | learning rate: 7.339E-05 | global batch size: 256 | lm loss: 4.510352E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.622 | TFLOPs: 11.42 | 7: iteration 110530/ 173500 | consumed samples: 28295680 | consumed tokens: 57949552640 | elapsed time per iteration (s): 0.10 | learning rate: 7.338E-05 | global batch size: 256 | lm loss: 4.519463E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2466.739 | TFLOPs: 9.18 | 7: iteration 110540/ 173500 | consumed samples: 28298240 | consumed tokens: 57954795520 | elapsed time per iteration (s): 0.11 | learning rate: 7.336E-05 | global batch size: 256 | lm loss: 4.511673E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2233.662 | TFLOPs: 8.31 | 7: iteration 110550/ 173500 | 
consumed samples: 28300800 | consumed tokens: 57960038400 | elapsed time per iteration (s): 0.11 | learning rate: 7.335E-05 | global batch size: 256 | lm loss: 4.508643E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2362.408 | TFLOPs: 8.79 | 7: iteration 110560/ 173500 | consumed samples: 28303360 | consumed tokens: 57965281280 | elapsed time per iteration (s): 0.11 | learning rate: 7.333E-05 | global batch size: 256 | lm loss: 4.513832E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2428.224 | TFLOPs: 9.03 | 7: iteration 110570/ 173500 | consumed samples: 28305920 | consumed tokens: 57970524160 | elapsed time per iteration (s): 0.11 | learning rate: 7.332E-05 | global batch size: 256 | lm loss: 4.506673E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2414.735 | TFLOPs: 8.98 | 7: iteration 110580/ 173500 | consumed samples: 28308480 | consumed tokens: 57975767040 | elapsed time per iteration (s): 0.11 | learning rate: 7.330E-05 | global batch size: 256 | lm loss: 4.512271E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.747 | TFLOPs: 8.91 | 7: iteration 110590/ 173500 | consumed samples: 28311040 | consumed tokens: 57981009920 | elapsed time per iteration (s): 0.12 | learning rate: 7.329E-05 | global batch size: 256 | lm loss: 4.506181E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2134.531 | TFLOPs: 7.94 | 7: iteration 110600/ 173500 | consumed samples: 28313600 | consumed tokens: 57986252800 | elapsed time per iteration (s): 0.11 | learning rate: 7.327E-05 | global batch size: 256 | lm loss: 4.519958E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2374.102 | TFLOPs: 8.83 | 7: iteration 110610/ 173500 | consumed samples: 28316160 | consumed tokens: 57991495680 | elapsed time per iteration (s): 0.10 | learning rate: 7.326E-05 | global batch size: 256 | lm loss: 4.511852E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.040 | TFLOPs: 9.19 | 7: iteration 110620/ 173500 | consumed samples: 28318720 | consumed tokens: 57996738560 | elapsed time per iteration (s): 0.09 | learning rate: 7.324E-05 | global batch size: 256 | lm loss: 4.527948E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2760.912 | TFLOPs: 10.27 | 7: iteration 110630/ 173500 | consumed samples: 28321280 | consumed tokens: 58001981440 | elapsed time per iteration (s): 0.08 | learning rate: 7.323E-05 | global batch size: 256 | lm loss: 4.508582E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.106 | TFLOPs: 11.60 | 7: iteration 110640/ 173500 | consumed samples: 28323840 | consumed tokens: 58007224320 | elapsed time per iteration (s): 0.08 | learning rate: 7.321E-05 | global batch size: 256 | lm loss: 4.515430E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.961 | TFLOPs: 11.21 | 7: iteration 110650/ 173500 | consumed samples: 28326400 | consumed tokens: 58012467200 | elapsed time per iteration (s): 0.08 | learning rate: 7.320E-05 | global batch size: 256 | lm loss: 4.512318E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.386 | TFLOPs: 11.22 | 7: iteration 110660/ 173500 | consumed samples: 28328960 | consumed tokens: 58017710080 | elapsed time per iteration (s): 0.08 | learning rate: 7.318E-05 | global 
batch size: 256 | lm loss: 4.519924E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.050 | TFLOPs: 11.48 | 7: iteration 110670/ 173500 | consumed samples: 28331520 | consumed tokens: 58022952960 | elapsed time per iteration (s): 0.09 | learning rate: 7.317E-05 | global batch size: 256 | lm loss: 4.508176E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2873.431 | TFLOPs: 10.69 | 7: iteration 110680/ 173500 | consumed samples: 28334080 | consumed tokens: 58028195840 | elapsed time per iteration (s): 0.08 | learning rate: 7.315E-05 | global batch size: 256 | lm loss: 4.512194E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.433 | TFLOPs: 11.38 | 7: iteration 110690/ 173500 | consumed samples: 28336640 | consumed tokens: 58033438720 | elapsed time per iteration (s): 0.08 | learning rate: 7.314E-05 | global batch size: 256 | lm loss: 4.523633E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.925 | TFLOPs: 12.13 | 7: iteration 110700/ 173500 | consumed samples: 28339200 | consumed tokens: 58038681600 | elapsed time per iteration (s): 0.09 | learning rate: 7.312E-05 | global batch size: 256 | lm loss: 4.514678E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2739.722 | TFLOPs: 10.19 | 7: iteration 110710/ 173500 | consumed samples: 28341760 | consumed tokens: 58043924480 | elapsed time per iteration (s): 0.09 | learning rate: 7.311E-05 | global batch size: 256 | lm loss: 4.527518E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2994.774 | TFLOPs: 11.14 | 7: iteration 110720/ 173500 | consumed samples: 
28344320 | consumed tokens: 58049167360 | elapsed time per iteration (s): 0.11 | learning rate: 7.309E-05 | global batch size: 256 | lm loss: 4.518063E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.733 | TFLOPs: 8.82 | 7: iteration 110730/ 173500 | consumed samples: 28346880 | consumed tokens: 58054410240 | elapsed time per iteration (s): 0.11 | learning rate: 7.308E-05 | global batch size: 256 | lm loss: 4.512161E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2313.092 | TFLOPs: 8.60 | 7: iteration 110740/ 173500 | consumed samples: 28349440 | consumed tokens: 58059653120 | elapsed time per iteration (s): 0.11 | learning rate: 7.306E-05 | global batch size: 256 | lm loss: 4.524724E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2376.600 | TFLOPs: 8.84 | 7: iteration 110750/ 173500 | consumed samples: 28352000 | consumed tokens: 58064896000 | elapsed time per iteration (s): 0.10 | learning rate: 7.305E-05 | global batch size: 256 | lm loss: 4.521292E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.009 | TFLOPs: 9.12 | 7: iteration 110760/ 173500 | consumed samples: 28354560 | consumed tokens: 58070138880 | elapsed time per iteration (s): 0.12 | learning rate: 7.303E-05 | global batch size: 256 | lm loss: 4.517886E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2222.232 | TFLOPs: 8.27 | 7: iteration 110770/ 173500 | consumed samples: 28357120 | consumed tokens: 58075381760 | elapsed time per iteration (s): 0.10 | learning rate: 7.302E-05 | global batch size: 256 | lm loss: 4.529813E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2442.928 | TFLOPs: 9.09 | 7: iteration 110780/ 173500 | consumed samples: 28359680 | consumed tokens: 58080624640 | elapsed time per iteration (s): 0.10 | learning rate: 7.300E-05 | global batch size: 256 | lm loss: 4.512612E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.311 | TFLOPs: 9.26 | 7: iteration 110790/ 173500 | consumed samples: 28362240 | consumed tokens: 58085867520 | elapsed time per iteration (s): 0.12 | learning rate: 7.299E-05 | global batch size: 256 | lm loss: 4.523841E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2180.204 | TFLOPs: 8.11 | 7: iteration 110800/ 173500 | consumed samples: 28364800 | consumed tokens: 58091110400 | elapsed time per iteration (s): 0.11 | learning rate: 7.297E-05 | global batch size: 256 | lm loss: 4.525900E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2356.427 | TFLOPs: 8.76 | 7: iteration 110810/ 173500 | consumed samples: 28367360 | consumed tokens: 58096353280 | elapsed time per iteration (s): 0.11 | learning rate: 7.296E-05 | global batch size: 256 | lm loss: 4.515901E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2425.702 | TFLOPs: 9.02 | 7: iteration 110820/ 173500 | consumed samples: 28369920 | consumed tokens: 58101596160 | elapsed time per iteration (s): 0.12 | learning rate: 7.294E-05 | global batch size: 256 | lm loss: 4.513903E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2169.256 | TFLOPs: 8.07 | 7: iteration 110830/ 173500 | consumed samples: 28372480 | consumed tokens: 58106839040 | elapsed time per iteration (s): 0.10 | learning rate: 7.293E-05 | global batch size: 256 | 
lm loss: 4.524043E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2469.969 | TFLOPs: 9.19 | 7: iteration 110840/ 173500 | consumed samples: 28375040 | consumed tokens: 58112081920 | elapsed time per iteration (s): 0.12 | learning rate: 7.291E-05 | global batch size: 256 | lm loss: 4.523236E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2098.371 | TFLOPs: 7.81 | 7: iteration 110850/ 173500 | consumed samples: 28377600 | consumed tokens: 58117324800 | elapsed time per iteration (s): 0.10 | learning rate: 7.290E-05 | global batch size: 256 | lm loss: 4.504893E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.943 | TFLOPs: 9.34 | 7: iteration 110860/ 173500 | consumed samples: 28380160 | consumed tokens: 58122567680 | elapsed time per iteration (s): 0.11 | learning rate: 7.288E-05 | global batch size: 256 | lm loss: 4.521471E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.285 | TFLOPs: 8.91 | 7: iteration 110870/ 173500 | consumed samples: 28382720 | consumed tokens: 58127810560 | elapsed time per iteration (s): 0.11 | learning rate: 7.287E-05 | global batch size: 256 | lm loss: 4.518394E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.620 | TFLOPs: 8.75 | 7: iteration 110880/ 173500 | consumed samples: 28385280 | consumed tokens: 58133053440 | elapsed time per iteration (s): 0.10 | learning rate: 7.285E-05 | global batch size: 256 | lm loss: 4.517454E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.146 | TFLOPs: 9.30 | 7: iteration 110890/ 173500 | consumed samples: 28387840 | consumed 
tokens: 58138296320 | elapsed time per iteration (s): 0.12 | learning rate: 7.284E-05 | global batch size: 256 | lm loss: 4.516916E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2141.467 | TFLOPs: 7.97 | 7: iteration 110900/ 173500 | consumed samples: 28390400 | consumed tokens: 58143539200 | elapsed time per iteration (s): 0.11 | learning rate: 7.282E-05 | global batch size: 256 | lm loss: 4.509831E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.725 | TFLOPs: 8.53 | 7: iteration 110910/ 173500 | consumed samples: 28392960 | consumed tokens: 58148782080 | elapsed time per iteration (s): 0.11 | learning rate: 7.281E-05 | global batch size: 256 | lm loss: 4.519095E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.486 | TFLOPs: 8.98 | 7: iteration 110920/ 173500 | consumed samples: 28395520 | consumed tokens: 58154024960 | elapsed time per iteration (s): 0.10 | learning rate: 7.279E-05 | global batch size: 256 | lm loss: 4.518681E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.190 | TFLOPs: 9.26 | 7: iteration 110930/ 173500 | consumed samples: 28398080 | consumed tokens: 58159267840 | elapsed time per iteration (s): 0.10 | learning rate: 7.278E-05 | global batch size: 256 | lm loss: 4.515933E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.280 | TFLOPs: 9.30 | 7: iteration 110940/ 173500 | consumed samples: 28400640 | consumed tokens: 58164510720 | elapsed time per iteration (s): 0.11 | learning rate: 7.276E-05 | global batch size: 256 | lm loss: 4.503884E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 2335.654 | TFLOPs: 8.69 | 7: iteration 110950/ 173500 | consumed samples: 28403200 | consumed tokens: 58169753600 | elapsed time per iteration (s): 0.11 | learning rate: 7.275E-05 | global batch size: 256 | lm loss: 4.525612E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2410.825 | TFLOPs: 8.97 | 7: iteration 110960/ 173500 | consumed samples: 28405760 | consumed tokens: 58174996480 | elapsed time per iteration (s): 0.11 | learning rate: 7.273E-05 | global batch size: 256 | lm loss: 4.516643E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2330.196 | TFLOPs: 8.67 | 7: iteration 110970/ 173500 | consumed samples: 28408320 | consumed tokens: 58180239360 | elapsed time per iteration (s): 0.12 | learning rate: 7.272E-05 | global batch size: 256 | lm loss: 4.521004E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2102.389 | TFLOPs: 7.82 | 7: iteration 110980/ 173500 | consumed samples: 28410880 | consumed tokens: 58185482240 | elapsed time per iteration (s): 0.11 | learning rate: 7.270E-05 | global batch size: 256 | lm loss: 4.526413E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2321.992 | TFLOPs: 8.64 | 7: iteration 110990/ 173500 | consumed samples: 28413440 | consumed tokens: 58190725120 | elapsed time per iteration (s): 0.11 | learning rate: 7.269E-05 | global batch size: 256 | lm loss: 4.515799E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.883 | TFLOPs: 8.60 | 7: iteration 111000/ 173500 | consumed samples: 28416000 | consumed tokens: 58195968000 | elapsed time per iteration (s): 0.11 | learning rate: 7.267E-05 | global batch size: 256 | lm loss: 4.513733E+00 | 
grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.384 | TFLOPs: 8.72 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 111000 | lm loss value: 4.379717E+00 | lm loss PPL: 7.981547E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 111000 to checkpoints_14m91b100m 0: [2023-03-17 02:56:05,435] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step111000 is begin to save! 0: [2023-03-17 02:56:05,438] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:56:05,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:56:05,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:56:05,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:56:05,468] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:56:05,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:56:05,471] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:56:05,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 02:56:05,474] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:56:05,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:56:05,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:56:05,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:56:05,478] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step111000/mp_rank_00_model_states.pt 0: [2023-03-17 02:56:05,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:56:05,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:56:05,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:56:05,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:56:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2023-03-17 02:56:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:56:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:56:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2023-03-17 02:56:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:56:05,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 02:56:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2023-03-17 02:56:05,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2023-03-17 02:56:05,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:56:05,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 02:56:05,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:56:05,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:56:05,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:56:05,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:56:05,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2023-03-17 02:56:05,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:56:05,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2023-03-17 02:56:05,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:56:05,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:56:05,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:56:05,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:56:05,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:56:05,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:56:05,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:56:05,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:56:05,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:56:05,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:56:05,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2023-03-17 02:56:05,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:56:05,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2023-03-17 02:56:05,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:56:05,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2023-03-17 02:56:05,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:56:05,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 02:56:05,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:56:05,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2023-03-17 02:56:05,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:56:05,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:56:05,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:56:05,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 02:56:05,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:56:05,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:56:05,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 0: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
2: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
4: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
4: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 6: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 3: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
7: [2023-03-17 02:56:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step111000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:56:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: successfully saved checkpoint at iteration 111000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.74 7: iteration 111010/ 173500 | consumed samples: 28418560 | consumed tokens: 58201210880 | elapsed time per iteration (s): 0.13 | learning rate: 7.266E-05 | global batch size: 256 | lm loss: 4.522124E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2000.049 | TFLOPs: 7.44 | 7: iteration 111020/ 173500 | consumed samples: 28421120 | consumed tokens: 58206453760 | elapsed time per iteration (s): 0.11 | learning rate: 7.264E-05 | global batch size: 256 | lm loss: 4.501025E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.014 | TFLOPs: 8.85 | 7: iteration 111030/ 173500 | consumed samples: 28423680 | consumed tokens: 58211696640 | elapsed time per iteration (s): 0.11 | learning rate: 7.263E-05 | global batch size: 256 | lm loss: 4.532637E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2362.634 | TFLOPs: 8.79 | 7: iteration 111040/ 173500 | consumed samples: 28426240 | consumed tokens: 58216939520 | elapsed time per iteration (s): 0.11 | learning rate: 7.261E-05 | global batch size: 256 | lm loss: 4.517231E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.475 | TFLOPs: 8.72 | 7: iteration 111050/ 173500 | consumed samples: 28428800 | consumed tokens: 58222182400 | elapsed time per iteration (s): 0.11 | learning rate: 7.260E-05 | 
global batch size: 256 | lm loss: 4.508799E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.717 | TFLOPs: 8.44 | 7: iteration 111060/ 173500 | consumed samples: 28431360 | consumed tokens: 58227425280 | elapsed time per iteration (s): 0.11 | learning rate: 7.258E-05 | global batch size: 256 | lm loss: 4.522359E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2253.518 | TFLOPs: 8.38 | 7: iteration 111070/ 173500 | consumed samples: 28433920 | consumed tokens: 58232668160 | elapsed time per iteration (s): 0.11 | learning rate: 7.257E-05 | global batch size: 256 | lm loss: 4.515334E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.070 | TFLOPs: 9.02 | 7: iteration 111080/ 173500 | consumed samples: 28436480 | consumed tokens: 58237911040 | elapsed time per iteration (s): 0.11 | learning rate: 7.255E-05 | global batch size: 256 | lm loss: 4.523449E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2317.612 | TFLOPs: 8.62 | 7: iteration 111090/ 173500 | consumed samples: 28439040 | consumed tokens: 58243153920 | elapsed time per iteration (s): 0.10 | learning rate: 7.254E-05 | global batch size: 256 | lm loss: 4.508962E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.861 | TFLOPs: 9.26 | 7: iteration 111100/ 173500 | consumed samples: 28441600 | consumed tokens: 58248396800 | elapsed time per iteration (s): 0.09 | learning rate: 7.252E-05 | global batch size: 256 | lm loss: 4.514847E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2720.596 | TFLOPs: 10.12 | 7: iteration 111110/ 173500 | consumed samples: 
28444160 | consumed tokens: 58253639680 | elapsed time per iteration (s): 0.08 | learning rate: 7.251E-05 | global batch size: 256 | lm loss: 4.515561E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.310 | TFLOPs: 11.87 | 7: iteration 111120/ 173500 | consumed samples: 28446720 | consumed tokens: 58258882560 | elapsed time per iteration (s): 0.08 | learning rate: 7.249E-05 | global batch size: 256 | lm loss: 4.516795E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.086 | TFLOPs: 11.43 | 7: iteration 111130/ 173500 | consumed samples: 28449280 | consumed tokens: 58264125440 | elapsed time per iteration (s): 0.08 | learning rate: 7.248E-05 | global batch size: 256 | lm loss: 4.507746E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.622 | TFLOPs: 11.82 | 7: iteration 111140/ 173500 | consumed samples: 28451840 | consumed tokens: 58269368320 | elapsed time per iteration (s): 0.10 | learning rate: 7.246E-05 | global batch size: 256 | lm loss: 4.517056E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2582.493 | TFLOPs: 9.61 | 7: iteration 111150/ 173500 | consumed samples: 28454400 | consumed tokens: 58274611200 | elapsed time per iteration (s): 0.10 | learning rate: 7.245E-05 | global batch size: 256 | lm loss: 4.513256E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.553 | TFLOPs: 9.45 | 7: iteration 111160/ 173500 | consumed samples: 28456960 | consumed tokens: 58279854080 | elapsed time per iteration (s): 0.10 | learning rate: 7.243E-05 | global batch size: 256 | lm loss: 4.539777E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2687.603 | TFLOPs: 10.00 | 7: iteration 111170/ 173500 | consumed samples: 28459520 | consumed tokens: 58285096960 | elapsed time per iteration (s): 0.10 | learning rate: 7.242E-05 | global batch size: 256 | lm loss: 4.516789E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2691.098 | TFLOPs: 10.01 | 7: iteration 111180/ 173500 | consumed samples: 28462080 | consumed tokens: 58290339840 | elapsed time per iteration (s): 0.08 | learning rate: 7.240E-05 | global batch size: 256 | lm loss: 4.530141E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.213 | TFLOPs: 12.05 | 7: iteration 111190/ 173500 | consumed samples: 28464640 | consumed tokens: 58295582720 | elapsed time per iteration (s): 0.08 | learning rate: 7.239E-05 | global batch size: 256 | lm loss: 4.509578E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3250.102 | TFLOPs: 12.09 | 7: iteration 111200/ 173500 | consumed samples: 28467200 | consumed tokens: 58300825600 | elapsed time per iteration (s): 0.08 | learning rate: 7.237E-05 | global batch size: 256 | lm loss: 4.517937E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.806 | TFLOPs: 12.04 | 7: iteration 111210/ 173500 | consumed samples: 28469760 | consumed tokens: 58306068480 | elapsed time per iteration (s): 0.09 | learning rate: 7.236E-05 | global batch size: 256 | lm loss: 4.504268E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2901.229 | TFLOPs: 10.79 | 7: iteration 111220/ 173500 | consumed samples: 28472320 | consumed tokens: 58311311360 | elapsed time per iteration (s): 0.09 | learning rate: 7.234E-05 | global batch size: 
256 | lm loss: 4.513341E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2717.867 | TFLOPs: 10.11 | 7: iteration 111230/ 173500 | consumed samples: 28474880 | consumed tokens: 58316554240 | elapsed time per iteration (s): 0.08 | learning rate: 7.233E-05 | global batch size: 256 | lm loss: 4.522181E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.998 | TFLOPs: 11.86 | 7: iteration 111240/ 173500 | consumed samples: 28477440 | consumed tokens: 58321797120 | elapsed time per iteration (s): 0.08 | learning rate: 7.231E-05 | global batch size: 256 | lm loss: 4.511457E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.792 | TFLOPs: 11.61 | 7: iteration 111250/ 173500 | consumed samples: 28480000 | consumed tokens: 58327040000 | elapsed time per iteration (s): 0.09 | learning rate: 7.230E-05 | global batch size: 256 | lm loss: 4.512746E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2943.583 | TFLOPs: 10.95 | 7: iteration 111260/ 173500 | consumed samples: 28482560 | consumed tokens: 58332282880 | elapsed time per iteration (s): 0.08 | learning rate: 7.228E-05 | global batch size: 256 | lm loss: 4.529148E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.852 | TFLOPs: 11.86 | 7: iteration 111270/ 173500 | consumed samples: 28485120 | consumed tokens: 58337525760 | elapsed time per iteration (s): 0.09 | learning rate: 7.227E-05 | global batch size: 256 | lm loss: 4.527753E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2899.340 | TFLOPs: 10.78 | 7: iteration 111280/ 173500 | consumed samples: 28487680 | 
consumed tokens: 58342768640 | elapsed time per iteration (s): 0.08 | learning rate: 7.225E-05 | global batch size: 256 | lm loss: 4.518924E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.546 | TFLOPs: 11.59 | 7: iteration 111290/ 173500 | consumed samples: 28490240 | consumed tokens: 58348011520 | elapsed time per iteration (s): 0.08 | learning rate: 7.224E-05 | global batch size: 256 | lm loss: 4.511843E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.211 | TFLOPs: 11.67 | 7: iteration 111300/ 173500 | consumed samples: 28492800 | consumed tokens: 58353254400 | elapsed time per iteration (s): 0.08 | learning rate: 7.222E-05 | global batch size: 256 | lm loss: 4.507003E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.646 | TFLOPs: 11.56 | 7: iteration 111310/ 173500 | consumed samples: 28495360 | consumed tokens: 58358497280 | elapsed time per iteration (s): 0.08 | learning rate: 7.221E-05 | global batch size: 256 | lm loss: 4.531631E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.441 | TFLOPs: 11.89 | 7: iteration 111320/ 173500 | consumed samples: 28497920 | consumed tokens: 58363740160 | elapsed time per iteration (s): 0.09 | learning rate: 7.219E-05 | global batch size: 256 | lm loss: 4.512405E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.537 | TFLOPs: 10.33 | 7: iteration 111330/ 173500 | consumed samples: 28500480 | consumed tokens: 58368983040 | elapsed time per iteration (s): 0.08 | learning rate: 7.218E-05 | global batch size: 256 | lm loss: 4.509824E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3032.429 | TFLOPs: 11.28 | 7: iteration 111340/ 173500 | consumed samples: 28503040 | consumed tokens: 58374225920 | elapsed time per iteration (s): 0.09 | learning rate: 7.216E-05 | global batch size: 256 | lm loss: 4.526048E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2791.484 | TFLOPs: 10.38 | 7: iteration 111350/ 173500 | consumed samples: 28505600 | consumed tokens: 58379468800 | elapsed time per iteration (s): 0.10 | learning rate: 7.215E-05 | global batch size: 256 | lm loss: 4.508549E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2637.331 | TFLOPs: 9.81 | 7: iteration 111360/ 173500 | consumed samples: 28508160 | consumed tokens: 58384711680 | elapsed time per iteration (s): 0.10 | learning rate: 7.213E-05 | global batch size: 256 | lm loss: 4.521563E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2520.226 | TFLOPs: 9.37 | 7: iteration 111370/ 173500 | consumed samples: 28510720 | consumed tokens: 58389954560 | elapsed time per iteration (s): 0.10 | learning rate: 7.212E-05 | global batch size: 256 | lm loss: 4.529307E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.737 | TFLOPs: 9.45 | 7: iteration 111380/ 173500 | consumed samples: 28513280 | consumed tokens: 58395197440 | elapsed time per iteration (s): 0.09 | learning rate: 7.210E-05 | global batch size: 256 | lm loss: 4.491055E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.134 | TFLOPs: 10.68 | 7: iteration 111390/ 173500 | consumed samples: 28515840 | consumed tokens: 58400440320 | elapsed time per iteration (s): 0.08 | learning rate: 7.209E-05 | global batch size: 256 
| lm loss: 4.519935E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.264 | TFLOPs: 11.60 | 7: iteration 111400/ 173500 | consumed samples: 28518400 | consumed tokens: 58405683200 | elapsed time per iteration (s): 0.10 | learning rate: 7.207E-05 | global batch size: 256 | lm loss: 4.513237E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.326 | TFLOPs: 9.37 | 7: iteration 111410/ 173500 | consumed samples: 28520960 | consumed tokens: 58410926080 | elapsed time per iteration (s): 0.09 | learning rate: 7.206E-05 | global batch size: 256 | lm loss: 4.516727E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2950.185 | TFLOPs: 10.97 | 7: iteration 111420/ 173500 | consumed samples: 28523520 | consumed tokens: 58416168960 | elapsed time per iteration (s): 0.08 | learning rate: 7.205E-05 | global batch size: 256 | lm loss: 4.518484E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.944 | TFLOPs: 11.62 | 7: iteration 111430/ 173500 | consumed samples: 28526080 | consumed tokens: 58421411840 | elapsed time per iteration (s): 0.10 | learning rate: 7.203E-05 | global batch size: 256 | lm loss: 4.519146E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2501.992 | TFLOPs: 9.31 | 7: iteration 111440/ 173500 | consumed samples: 28528640 | consumed tokens: 58426654720 | elapsed time per iteration (s): 0.09 | learning rate: 7.202E-05 | global batch size: 256 | lm loss: 4.515405E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2868.899 | TFLOPs: 10.67 | 7: iteration 111450/ 173500 | consumed samples: 28531200 | consumed 
tokens: 58431897600 | elapsed time per iteration (s): 0.15 | learning rate: 7.200E-05 | global batch size: 256 | lm loss: 4.511003E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1753.318 | TFLOPs: 6.52 | 7: iteration 111460/ 173500 | consumed samples: 28533760 | consumed tokens: 58437140480 | elapsed time per iteration (s): 0.10 | learning rate: 7.199E-05 | global batch size: 256 | lm loss: 4.515434E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2453.267 | TFLOPs: 9.13 | 7: iteration 111470/ 173500 | consumed samples: 28536320 | consumed tokens: 58442383360 | elapsed time per iteration (s): 0.09 | learning rate: 7.197E-05 | global batch size: 256 | lm loss: 4.524209E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2930.021 | TFLOPs: 10.90 | 7: iteration 111480/ 173500 | consumed samples: 28538880 | consumed tokens: 58447626240 | elapsed time per iteration (s): 0.08 | learning rate: 7.196E-05 | global batch size: 256 | lm loss: 4.514797E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.017 | TFLOPs: 12.01 | 7: iteration 111490/ 173500 | consumed samples: 28541440 | consumed tokens: 58452869120 | elapsed time per iteration (s): 0.10 | learning rate: 7.194E-05 | global batch size: 256 | lm loss: 4.510522E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2544.862 | TFLOPs: 9.47 | 7: iteration 111500/ 173500 | consumed samples: 28544000 | consumed tokens: 58458112000 | elapsed time per iteration (s): 0.08 | learning rate: 7.193E-05 | global batch size: 256 | lm loss: 4.508495E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3082.498 | TFLOPs: 11.47 | 7: iteration 111510/ 173500 | consumed samples: 28546560 | consumed tokens: 58463354880 | elapsed time per iteration (s): 0.08 | learning rate: 7.191E-05 | global batch size: 256 | lm loss: 4.512282E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.560 | TFLOPs: 11.90 | 7: iteration 111520/ 173500 | consumed samples: 28549120 | consumed tokens: 58468597760 | elapsed time per iteration (s): 0.08 | learning rate: 7.190E-05 | global batch size: 256 | lm loss: 4.507307E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.917 | TFLOPs: 11.92 | 7: iteration 111530/ 173500 | consumed samples: 28551680 | consumed tokens: 58473840640 | elapsed time per iteration (s): 0.08 | learning rate: 7.188E-05 | global batch size: 256 | lm loss: 4.510175E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.033 | TFLOPs: 11.59 | 7: iteration 111540/ 173500 | consumed samples: 28554240 | consumed tokens: 58479083520 | elapsed time per iteration (s): 0.08 | learning rate: 7.187E-05 | global batch size: 256 | lm loss: 4.517283E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.671 | TFLOPs: 11.99 | 7: iteration 111550/ 173500 | consumed samples: 28556800 | consumed tokens: 58484326400 | elapsed time per iteration (s): 0.08 | learning rate: 7.185E-05 | global batch size: 256 | lm loss: 4.521429E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.093 | TFLOPs: 12.00 | 7: iteration 111560/ 173500 | consumed samples: 28559360 | consumed tokens: 58489569280 | elapsed time per iteration (s): 0.10 | learning rate: 7.184E-05 | global batch size: 256 | lm loss: 
4.521372E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2594.109 | TFLOPs: 9.65 | 7: iteration 111570/ 173500 | consumed samples: 28561920 | consumed tokens: 58494812160 | elapsed time per iteration (s): 0.08 | learning rate: 7.182E-05 | global batch size: 256 | lm loss: 4.516006E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.076 | TFLOPs: 11.73 | 7: iteration 111580/ 173500 | consumed samples: 28564480 | consumed tokens: 58500055040 | elapsed time per iteration (s): 0.09 | learning rate: 7.181E-05 | global batch size: 256 | lm loss: 4.507137E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2951.829 | TFLOPs: 10.98 | 7: iteration 111590/ 173500 | consumed samples: 28567040 | consumed tokens: 58505297920 | elapsed time per iteration (s): 0.10 | learning rate: 7.179E-05 | global batch size: 256 | lm loss: 4.511576E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2627.319 | TFLOPs: 9.77 | 7: iteration 111600/ 173500 | consumed samples: 28569600 | consumed tokens: 58510540800 | elapsed time per iteration (s): 0.10 | learning rate: 7.178E-05 | global batch size: 256 | lm loss: 4.512643E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2494.218 | TFLOPs: 9.28 | 7: iteration 111610/ 173500 | consumed samples: 28572160 | consumed tokens: 58515783680 | elapsed time per iteration (s): 0.08 | learning rate: 7.176E-05 | global batch size: 256 | lm loss: 4.502482E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3249.107 | TFLOPs: 12.09 | 7: iteration 111620/ 173500 | consumed samples: 28574720 | consumed tokens: 
58521026560 | elapsed time per iteration (s): 0.08 | learning rate: 7.175E-05 | global batch size: 256 | lm loss: 4.513867E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.613 | TFLOPs: 11.76 | 7: iteration 111630/ 173500 | consumed samples: 28577280 | consumed tokens: 58526269440 | elapsed time per iteration (s): 0.09 | learning rate: 7.173E-05 | global batch size: 256 | lm loss: 4.504910E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2764.212 | TFLOPs: 10.28 | 7: iteration 111640/ 173500 | consumed samples: 28579840 | consumed tokens: 58531512320 | elapsed time per iteration (s): 0.08 | learning rate: 7.172E-05 | global batch size: 256 | lm loss: 4.516099E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.651 | TFLOPs: 11.45 | 7: iteration 111650/ 173500 | consumed samples: 28582400 | consumed tokens: 58536755200 | elapsed time per iteration (s): 0.08 | learning rate: 7.170E-05 | global batch size: 256 | lm loss: 4.518990E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.173 | TFLOPs: 11.34 | 7: iteration 111660/ 173500 | consumed samples: 28584960 | consumed tokens: 58541998080 | elapsed time per iteration (s): 0.08 | learning rate: 7.169E-05 | global batch size: 256 | lm loss: 4.516807E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.665 | TFLOPs: 11.30 | 7: iteration 111670/ 173500 | consumed samples: 28587520 | consumed tokens: 58547240960 | elapsed time per iteration (s): 0.08 | learning rate: 7.167E-05 | global batch size: 256 | lm loss: 4.511765E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3243.968 | TFLOPs: 12.07 | 7: iteration 111680/ 173500 | consumed samples: 28590080 | consumed tokens: 58552483840 | elapsed time per iteration (s): 0.08 | learning rate: 7.166E-05 | global batch size: 256 | lm loss: 4.515092E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.392 | TFLOPs: 11.67 | 7: iteration 111690/ 173500 | consumed samples: 28592640 | consumed tokens: 58557726720 | elapsed time per iteration (s): 0.08 | learning rate: 7.164E-05 | global batch size: 256 | lm loss: 4.508844E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.529 | TFLOPs: 11.93 | 7: iteration 111700/ 173500 | consumed samples: 28595200 | consumed tokens: 58562969600 | elapsed time per iteration (s): 0.10 | learning rate: 7.163E-05 | global batch size: 256 | lm loss: 4.507185E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2662.959 | TFLOPs: 9.91 | 7: iteration 111710/ 173500 | consumed samples: 28597760 | consumed tokens: 58568212480 | elapsed time per iteration (s): 0.08 | learning rate: 7.161E-05 | global batch size: 256 | lm loss: 4.521119E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.592 | TFLOPs: 12.08 | 7: iteration 111720/ 173500 | consumed samples: 28600320 | consumed tokens: 58573455360 | elapsed time per iteration (s): 0.09 | learning rate: 7.160E-05 | global batch size: 256 | lm loss: 4.513592E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.492 | TFLOPs: 10.78 | 7: iteration 111730/ 173500 | consumed samples: 28602880 | consumed tokens: 58578698240 | elapsed time per iteration (s): 0.09 | learning rate: 7.158E-05 | global batch size: 256 | lm loss: 
4.519884E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2709.030 | TFLOPs: 10.08 | 7: iteration 111740/ 173500 | consumed samples: 28605440 | consumed tokens: 58583941120 | elapsed time per iteration (s): 0.08 | learning rate: 7.157E-05 | global batch size: 256 | lm loss: 4.503106E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.725 | TFLOPs: 12.07 | 7: iteration 111750/ 173500 | consumed samples: 28608000 | consumed tokens: 58589184000 | elapsed time per iteration (s): 0.09 | learning rate: 7.155E-05 | global batch size: 256 | lm loss: 4.516960E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2925.960 | TFLOPs: 10.88 | 7: iteration 111760/ 173500 | consumed samples: 28610560 | consumed tokens: 58594426880 | elapsed time per iteration (s): 0.08 | learning rate: 7.154E-05 | global batch size: 256 | lm loss: 4.511779E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.091 | TFLOPs: 11.59 | 7: iteration 111770/ 173500 | consumed samples: 28613120 | consumed tokens: 58599669760 | elapsed time per iteration (s): 0.09 | learning rate: 7.152E-05 | global batch size: 256 | lm loss: 4.516409E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2982.015 | TFLOPs: 11.09 | 7: iteration 111780/ 173500 | consumed samples: 28615680 | consumed tokens: 58604912640 | elapsed time per iteration (s): 0.09 | learning rate: 7.151E-05 | global batch size: 256 | lm loss: 4.520237E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2775.315 | TFLOPs: 10.32 | 7: iteration 111790/ 173500 | consumed samples: 28618240 | consumed tokens: 
58610155520 | elapsed time per iteration (s): 0.08 | learning rate: 7.149E-05 | global batch size: 256 | lm loss: 4.514423E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.200 | TFLOPs: 11.23 | 7: iteration 111800/ 173500 | consumed samples: 28620800 | consumed tokens: 58615398400 | elapsed time per iteration (s): 0.08 | learning rate: 7.148E-05 | global batch size: 256 | lm loss: 4.506692E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.709 | TFLOPs: 11.51 | 7: iteration 111810/ 173500 | consumed samples: 28623360 | consumed tokens: 58620641280 | elapsed time per iteration (s): 0.08 | learning rate: 7.146E-05 | global batch size: 256 | lm loss: 4.521908E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.393 | TFLOPs: 11.58 | 7: iteration 111820/ 173500 | consumed samples: 28625920 | consumed tokens: 58625884160 | elapsed time per iteration (s): 0.08 | learning rate: 7.145E-05 | global batch size: 256 | lm loss: 4.527424E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.685 | TFLOPs: 11.54 | 7: iteration 111830/ 173500 | consumed samples: 28628480 | consumed tokens: 58631127040 | elapsed time per iteration (s): 0.08 | learning rate: 7.143E-05 | global batch size: 256 | lm loss: 4.514557E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.708 | TFLOPs: 11.73 | 7: iteration 111840/ 173500 | consumed samples: 28631040 | consumed tokens: 58636369920 | elapsed time per iteration (s): 0.09 | learning rate: 7.142E-05 | global batch size: 256 | lm loss: 4.515721E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 2726.380 | TFLOPs: 10.14 | 7: iteration 111850/ 173500 | consumed samples: 28633600 | consumed tokens: 58641612800 | elapsed time per iteration (s): 0.09 | learning rate: 7.140E-05 | global batch size: 256 | lm loss: 4.523634E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.029 | TFLOPs: 10.51 | 7: iteration 111860/ 173500 | consumed samples: 28636160 | consumed tokens: 58646855680 | elapsed time per iteration (s): 0.09 | learning rate: 7.139E-05 | global batch size: 256 | lm loss: 4.515684E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.225 | TFLOPs: 10.42 | 7: iteration 111870/ 173500 | consumed samples: 28638720 | consumed tokens: 58652098560 | elapsed time per iteration (s): 0.09 | learning rate: 7.137E-05 | global batch size: 256 | lm loss: 4.528415E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2957.598 | TFLOPs: 11.00 | 7: iteration 111880/ 173500 | consumed samples: 28641280 | consumed tokens: 58657341440 | elapsed time per iteration (s): 0.08 | learning rate: 7.136E-05 | global batch size: 256 | lm loss: 4.521054E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3259.317 | TFLOPs: 12.12 | 7: iteration 111890/ 173500 | consumed samples: 28643840 | consumed tokens: 58662584320 | elapsed time per iteration (s): 0.08 | learning rate: 7.135E-05 | global batch size: 256 | lm loss: 4.524004E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3251.977 | TFLOPs: 12.10 | 7: iteration 111900/ 173500 | consumed samples: 28646400 | consumed tokens: 58667827200 | elapsed time per iteration (s): 0.08 | learning rate: 7.133E-05 | global batch size: 256 | lm loss: 
4.524537E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3256.219 | TFLOPs: 12.11 | 7: iteration 111910/ 173500 | consumed samples: 28648960 | consumed tokens: 58673070080 | elapsed time per iteration (s): 0.09 | learning rate: 7.132E-05 | global batch size: 256 | lm loss: 4.518555E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2846.165 | TFLOPs: 10.59 | 7: iteration 111920/ 173500 | consumed samples: 28651520 | consumed tokens: 58678312960 | elapsed time per iteration (s): 0.09 | learning rate: 7.130E-05 | global batch size: 256 | lm loss: 4.519680E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.487 | TFLOPs: 10.54 | 7: iteration 111930/ 173500 | consumed samples: 28654080 | consumed tokens: 58683555840 | elapsed time per iteration (s): 0.09 | learning rate: 7.129E-05 | global batch size: 256 | lm loss: 4.512060E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.571 | TFLOPs: 10.13 | 7: iteration 111940/ 173500 | consumed samples: 28656640 | consumed tokens: 58688798720 | elapsed time per iteration (s): 0.09 | learning rate: 7.127E-05 | global batch size: 256 | lm loss: 4.502197E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.171 | TFLOPs: 10.08 | 7: iteration 111950/ 173500 | consumed samples: 28659200 | consumed tokens: 58694041600 | elapsed time per iteration (s): 0.08 | learning rate: 7.126E-05 | global batch size: 256 | lm loss: 4.517327E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.071 | TFLOPs: 11.64 | 7: iteration 111960/ 173500 | consumed samples: 28661760 | consumed tokens: 
58699284480 | elapsed time per iteration (s): 0.09 | learning rate: 7.124E-05 | global batch size: 256 | lm loss: 4.515540E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2877.355 | TFLOPs: 10.70 | 7: iteration 111970/ 173500 | consumed samples: 28664320 | consumed tokens: 58704527360 | elapsed time per iteration (s): 0.08 | learning rate: 7.123E-05 | global batch size: 256 | lm loss: 4.504294E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.771 | TFLOPs: 11.91 | 7: iteration 111980/ 173500 | consumed samples: 28666880 | consumed tokens: 58709770240 | elapsed time per iteration (s): 0.08 | learning rate: 7.121E-05 | global batch size: 256 | lm loss: 4.507817E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.404 | TFLOPs: 11.50 | 7: iteration 111990/ 173500 | consumed samples: 28669440 | consumed tokens: 58715013120 | elapsed time per iteration (s): 0.08 | learning rate: 7.120E-05 | global batch size: 256 | lm loss: 4.501444E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.960 | TFLOPs: 11.92 | 0: [2023-03-17 02:57:35,014] [INFO] [logging.py:68:log_dist] [Rank 0] step=112000, skipped=0, lr=[7.118156405567987e-05, 7.118156405567987e-05, 7.118156405567987e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 112000/ 173500 | consumed samples: 28672000 | consumed tokens: 58720256000 | elapsed time per iteration (s): 0.08 | learning rate: 7.118E-05 | global batch size: 256 | lm loss: 4.525726E+00 | grad norm: 0.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.230 | TFLOPs: 11.91 | 0: steps: 112000 loss: 4.4998 iter time (s): 0.092 samples/sec: 2773.760 7: 
------------------------------------------------------------------------------------------------- 7: validation loss at iteration 112000 | lm loss value: 4.410427E+00 | lm loss PPL: 8.230457E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 112000 to checkpoints_14m91b100m 0: [2023-03-17 02:57:35,072] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step112000 is begin to save! 0: [2023-03-17 02:57:35,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:57:35,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:57:35,099] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:57:35,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:57:35,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:57:35,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:57:35,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:57:35,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:57:35,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 02:57:35,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:57:35,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/layer_08-model_00-model_states.pt... 0: [2023-03-17 02:57:35,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:57:35,114] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step112000/mp_rank_00_model_states.pt 0: [2023-03-17 02:57:35,114] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:57:35,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:57:35,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:57:35,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:57:35,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:57:35,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:57:35,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2023-03-17 02:57:35,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:57:35,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
6: [2023-03-17 02:57:35,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:57:35,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 02:57:35,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 6: [2023-03-17 02:57:35,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:57:35,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2023-03-17 02:57:35,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:57:35,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:57:35,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:57:35,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 5: [2023-03-17 02:57:35,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
4: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:57:35,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2023-03-17 02:57:35,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:57:35,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
0: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:57:35,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:57:35,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 02:57:35,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:57:35,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2023-03-17 02:57:35,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2023-03-17 02:57:35,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:57:35,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 4: [2023-03-17 02:57:35,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:57:35,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2023-03-17 02:57:35,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:57:35,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:57:35,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 02:57:35,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:57:35,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2023-03-17 02:57:35,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 02:57:35,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:57:35,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2023-03-17 02:57:35,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:57:35,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2023-03-17 02:57:35,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:57:35,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:57:35,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:57:35,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:57:35,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2023-03-17 02:57:35,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 02:57:35,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
4: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2023-03-17 02:57:35,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 02:57:35,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:57:35,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:57:35,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:57:35,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 02:57:35,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:57:35,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 2: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
1: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2023-03-17 02:57:35,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:57:35,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2023-03-17 02:57:35,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
6: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 6: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
2: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
3: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:57:35,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step112000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2023-03-17 02:57:35,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now!
0: successfully saved checkpoint at iteration 112000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 77.81
7: iteration 112010/ 173500 | consumed samples: 28674560 | consumed tokens: 58725498880 | elapsed time per iteration (s): 0.09 | learning rate: 7.117E-05 | global batch size: 256 | lm loss: 4.528903E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2755.973 | TFLOPs: 10.25 |
7: iteration 112020/ 173500 | consumed samples: 28677120 | consumed tokens: 58730741760 | elapsed time per iteration (s): 0.08 | learning rate: 7.115E-05 | global batch size: 256 | lm loss: 4.517751E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.919 | TFLOPs: 12.00 |
7: iteration 112030/ 173500 | consumed samples: 28679680 | consumed tokens: 58735984640 | elapsed time per iteration (s): 0.08 | learning rate: 7.114E-05 | global batch size: 256 | lm loss: 4.508529E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.547 | TFLOPs: 11.44 |
7: iteration 112040/ 173500 | consumed samples: 28682240 | consumed tokens: 58741227520 | elapsed time per iteration (s): 0.08 | learning rate: 7.112E-05 | global batch size: 256 | lm loss: 4.513010E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.064 | TFLOPs: 11.88 |
7: iteration 112050/ 173500 | consumed samples: 28684800 | consumed tokens: 58746470400 | elapsed time per iteration (s): 0.08 | learning rate: 7.111E-05 | global batch size: 256 | lm loss: 4.506310E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.892 | TFLOPs: 11.93 |
7: iteration 112060/ 173500 | consumed samples: 28687360 | consumed tokens: 58751713280 | elapsed time per iteration (s): 0.09 | learning rate: 7.109E-05 | global batch size: 256 | lm loss: 4.513215E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2768.311 | TFLOPs: 10.30 |
7: iteration 112070/ 173500 | consumed samples: 28689920 | consumed tokens: 58756956160 | elapsed time per iteration (s): 0.10 | learning rate: 7.108E-05 | global batch size: 256 | lm loss: 4.517455E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.630 | TFLOPs: 9.26 |
7: iteration 112080/ 173500 | consumed samples: 28692480 | consumed tokens: 58762199040 | elapsed time per iteration (s): 0.08 | learning rate: 7.106E-05 | global batch size: 256 | lm loss: 4.509947E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.009 | TFLOPs: 11.68 |
7: iteration 112090/ 173500 | consumed samples: 28695040 | consumed tokens: 58767441920 | elapsed time per iteration (s): 0.08 | learning rate: 7.105E-05 | global batch size: 256 | lm loss: 4.511844E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.942 | TFLOPs: 12.06 |
7: iteration 112100/ 173500 | consumed samples: 28697600 | consumed tokens: 58772684800 | elapsed time per iteration (s): 0.08 | learning rate: 7.103E-05 | global batch size: 256 | lm loss: 4.515062E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.930 | TFLOPs: 11.37 |
7: iteration 112110/ 173500 | consumed samples: 28700160 | consumed tokens: 58777927680 | elapsed time per iteration (s): 0.08 | learning rate: 7.102E-05 | global batch size: 256 | lm loss: 4.520034E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.001 | TFLOPs: 12.03 |
7: iteration 112120/ 173500 | consumed samples: 28702720 | consumed tokens: 58783170560 | elapsed time per iteration (s): 0.09 | learning rate: 7.100E-05 | global batch size: 256 | lm loss: 4.502680E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.280 | TFLOPs: 10.54 |
7: iteration 112130/ 173500 | consumed samples: 28705280 | consumed tokens: 58788413440 | elapsed time per iteration (s): 0.08 | learning rate: 7.099E-05 | global batch size: 256 | lm loss: 4.512985E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.353 | TFLOPs: 11.70 |
7: iteration 112140/ 173500 | consumed samples: 28707840 | consumed tokens: 58793656320 | elapsed time per iteration (s): 0.08 | learning rate: 7.097E-05 | global batch size: 256 | lm loss: 4.515512E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.556 | TFLOPs: 11.53 |
7: iteration 112150/ 173500 | consumed samples: 28710400 | consumed tokens: 58798899200 | elapsed time per iteration (s): 0.09 | learning rate: 7.096E-05 | global batch size: 256 | lm loss: 4.515950E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.281 | TFLOPs: 10.60 |
7: iteration 112160/ 173500 | consumed samples: 28712960 | consumed tokens: 58804142080 | elapsed time per iteration (s): 0.08 | learning rate: 7.094E-05 | global batch size: 256 | lm loss: 4.517988E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.954 | TFLOPs: 11.31 |
7: iteration 112170/ 173500 | consumed samples: 28715520 | consumed tokens: 58809384960 | elapsed time per iteration (s): 0.08 | learning rate: 7.093E-05 | global batch size: 256 | lm loss: 4.511774E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.031 | TFLOPs: 12.05 |
7: iteration 112180/ 173500 | consumed samples: 28718080 | consumed tokens: 58814627840 | elapsed time per iteration (s): 0.09 | learning rate: 7.091E-05 | global batch size: 256 | lm loss: 4.519615E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2969.677 | TFLOPs: 11.05 |
7: iteration 112190/ 173500 | consumed samples: 28720640 | consumed tokens: 58819870720 | elapsed time per iteration (s): 0.08 | learning rate: 7.090E-05 | global batch size: 256 | lm loss: 4.508796E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.047 | TFLOPs: 11.97 |
7: iteration 112200/ 173500 | consumed samples: 28723200 | consumed tokens: 58825113600 | elapsed time per iteration (s): 0.08 | learning rate: 7.088E-05 | global batch size: 256 | lm loss: 4.514027E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.544 | TFLOPs: 11.95 |
7: iteration 112210/ 173500 | consumed samples: 28725760 | consumed tokens: 58830356480 | elapsed time per iteration (s): 0.08 | learning rate: 7.087E-05 | global batch size: 256 | lm loss: 4.517848E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3069.093 | TFLOPs: 11.42 |
7: iteration 112220/ 173500 | consumed samples: 28728320 | consumed tokens: 58835599360 | elapsed time per iteration (s): 0.08 | learning rate: 7.086E-05 | global
batch size: 256 | lm loss: 4.511373E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3051.428 | TFLOPs: 11.35 | 7: iteration 112230/ 173500 | consumed samples: 28730880 | consumed tokens: 58840842240 | elapsed time per iteration (s): 0.08 | learning rate: 7.084E-05 | global batch size: 256 | lm loss: 4.517606E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.496 | TFLOPs: 11.97 | 7: iteration 112240/ 173500 | consumed samples: 28733440 | consumed tokens: 58846085120 | elapsed time per iteration (s): 0.08 | learning rate: 7.083E-05 | global batch size: 256 | lm loss: 4.516715E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.709 | TFLOPs: 11.99 | 7: iteration 112250/ 173500 | consumed samples: 28736000 | consumed tokens: 58851328000 | elapsed time per iteration (s): 0.08 | learning rate: 7.081E-05 | global batch size: 256 | lm loss: 4.524861E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3038.576 | TFLOPs: 11.30 | 7: iteration 112260/ 173500 | consumed samples: 28738560 | consumed tokens: 58856570880 | elapsed time per iteration (s): 0.08 | learning rate: 7.080E-05 | global batch size: 256 | lm loss: 4.517494E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.244 | TFLOPs: 12.04 | 7: iteration 112270/ 173500 | consumed samples: 28741120 | consumed tokens: 58861813760 | elapsed time per iteration (s): 0.08 | learning rate: 7.078E-05 | global batch size: 256 | lm loss: 4.521140E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.959 | TFLOPs: 11.34 | 7: iteration 112280/ 173500 | consumed samples: 
28743680 | consumed tokens: 58867056640 | elapsed time per iteration (s): 0.08 | learning rate: 7.077E-05 | global batch size: 256 | lm loss: 4.528435E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.899 | TFLOPs: 11.71 | 7: iteration 112290/ 173500 | consumed samples: 28746240 | consumed tokens: 58872299520 | elapsed time per iteration (s): 0.09 | learning rate: 7.075E-05 | global batch size: 256 | lm loss: 4.506819E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2994.911 | TFLOPs: 11.14 | 7: iteration 112300/ 173500 | consumed samples: 28748800 | consumed tokens: 58877542400 | elapsed time per iteration (s): 0.08 | learning rate: 7.074E-05 | global batch size: 256 | lm loss: 4.517855E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.373 | TFLOPs: 11.50 | 7: iteration 112310/ 173500 | consumed samples: 28751360 | consumed tokens: 58882785280 | elapsed time per iteration (s): 0.08 | learning rate: 7.072E-05 | global batch size: 256 | lm loss: 4.527443E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.030 | TFLOPs: 11.49 | 7: iteration 112320/ 173500 | consumed samples: 28753920 | consumed tokens: 58888028160 | elapsed time per iteration (s): 0.08 | learning rate: 7.071E-05 | global batch size: 256 | lm loss: 4.504537E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.494 | TFLOPs: 11.42 | 7: iteration 112330/ 173500 | consumed samples: 28756480 | consumed tokens: 58893271040 | elapsed time per iteration (s): 0.08 | learning rate: 7.069E-05 | global batch size: 256 | lm loss: 4.507043E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3234.404 | TFLOPs: 12.03 | 7: iteration 112340/ 173500 | consumed samples: 28759040 | consumed tokens: 58898513920 | elapsed time per iteration (s): 0.10 | learning rate: 7.068E-05 | global batch size: 256 | lm loss: 4.524366E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2502.219 | TFLOPs: 9.31 | 7: iteration 112350/ 173500 | consumed samples: 28761600 | consumed tokens: 58903756800 | elapsed time per iteration (s): 0.08 | learning rate: 7.066E-05 | global batch size: 256 | lm loss: 4.521702E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.837 | TFLOPs: 11.41 | 7: iteration 112360/ 173500 | consumed samples: 28764160 | consumed tokens: 58908999680 | elapsed time per iteration (s): 0.08 | learning rate: 7.065E-05 | global batch size: 256 | lm loss: 4.508573E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.792 | TFLOPs: 12.07 | 7: iteration 112370/ 173500 | consumed samples: 28766720 | consumed tokens: 58914242560 | elapsed time per iteration (s): 0.08 | learning rate: 7.063E-05 | global batch size: 256 | lm loss: 4.524197E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.811 | TFLOPs: 11.63 | 7: iteration 112380/ 173500 | consumed samples: 28769280 | consumed tokens: 58919485440 | elapsed time per iteration (s): 0.08 | learning rate: 7.062E-05 | global batch size: 256 | lm loss: 4.519501E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.691 | TFLOPs: 12.01 | 7: iteration 112390/ 173500 | consumed samples: 28771840 | consumed tokens: 58924728320 | elapsed time per iteration (s): 0.08 | learning rate: 7.060E-05 | global batch 
size: 256 | lm loss: 4.511277E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.099 | TFLOPs: 12.04 | 7: iteration 112400/ 173500 | consumed samples: 28774400 | consumed tokens: 58929971200 | elapsed time per iteration (s): 0.09 | learning rate: 7.059E-05 | global batch size: 256 | lm loss: 4.527248E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.567 | TFLOPs: 11.06 | 7: iteration 112410/ 173500 | consumed samples: 28776960 | consumed tokens: 58935214080 | elapsed time per iteration (s): 0.10 | learning rate: 7.057E-05 | global batch size: 256 | lm loss: 4.510725E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2515.970 | TFLOPs: 9.36 | 7: iteration 112420/ 173500 | consumed samples: 28779520 | consumed tokens: 58940456960 | elapsed time per iteration (s): 0.09 | learning rate: 7.056E-05 | global batch size: 256 | lm loss: 4.506702E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2709.091 | TFLOPs: 10.08 | 7: iteration 112430/ 173500 | consumed samples: 28782080 | consumed tokens: 58945699840 | elapsed time per iteration (s): 0.10 | learning rate: 7.054E-05 | global batch size: 256 | lm loss: 4.513494E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2483.314 | TFLOPs: 9.24 | 7: iteration 112440/ 173500 | consumed samples: 28784640 | consumed tokens: 58950942720 | elapsed time per iteration (s): 0.09 | learning rate: 7.053E-05 | global batch size: 256 | lm loss: 4.512759E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2742.315 | TFLOPs: 10.20 | 7: iteration 112450/ 173500 | consumed samples: 28787200 | 
consumed tokens: 58956185600 | elapsed time per iteration (s): 0.08 | learning rate: 7.051E-05 | global batch size: 256 | lm loss: 4.505071E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.908 | TFLOPs: 12.00 | 7: iteration 112460/ 173500 | consumed samples: 28789760 | consumed tokens: 58961428480 | elapsed time per iteration (s): 0.09 | learning rate: 7.050E-05 | global batch size: 256 | lm loss: 4.512283E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2880.771 | TFLOPs: 10.72 | 7: iteration 112470/ 173500 | consumed samples: 28792320 | consumed tokens: 58966671360 | elapsed time per iteration (s): 0.09 | learning rate: 7.049E-05 | global batch size: 256 | lm loss: 4.523104E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2912.709 | TFLOPs: 10.83 | 7: iteration 112480/ 173500 | consumed samples: 28794880 | consumed tokens: 58971914240 | elapsed time per iteration (s): 0.08 | learning rate: 7.047E-05 | global batch size: 256 | lm loss: 4.509416E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.377 | TFLOPs: 11.76 | 7: iteration 112490/ 173500 | consumed samples: 28797440 | consumed tokens: 58977157120 | elapsed time per iteration (s): 0.08 | learning rate: 7.046E-05 | global batch size: 256 | lm loss: 4.527770E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.191 | TFLOPs: 11.72 | 7: iteration 112500/ 173500 | consumed samples: 28800000 | consumed tokens: 58982400000 | elapsed time per iteration (s): 0.09 | learning rate: 7.044E-05 | global batch size: 256 | lm loss: 4.518323E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2732.297 | TFLOPs: 10.16 | 7: iteration 112510/ 173500 | consumed samples: 28802560 | consumed tokens: 58987642880 | elapsed time per iteration (s): 0.08 | learning rate: 7.043E-05 | global batch size: 256 | lm loss: 4.508272E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.699 | TFLOPs: 12.08 | 7: iteration 112520/ 173500 | consumed samples: 28805120 | consumed tokens: 58992885760 | elapsed time per iteration (s): 0.08 | learning rate: 7.041E-05 | global batch size: 256 | lm loss: 4.527639E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.685 | TFLOPs: 12.08 | 7: iteration 112530/ 173500 | consumed samples: 28807680 | consumed tokens: 58998128640 | elapsed time per iteration (s): 0.08 | learning rate: 7.040E-05 | global batch size: 256 | lm loss: 4.512913E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.375 | TFLOPs: 11.51 | 7: iteration 112540/ 173500 | consumed samples: 28810240 | consumed tokens: 59003371520 | elapsed time per iteration (s): 0.08 | learning rate: 7.038E-05 | global batch size: 256 | lm loss: 4.500842E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.808 | TFLOPs: 12.04 | 7: iteration 112550/ 173500 | consumed samples: 28812800 | consumed tokens: 59008614400 | elapsed time per iteration (s): 0.08 | learning rate: 7.037E-05 | global batch size: 256 | lm loss: 4.524707E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.393 | TFLOPs: 11.50 | 7: iteration 112560/ 173500 | consumed samples: 28815360 | consumed tokens: 59013857280 | elapsed time per iteration (s): 0.08 | learning rate: 7.035E-05 | global batch size: 
256 | lm loss: 4.517537E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.444 | TFLOPs: 11.75 | 7: iteration 112570/ 173500 | consumed samples: 28817920 | consumed tokens: 59019100160 | elapsed time per iteration (s): 0.09 | learning rate: 7.034E-05 | global batch size: 256 | lm loss: 4.507580E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2761.773 | TFLOPs: 10.27 | 7: iteration 112580/ 173500 | consumed samples: 28820480 | consumed tokens: 59024343040 | elapsed time per iteration (s): 0.08 | learning rate: 7.032E-05 | global batch size: 256 | lm loss: 4.516650E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.743 | TFLOPs: 12.04 | 7: iteration 112590/ 173500 | consumed samples: 28823040 | consumed tokens: 59029585920 | elapsed time per iteration (s): 0.10 | learning rate: 7.031E-05 | global batch size: 256 | lm loss: 4.514140E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2565.872 | TFLOPs: 9.54 | 7: iteration 112600/ 173500 | consumed samples: 28825600 | consumed tokens: 59034828800 | elapsed time per iteration (s): 0.08 | learning rate: 7.029E-05 | global batch size: 256 | lm loss: 4.517535E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.454 | TFLOPs: 11.35 | 7: iteration 112610/ 173500 | consumed samples: 28828160 | consumed tokens: 59040071680 | elapsed time per iteration (s): 0.09 | learning rate: 7.028E-05 | global batch size: 256 | lm loss: 4.522158E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2756.202 | TFLOPs: 10.25 | 7: iteration 112620/ 173500 | consumed samples: 28830720 | 
consumed tokens: 59045314560 | elapsed time per iteration (s): 0.10 | learning rate: 7.026E-05 | global batch size: 256 | lm loss: 4.529255E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.275 | TFLOPs: 9.56 | 7: iteration 112630/ 173500 | consumed samples: 28833280 | consumed tokens: 59050557440 | elapsed time per iteration (s): 0.10 | learning rate: 7.025E-05 | global batch size: 256 | lm loss: 4.512269E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.595 | TFLOPs: 9.41 | 7: iteration 112640/ 173500 | consumed samples: 28835840 | consumed tokens: 59055800320 | elapsed time per iteration (s): 0.09 | learning rate: 7.023E-05 | global batch size: 256 | lm loss: 4.514956E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.005 | TFLOPs: 10.17 | 7: iteration 112650/ 173500 | consumed samples: 28838400 | consumed tokens: 59061043200 | elapsed time per iteration (s): 0.09 | learning rate: 7.022E-05 | global batch size: 256 | lm loss: 4.507948E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2911.461 | TFLOPs: 10.83 | 7: iteration 112660/ 173500 | consumed samples: 28840960 | consumed tokens: 59066286080 | elapsed time per iteration (s): 0.08 | learning rate: 7.020E-05 | global batch size: 256 | lm loss: 4.518499E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.823 | TFLOPs: 11.40 | 7: iteration 112670/ 173500 | consumed samples: 28843520 | consumed tokens: 59071528960 | elapsed time per iteration (s): 0.09 | learning rate: 7.019E-05 | global batch size: 256 | lm loss: 4.510720E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 2762.178 | TFLOPs: 10.27 | 7: iteration 112680/ 173500 | consumed samples: 28846080 | consumed tokens: 59076771840 | elapsed time per iteration (s): 0.08 | learning rate: 7.017E-05 | global batch size: 256 | lm loss: 4.516448E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.093 | TFLOPs: 12.05 | 7: iteration 112690/ 173500 | consumed samples: 28848640 | consumed tokens: 59082014720 | elapsed time per iteration (s): 0.08 | learning rate: 7.016E-05 | global batch size: 256 | lm loss: 4.525047E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3014.454 | TFLOPs: 11.21 | 7: iteration 112700/ 173500 | consumed samples: 28851200 | consumed tokens: 59087257600 | elapsed time per iteration (s): 0.08 | learning rate: 7.015E-05 | global batch size: 256 | lm loss: 4.518205E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.732 | TFLOPs: 12.00 | 7: iteration 112710/ 173500 | consumed samples: 28853760 | consumed tokens: 59092500480 | elapsed time per iteration (s): 0.08 | learning rate: 7.013E-05 | global batch size: 256 | lm loss: 4.500912E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.840 | TFLOPs: 11.51 | 7: iteration 112720/ 173500 | consumed samples: 28856320 | consumed tokens: 59097743360 | elapsed time per iteration (s): 0.08 | learning rate: 7.012E-05 | global batch size: 256 | lm loss: 4.503135E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.467 | TFLOPs: 11.47 | 7: iteration 112730/ 173500 | consumed samples: 28858880 | consumed tokens: 59102986240 | elapsed time per iteration (s): 0.08 | learning rate: 7.010E-05 | global batch size: 256 | lm 
loss: 4.519530E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.233 | TFLOPs: 12.02 | 7: iteration 112740/ 173500 | consumed samples: 28861440 | consumed tokens: 59108229120 | elapsed time per iteration (s): 0.10 | learning rate: 7.009E-05 | global batch size: 256 | lm loss: 4.516563E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2532.676 | TFLOPs: 9.42 | 7: iteration 112750/ 173500 | consumed samples: 28864000 | consumed tokens: 59113472000 | elapsed time per iteration (s): 0.10 | learning rate: 7.007E-05 | global batch size: 256 | lm loss: 4.524900E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2660.815 | TFLOPs: 9.90 | 7: iteration 112760/ 173500 | consumed samples: 28866560 | consumed tokens: 59118714880 | elapsed time per iteration (s): 0.08 | learning rate: 7.006E-05 | global batch size: 256 | lm loss: 4.522318E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.349 | TFLOPs: 11.97 | 7: iteration 112770/ 173500 | consumed samples: 28869120 | consumed tokens: 59123957760 | elapsed time per iteration (s): 0.08 | learning rate: 7.004E-05 | global batch size: 256 | lm loss: 4.516520E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.817 | TFLOPs: 12.02 | 7: iteration 112780/ 173500 | consumed samples: 28871680 | consumed tokens: 59129200640 | elapsed time per iteration (s): 0.08 | learning rate: 7.003E-05 | global batch size: 256 | lm loss: 4.519442E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.190 | TFLOPs: 12.02 | 7: iteration 112790/ 173500 | consumed samples: 28874240 | consumed 
tokens: 59134443520 | elapsed time per iteration (s): 0.10 | learning rate: 7.001E-05 | global batch size: 256 | lm loss: 4.514953E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2616.160 | TFLOPs: 9.73 | 7: iteration 112800/ 173500 | consumed samples: 28876800 | consumed tokens: 59139686400 | elapsed time per iteration (s): 0.09 | learning rate: 7.000E-05 | global batch size: 256 | lm loss: 4.519537E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.739 | TFLOPs: 10.54 | 7: iteration 112810/ 173500 | consumed samples: 28879360 | consumed tokens: 59144929280 | elapsed time per iteration (s): 0.10 | learning rate: 6.998E-05 | global batch size: 256 | lm loss: 4.514849E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2479.942 | TFLOPs: 9.22 | 7: iteration 112820/ 173500 | consumed samples: 28881920 | consumed tokens: 59150172160 | elapsed time per iteration (s): 0.09 | learning rate: 6.997E-05 | global batch size: 256 | lm loss: 4.511623E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2879.309 | TFLOPs: 10.71 | 7: iteration 112830/ 173500 | consumed samples: 28884480 | consumed tokens: 59155415040 | elapsed time per iteration (s): 0.08 | learning rate: 6.995E-05 | global batch size: 256 | lm loss: 4.515976E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.792 | TFLOPs: 11.79 | 7: iteration 112840/ 173500 | consumed samples: 28887040 | consumed tokens: 59160657920 | elapsed time per iteration (s): 0.09 | learning rate: 6.994E-05 | global batch size: 256 | lm loss: 4.513843E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2886.474 | TFLOPs: 10.74 | 7: iteration 112850/ 173500 | consumed samples: 28889600 | consumed tokens: 59165900800 | elapsed time per iteration (s): 0.09 | learning rate: 6.992E-05 | global batch size: 256 | lm loss: 4.531323E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2806.255 | TFLOPs: 10.44 | 7: iteration 112860/ 173500 | consumed samples: 28892160 | consumed tokens: 59171143680 | elapsed time per iteration (s): 0.09 | learning rate: 6.991E-05 | global batch size: 256 | lm loss: 4.518667E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2827.165 | TFLOPs: 10.52 | 7: iteration 112870/ 173500 | consumed samples: 28894720 | consumed tokens: 59176386560 | elapsed time per iteration (s): 0.10 | learning rate: 6.989E-05 | global batch size: 256 | lm loss: 4.518732E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2601.944 | TFLOPs: 9.68 | 7: iteration 112880/ 173500 | consumed samples: 28897280 | consumed tokens: 59181629440 | elapsed time per iteration (s): 0.11 | learning rate: 6.988E-05 | global batch size: 256 | lm loss: 4.503188E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2400.542 | TFLOPs: 8.93 | 7: iteration 112890/ 173500 | consumed samples: 28899840 | consumed tokens: 59186872320 | elapsed time per iteration (s): 0.09 | learning rate: 6.987E-05 | global batch size: 256 | lm loss: 4.507521E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2900.268 | TFLOPs: 10.79 | 7: iteration 112900/ 173500 | consumed samples: 28902400 | consumed tokens: 59192115200 | elapsed time per iteration (s): 0.08 | learning rate: 6.985E-05 | global batch size: 256 | lm loss: 
4.517501E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.257 | TFLOPs: 11.89 | 7: iteration 112910/ 173500 | consumed samples: 28904960 | consumed tokens: 59197358080 | elapsed time per iteration (s): 0.08 | learning rate: 6.984E-05 | global batch size: 256 | lm loss: 4.521715E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.406 | TFLOPs: 11.87 | 7: iteration 112920/ 173500 | consumed samples: 28907520 | consumed tokens: 59202600960 | elapsed time per iteration (s): 0.08 | learning rate: 6.982E-05 | global batch size: 256 | lm loss: 4.518742E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.783 | TFLOPs: 11.91 | 7: iteration 112930/ 173500 | consumed samples: 28910080 | consumed tokens: 59207843840 | elapsed time per iteration (s): 0.10 | learning rate: 6.981E-05 | global batch size: 256 | lm loss: 4.507780E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2681.258 | TFLOPs: 9.97 | 7: iteration 112940/ 173500 | consumed samples: 28912640 | consumed tokens: 59213086720 | elapsed time per iteration (s): 0.09 | learning rate: 6.979E-05 | global batch size: 256 | lm loss: 4.521862E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.325 | TFLOPs: 11.06 | 7: iteration 112950/ 173500 | consumed samples: 28915200 | consumed tokens: 59218329600 | elapsed time per iteration (s): 0.08 | learning rate: 6.978E-05 | global batch size: 256 | lm loss: 4.509124E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.176 | TFLOPs: 11.91 | 7: iteration 112960/ 173500 | consumed samples: 28917760 | consumed tokens: 
59223572480 | elapsed time per iteration (s): 0.08 | learning rate: 6.976E-05 | global batch size: 256 | lm loss: 4.510299E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.810 | TFLOPs: 11.92 | 7: iteration 112970/ 173500 | consumed samples: 28920320 | consumed tokens: 59228815360 | elapsed time per iteration (s): 0.08 | learning rate: 6.975E-05 | global batch size: 256 | lm loss: 4.525882E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.994 | TFLOPs: 11.93 | 7: iteration 112980/ 173500 | consumed samples: 28922880 | consumed tokens: 59234058240 | elapsed time per iteration (s): 0.09 | learning rate: 6.973E-05 | global batch size: 256 | lm loss: 4.506730E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2938.500 | TFLOPs: 10.93 | 7: iteration 112990/ 173500 | consumed samples: 28925440 | consumed tokens: 59239301120 | elapsed time per iteration (s): 0.08 | learning rate: 6.972E-05 | global batch size: 256 | lm loss: 4.509836E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.125 | TFLOPs: 11.85 | 7: iteration 113000/ 173500 | consumed samples: 28928000 | consumed tokens: 59244544000 | elapsed time per iteration (s): 0.08 | learning rate: 6.970E-05 | global batch size: 256 | lm loss: 4.516870E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.022 | TFLOPs: 11.84 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 113000 | lm loss value: 4.390903E+00 | lm loss PPL: 8.071327E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving 
checkpoint at iteration 113000 to checkpoints_14m91b100m 0: [2023-03-17 02:59:01,048] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step113000 is begin to save! 0: [2023-03-17 02:59:01,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/layer_01-model_00-model_states.pt... 0: [2023-03-17 02:59:01,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/layer_01-model_00-model_states.pt. 0: [2023-03-17 02:59:01,074] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/layer_03-model_00-model_states.pt... 0: [2023-03-17 02:59:01,080] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/layer_03-model_00-model_states.pt. 0: [2023-03-17 02:59:01,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/layer_04-model_00-model_states.pt... 0: [2023-03-17 02:59:01,083] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/layer_04-model_00-model_states.pt. 0: [2023-03-17 02:59:01,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/layer_05-model_00-model_states.pt... 0: [2023-03-17 02:59:01,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/layer_05-model_00-model_states.pt. 0: [2023-03-17 02:59:01,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/layer_06-model_00-model_states.pt... 0: [2023-03-17 02:59:01,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/layer_06-model_00-model_states.pt. 0: [2023-03-17 02:59:01,090] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 02:59:01,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/layer_08-model_00-model_states.pt. 0: [2023-03-17 02:59:01,091] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step113000/mp_rank_00_model_states.pt 0: [2023-03-17 02:59:01,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/mp_rank_00_model_states.pt... 0: [2023-03-17 02:59:01,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/mp_rank_00_model_states.pt. 0: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
5: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 02:59:01,110] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 0: [2023-03-17 02:59:01,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,114] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 02:59:01,114] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2023-03-17 02:59:01,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:59:01,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2023-03-17 02:59:01,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
4: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:59:01,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:59:01,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:59:01,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
3: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:59:01,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 02:59:01,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:59:01,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:59:01,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2023-03-17 02:59:01,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
0: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:59:01,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 02:59:01,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
3: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 02:59:01,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2023-03-17 02:59:01,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:59:01,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2023-03-17 02:59:01,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:59:01,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2023-03-17 02:59:01,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:59:01,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2023-03-17 02:59:01,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:59:01,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 02:59:01,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:59:01,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
3: [2023-03-17 02:59:01,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 02:59:01,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:59:01,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2023-03-17 02:59:01,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:59:01,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:59:01,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2023-03-17 02:59:01,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:59:01,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:59:01,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2023-03-17 02:59:01,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:59:01,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2023-03-17 02:59:01,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:59:01,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 02:59:01,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2023-03-17 02:59:01,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:59:01,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2023-03-17 02:59:01,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:59:01,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:59:01,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 02:59:01,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2023-03-17 02:59:01,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2023-03-17 02:59:01,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:59:01,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 02:59:01,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2023-03-17 02:59:01,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 02:59:01,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:59:01,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2023-03-17 02:59:01,122] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2023-03-17 02:59:01,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:59:01,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 02:59:01,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:59:01,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:59:01,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2023-03-17 02:59:01,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2023-03-17 02:59:01,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
0: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 0: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
1: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 1: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 6: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
2: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 3: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 02:59:01,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 02:59:01,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2023-03-17 02:59:01,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2023-03-17 02:59:01,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 02:59:01,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2023-03-17 02:59:01,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 02:59:01,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step113000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 02:59:01,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: successfully saved checkpoint at iteration 113000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.38 7: iteration 113010/ 173500 | consumed samples: 28930560 | consumed tokens: 59249786880 | elapsed time per iteration (s): 0.09 | learning rate: 6.969E-05 | global batch size: 256 | lm loss: 4.508960E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.479 | TFLOPs: 10.21 | 7: iteration 113020/ 173500 | consumed samples: 28933120 | consumed tokens: 59255029760 | elapsed time per iteration (s): 0.10 | learning rate: 6.967E-05 | global batch size: 256 | lm loss: 4.509713E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2669.573 | TFLOPs: 9.93 | 7: iteration 113030/ 173500 | consumed samples: 28935680 | consumed tokens: 59260272640 | elapsed time per iteration (s): 0.08 | learning rate: 6.966E-05 | global batch size: 256 | lm loss: 4.501990E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.186 | TFLOPs: 11.70 | 7: iteration 113040/ 173500 | consumed samples: 28938240 | consumed tokens: 59265515520 | elapsed time per iteration (s): 0.08 | learning rate: 6.964E-05 | global batch size: 256 | lm loss: 4.525831E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.946 | TFLOPs: 11.43 | 7: iteration 113050/ 173500 | consumed samples: 28940800 | consumed tokens: 59270758400 | elapsed time per iteration (s): 0.08 | learning rate: 6.963E-05 | 
global batch size: 256 | lm loss: 4.515078E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.192 | TFLOPs: 11.32 | 7: iteration 113060/ 173500 | consumed samples: 28943360 | consumed tokens: 59276001280 | elapsed time per iteration (s): 0.09 | learning rate: 6.961E-05 | global batch size: 256 | lm loss: 4.516501E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2982.292 | TFLOPs: 11.09 | 7: iteration 113070/ 173500 | consumed samples: 28945920 | consumed tokens: 59281244160 | elapsed time per iteration (s): 0.08 | learning rate: 6.960E-05 | global batch size: 256 | lm loss: 4.510466E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.261 | TFLOPs: 11.38 | 7: iteration 113080/ 173500 | consumed samples: 28948480 | consumed tokens: 59286487040 | elapsed time per iteration (s): 0.10 | learning rate: 6.959E-05 | global batch size: 256 | lm loss: 4.512106E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2624.210 | TFLOPs: 9.76 | 7: iteration 113090/ 173500 | consumed samples: 28951040 | consumed tokens: 59291729920 | elapsed time per iteration (s): 0.09 | learning rate: 6.957E-05 | global batch size: 256 | lm loss: 4.518049E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.821 | TFLOPs: 11.02 | 7: iteration 113100/ 173500 | consumed samples: 28953600 | consumed tokens: 59296972800 | elapsed time per iteration (s): 0.10 | learning rate: 6.956E-05 | global batch size: 256 | lm loss: 4.508755E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2691.758 | TFLOPs: 10.01 | 7: iteration 113110/ 173500 | consumed 
samples: 28956160 | consumed tokens: 59302215680 | elapsed time per iteration (s): 0.08 | learning rate: 6.954E-05 | global batch size: 256 | lm loss: 4.508374E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.968 | TFLOPs: 11.65 | 7: iteration 113120/ 173500 | consumed samples: 28958720 | consumed tokens: 59307458560 | elapsed time per iteration (s): 0.08 | learning rate: 6.953E-05 | global batch size: 256 | lm loss: 4.511067E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.725 | TFLOPs: 11.68 | 7: iteration 113130/ 173500 | consumed samples: 28961280 | consumed tokens: 59312701440 | elapsed time per iteration (s): 0.08 | learning rate: 6.951E-05 | global batch size: 256 | lm loss: 4.518083E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.804 | TFLOPs: 11.90 | 7: iteration 113140/ 173500 | consumed samples: 28963840 | consumed tokens: 59317944320 | elapsed time per iteration (s): 0.08 | learning rate: 6.950E-05 | global batch size: 256 | lm loss: 4.511578E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.614 | TFLOPs: 11.90 | 7: iteration 113150/ 173500 | consumed samples: 28966400 | consumed tokens: 59323187200 | elapsed time per iteration (s): 0.08 | learning rate: 6.948E-05 | global batch size: 256 | lm loss: 4.513269E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.572 | TFLOPs: 11.78 | 7: iteration 113160/ 173500 | consumed samples: 28968960 | consumed tokens: 59328430080 | elapsed time per iteration (s): 0.09 | learning rate: 6.947E-05 | global batch size: 256 | lm loss: 4.516497E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3000.117 | TFLOPs: 11.16 | 7: iteration 113170/ 173500 | consumed samples: 28971520 | consumed tokens: 59333672960 | elapsed time per iteration (s): 0.08 | learning rate: 6.945E-05 | global batch size: 256 | lm loss: 4.516312E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.332 | TFLOPs: 11.58 | 7: iteration 113180/ 173500 | consumed samples: 28974080 | consumed tokens: 59338915840 | elapsed time per iteration (s): 0.08 | learning rate: 6.944E-05 | global batch size: 256 | lm loss: 4.509643E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.190 | TFLOPs: 11.46 | 7: iteration 113190/ 173500 | consumed samples: 28976640 | consumed tokens: 59344158720 | elapsed time per iteration (s): 0.08 | learning rate: 6.942E-05 | global batch size: 256 | lm loss: 4.527105E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.687 | TFLOPs: 11.68 | 7: iteration 113200/ 173500 | consumed samples: 28979200 | consumed tokens: 59349401600 | elapsed time per iteration (s): 0.08 | learning rate: 6.941E-05 | global batch size: 256 | lm loss: 4.520001E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.155 | TFLOPs: 11.96 | 7: iteration 113210/ 173500 | consumed samples: 28981760 | consumed tokens: 59354644480 | elapsed time per iteration (s): 0.09 | learning rate: 6.939E-05 | global batch size: 256 | lm loss: 4.518732E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.118 | TFLOPs: 10.86 | 7: iteration 113220/ 173500 | consumed samples: 28984320 | consumed tokens: 59359887360 | elapsed time per iteration (s): 0.08 | learning rate: 6.938E-05 | global 
batch size: 256 | lm loss: 4.519766E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.147 | TFLOPs: 11.78 | 7: iteration 113230/ 173500 | consumed samples: 28986880 | consumed tokens: 59365130240 | elapsed time per iteration (s): 0.09 | learning rate: 6.936E-05 | global batch size: 256 | lm loss: 4.511493E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.332 | TFLOPs: 11.07 | 7: iteration 113240/ 173500 | consumed samples: 28989440 | consumed tokens: 59370373120 | elapsed time per iteration (s): 0.08 | learning rate: 6.935E-05 | global batch size: 256 | lm loss: 4.511445E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.746 | TFLOPs: 11.83 | 7: iteration 113250/ 173500 | consumed samples: 28992000 | consumed tokens: 59375616000 | elapsed time per iteration (s): 0.08 | learning rate: 6.934E-05 | global batch size: 256 | lm loss: 4.510975E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.042 | TFLOPs: 11.38 | 7: iteration 113260/ 173500 | consumed samples: 28994560 | consumed tokens: 59380858880 | elapsed time per iteration (s): 0.09 | learning rate: 6.932E-05 | global batch size: 256 | lm loss: 4.517879E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2970.231 | TFLOPs: 11.05 | 7: iteration 113270/ 173500 | consumed samples: 28997120 | consumed tokens: 59386101760 | elapsed time per iteration (s): 0.08 | learning rate: 6.931E-05 | global batch size: 256 | lm loss: 4.513060E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.419 | TFLOPs: 11.85 | 7: iteration 113280/ 173500 | consumed samples: 
28999680 | consumed tokens: 59391344640 | elapsed time per iteration (s): 0.09 | learning rate: 6.929E-05 | global batch size: 256 | lm loss: 4.520492E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2925.694 | TFLOPs: 10.88 | 7: iteration 113290/ 173500 | consumed samples: 29002240 | consumed tokens: 59396587520 | elapsed time per iteration (s): 0.08 | learning rate: 6.928E-05 | global batch size: 256 | lm loss: 4.515145E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.788 | TFLOPs: 11.78 | 7: iteration 113300/ 173500 | consumed samples: 29004800 | consumed tokens: 59401830400 | elapsed time per iteration (s): 0.08 | learning rate: 6.926E-05 | global batch size: 256 | lm loss: 4.527979E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.065 | TFLOPs: 11.84 | 7: iteration 113310/ 173500 | consumed samples: 29007360 | consumed tokens: 59407073280 | elapsed time per iteration (s): 0.08 | learning rate: 6.925E-05 | global batch size: 256 | lm loss: 4.509389E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.478 | TFLOPs: 11.81 | 7: iteration 113320/ 173500 | consumed samples: 29009920 | consumed tokens: 59412316160 | elapsed time per iteration (s): 0.08 | learning rate: 6.923E-05 | global batch size: 256 | lm loss: 4.499775E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.326 | TFLOPs: 11.77 | 7: iteration 113330/ 173500 | consumed samples: 29012480 | consumed tokens: 59417559040 | elapsed time per iteration (s): 0.08 | learning rate: 6.922E-05 | global batch size: 256 | lm loss: 4.516705E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3182.013 | TFLOPs: 11.84 | 7: iteration 113340/ 173500 | consumed samples: 29015040 | consumed tokens: 59422801920 | elapsed time per iteration (s): 0.08 | learning rate: 6.920E-05 | global batch size: 256 | lm loss: 4.511941E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.301 | TFLOPs: 11.38 | 7: iteration 113350/ 173500 | consumed samples: 29017600 | consumed tokens: 59428044800 | elapsed time per iteration (s): 0.08 | learning rate: 6.919E-05 | global batch size: 256 | lm loss: 4.510934E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.024 | TFLOPs: 11.55 | 7: iteration 113360/ 173500 | consumed samples: 29020160 | consumed tokens: 59433287680 | elapsed time per iteration (s): 0.10 | learning rate: 6.917E-05 | global batch size: 256 | lm loss: 4.523159E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2544.106 | TFLOPs: 9.46 | 7: iteration 113370/ 173500 | consumed samples: 29022720 | consumed tokens: 59438530560 | elapsed time per iteration (s): 0.09 | learning rate: 6.916E-05 | global batch size: 256 | lm loss: 4.514502E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2904.114 | TFLOPs: 10.80 | 7: iteration 113380/ 173500 | consumed samples: 29025280 | consumed tokens: 59443773440 | elapsed time per iteration (s): 0.08 | learning rate: 6.914E-05 | global batch size: 256 | lm loss: 4.519130E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.293 | TFLOPs: 11.83 | 7: iteration 113390/ 173500 | consumed samples: 29027840 | consumed tokens: 59449016320 | elapsed time per iteration (s): 0.08 | learning rate: 6.913E-05 | global batch 
size: 256 | lm loss: 4.517752E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.099 | TFLOPs: 11.82 | 7: iteration 113400/ 173500 | consumed samples: 29030400 | consumed tokens: 59454259200 | elapsed time per iteration (s): 0.08 | learning rate: 6.912E-05 | global batch size: 256 | lm loss: 4.499236E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.708 | TFLOPs: 11.80 | 7: iteration 113410/ 173500 | consumed samples: 29032960 | consumed tokens: 59459502080 | elapsed time per iteration (s): 0.08 | learning rate: 6.910E-05 | global batch size: 256 | lm loss: 4.513340E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3080.528 | TFLOPs: 11.46 | 7: iteration 113420/ 173500 | consumed samples: 29035520 | consumed tokens: 59464744960 | elapsed time per iteration (s): 0.09 | learning rate: 6.909E-05 | global batch size: 256 | lm loss: 4.514201E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.941 | TFLOPs: 10.99 | 7: iteration 113430/ 173500 | consumed samples: 29038080 | consumed tokens: 59469987840 | elapsed time per iteration (s): 0.08 | learning rate: 6.907E-05 | global batch size: 256 | lm loss: 4.512785E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.887 | TFLOPs: 11.85 | 7: iteration 113440/ 173500 | consumed samples: 29040640 | consumed tokens: 59475230720 | elapsed time per iteration (s): 0.08 | learning rate: 6.906E-05 | global batch size: 256 | lm loss: 4.517943E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.786 | TFLOPs: 11.56 | 7: iteration 113450/ 173500 | consumed samples: 29043200 
| consumed tokens: 59480473600 | elapsed time per iteration (s): 0.08 | learning rate: 6.904E-05 | global batch size: 256 | lm loss: 4.523862E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.324 | TFLOPs: 11.90 | 7: iteration 113460/ 173500 | consumed samples: 29045760 | consumed tokens: 59485716480 | elapsed time per iteration (s): 0.08 | learning rate: 6.903E-05 | global batch size: 256 | lm loss: 4.511973E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.460 | TFLOPs: 11.36 | 7: iteration 113470/ 173500 | consumed samples: 29048320 | consumed tokens: 59490959360 | elapsed time per iteration (s): 0.08 | learning rate: 6.901E-05 | global batch size: 256 | lm loss: 4.529175E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.458 | TFLOPs: 11.82 | 7: iteration 113480/ 173500 | consumed samples: 29050880 | consumed tokens: 59496202240 | elapsed time per iteration (s): 0.09 | learning rate: 6.900E-05 | global batch size: 256 | lm loss: 4.520537E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2835.112 | TFLOPs: 10.55 | 7: iteration 113490/ 173500 | consumed samples: 29053440 | consumed tokens: 59501445120 | elapsed time per iteration (s): 0.08 | learning rate: 6.898E-05 | global batch size: 256 | lm loss: 4.520036E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.080 | TFLOPs: 11.45 | 7: iteration 113500/ 173500 | consumed samples: 29056000 | consumed tokens: 59506688000 | elapsed time per iteration (s): 0.08 | learning rate: 6.897E-05 | global batch size: 256 | lm loss: 4.509621E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3177.198 | TFLOPs: 11.82 | 7: iteration 113510/ 173500 | consumed samples: 29058560 | consumed tokens: 59511930880 | elapsed time per iteration (s): 0.08 | learning rate: 6.895E-05 | global batch size: 256 | lm loss: 4.504436E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.639 | TFLOPs: 11.93 | 7: iteration 113520/ 173500 | consumed samples: 29061120 | consumed tokens: 59517173760 | elapsed time per iteration (s): 0.08 | learning rate: 6.894E-05 | global batch size: 256 | lm loss: 4.511609E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.502 | TFLOPs: 11.93 | 7: iteration 113530/ 173500 | consumed samples: 29063680 | consumed tokens: 59522416640 | elapsed time per iteration (s): 0.08 | learning rate: 6.892E-05 | global batch size: 256 | lm loss: 4.508560E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.217 | TFLOPs: 11.86 | 7: iteration 113540/ 173500 | consumed samples: 29066240 | consumed tokens: 59527659520 | elapsed time per iteration (s): 0.08 | learning rate: 6.891E-05 | global batch size: 256 | lm loss: 4.522683E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.681 | TFLOPs: 11.88 | 7: iteration 113550/ 173500 | consumed samples: 29068800 | consumed tokens: 59532902400 | elapsed time per iteration (s): 0.09 | learning rate: 6.890E-05 | global batch size: 256 | lm loss: 4.509929E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.664 | TFLOPs: 11.08 | 7: iteration 113560/ 173500 | consumed samples: 29071360 | consumed tokens: 59538145280 | elapsed time per iteration (s): 0.10 | learning rate: 6.888E-05 | global batch size: 
256 | lm loss: 4.510637E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.930 | TFLOPs: 9.52 | 7: iteration 113570/ 173500 | consumed samples: 29073920 | consumed tokens: 59543388160 | elapsed time per iteration (s): 0.08 | learning rate: 6.887E-05 | global batch size: 256 | lm loss: 4.510844E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.741 | TFLOPs: 11.97 | 7: iteration 113580/ 173500 | consumed samples: 29076480 | consumed tokens: 59548631040 | elapsed time per iteration (s): 0.08 | learning rate: 6.885E-05 | global batch size: 256 | lm loss: 4.524247E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.455 | TFLOPs: 11.99 | 7: iteration 113590/ 173500 | consumed samples: 29079040 | consumed tokens: 59553873920 | elapsed time per iteration (s): 0.08 | learning rate: 6.884E-05 | global batch size: 256 | lm loss: 4.510617E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.154 | TFLOPs: 12.03 | 7: iteration 113600/ 173500 | consumed samples: 29081600 | consumed tokens: 59559116800 | elapsed time per iteration (s): 0.09 | learning rate: 6.882E-05 | global batch size: 256 | lm loss: 4.517338E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2968.732 | TFLOPs: 11.04 | 7: iteration 113610/ 173500 | consumed samples: 29084160 | consumed tokens: 59564359680 | elapsed time per iteration (s): 0.09 | learning rate: 6.881E-05 | global batch size: 256 | lm loss: 4.525596E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2874.742 | TFLOPs: 10.69 | 7: iteration 113620/ 173500 | consumed samples: 29086720 | 
consumed tokens: 59569602560 | elapsed time per iteration (s): 0.10 | learning rate: 6.879E-05 | global batch size: 256 | lm loss: 4.521678E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2632.640 | TFLOPs: 9.79 | 7: iteration 113630/ 173500 | consumed samples: 29089280 | consumed tokens: 59574845440 | elapsed time per iteration (s): 0.08 | learning rate: 6.878E-05 | global batch size: 256 | lm loss: 4.523434E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.939 | TFLOPs: 12.05 | 7: iteration 113640/ 173500 | consumed samples: 29091840 | consumed tokens: 59580088320 | elapsed time per iteration (s): 0.10 | learning rate: 6.876E-05 | global batch size: 256 | lm loss: 4.516429E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2496.261 | TFLOPs: 9.28 | 7: iteration 113650/ 173500 | consumed samples: 29094400 | consumed tokens: 59585331200 | elapsed time per iteration (s): 0.11 | learning rate: 6.875E-05 | global batch size: 256 | lm loss: 4.524266E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2287.228 | TFLOPs: 8.51 | 7: iteration 113660/ 173500 | consumed samples: 29096960 | consumed tokens: 59590574080 | elapsed time per iteration (s): 0.11 | learning rate: 6.873E-05 | global batch size: 256 | lm loss: 4.501871E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.019 | TFLOPs: 8.56 | 7: iteration 113670/ 173500 | consumed samples: 29099520 | consumed tokens: 59595816960 | elapsed time per iteration (s): 0.12 | learning rate: 6.872E-05 | global batch size: 256 | lm loss: 4.505486E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 2133.549 | TFLOPs: 7.94 | 7: iteration 113680/ 173500 | consumed samples: 29102080 | consumed tokens: 59601059840 | elapsed time per iteration (s): 0.12 | learning rate: 6.871E-05 | global batch size: 256 | lm loss: 4.524746E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2171.385 | TFLOPs: 8.08 | 7: iteration 113690/ 173500 | consumed samples: 29104640 | consumed tokens: 59606302720 | elapsed time per iteration (s): 0.12 | learning rate: 6.869E-05 | global batch size: 256 | lm loss: 4.505442E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2174.912 | TFLOPs: 8.09 | 7: iteration 113700/ 173500 | consumed samples: 29107200 | consumed tokens: 59611545600 | elapsed time per iteration (s): 0.11 | learning rate: 6.868E-05 | global batch size: 256 | lm loss: 4.519486E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2230.282 | TFLOPs: 8.30 | 7: iteration 113710/ 173500 | consumed samples: 29109760 | consumed tokens: 59616788480 | elapsed time per iteration (s): 0.12 | learning rate: 6.866E-05 | global batch size: 256 | lm loss: 4.518105E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2191.756 | TFLOPs: 8.15 | 7: iteration 113720/ 173500 | consumed samples: 29112320 | consumed tokens: 59622031360 | elapsed time per iteration (s): 0.12 | learning rate: 6.865E-05 | global batch size: 256 | lm loss: 4.520039E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2119.109 | TFLOPs: 7.88 | 7: iteration 113730/ 173500 | consumed samples: 29114880 | consumed tokens: 59627274240 | elapsed time per iteration (s): 0.12 | learning rate: 6.863E-05 | global batch size: 256 | lm loss: 
4.520179E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2049.269 | TFLOPs: 7.62 | 7: iteration 113740/ 173500 | consumed samples: 29117440 | consumed tokens: 59632517120 | elapsed time per iteration (s): 0.12 | learning rate: 6.862E-05 | global batch size: 256 | lm loss: 4.506507E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2103.707 | TFLOPs: 7.82 | 7: iteration 113750/ 173500 | consumed samples: 29120000 | consumed tokens: 59637760000 | elapsed time per iteration (s): 0.12 | learning rate: 6.860E-05 | global batch size: 256 | lm loss: 4.513857E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2088.352 | TFLOPs: 7.77 | 7: iteration 113760/ 173500 | consumed samples: 29122560 | consumed tokens: 59643002880 | elapsed time per iteration (s): 0.11 | learning rate: 6.859E-05 | global batch size: 256 | lm loss: 4.508090E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2350.220 | TFLOPs: 8.74 | 7: iteration 113770/ 173500 | consumed samples: 29125120 | consumed tokens: 59648245760 | elapsed time per iteration (s): 0.12 | learning rate: 6.857E-05 | global batch size: 256 | lm loss: 4.526405E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2156.770 | TFLOPs: 8.02 | 7: iteration 113780/ 173500 | consumed samples: 29127680 | consumed tokens: 59653488640 | elapsed time per iteration (s): 0.13 | learning rate: 6.856E-05 | global batch size: 256 | lm loss: 4.518179E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1935.549 | TFLOPs: 7.20 | 7: iteration 113790/ 173500 | consumed samples: 29130240 | consumed tokens: 
59658731520 | elapsed time per iteration (s): 0.12 | learning rate: 6.854E-05 | global batch size: 256 | lm loss: 4.513098E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2204.955 | TFLOPs: 8.20 | 7: iteration 113800/ 173500 | consumed samples: 29132800 | consumed tokens: 59663974400 | elapsed time per iteration (s): 0.12 | learning rate: 6.853E-05 | global batch size: 256 | lm loss: 4.513433E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2067.701 | TFLOPs: 7.69 | 7: iteration 113810/ 173500 | consumed samples: 29135360 | consumed tokens: 59669217280 | elapsed time per iteration (s): 0.13 | learning rate: 6.852E-05 | global batch size: 256 | lm loss: 4.517717E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1900.127 | TFLOPs: 7.07 | 7: iteration 113820/ 173500 | consumed samples: 29137920 | consumed tokens: 59674460160 | elapsed time per iteration (s): 0.10 | learning rate: 6.850E-05 | global batch size: 256 | lm loss: 4.524243E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.989 | TFLOPs: 9.52 | 7: iteration 113830/ 173500 | consumed samples: 29140480 | consumed tokens: 59679703040 | elapsed time per iteration (s): 0.11 | learning rate: 6.849E-05 | global batch size: 256 | lm loss: 4.509653E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2274.260 | TFLOPs: 8.46 | 7: iteration 113840/ 173500 | consumed samples: 29143040 | consumed tokens: 59684945920 | elapsed time per iteration (s): 0.13 | learning rate: 6.847E-05 | global batch size: 256 | lm loss: 4.498886E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 1999.539 | TFLOPs: 7.44 | 7: iteration 113850/ 173500 | consumed samples: 29145600 | consumed tokens: 59690188800 | elapsed time per iteration (s): 0.12 | learning rate: 6.846E-05 | global batch size: 256 | lm loss: 4.513278E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2146.608 | TFLOPs: 7.98 | 7: iteration 113860/ 173500 | consumed samples: 29148160 | consumed tokens: 59695431680 | elapsed time per iteration (s): 0.13 | learning rate: 6.844E-05 | global batch size: 256 | lm loss: 4.516185E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.652 | TFLOPs: 7.61 | 7: iteration 113870/ 173500 | consumed samples: 29150720 | consumed tokens: 59700674560 | elapsed time per iteration (s): 0.13 | learning rate: 6.843E-05 | global batch size: 256 | lm loss: 4.515813E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.744 | TFLOPs: 7.42 | 7: iteration 113880/ 173500 | consumed samples: 29153280 | consumed tokens: 59705917440 | elapsed time per iteration (s): 0.13 | learning rate: 6.841E-05 | global batch size: 256 | lm loss: 4.522142E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1969.816 | TFLOPs: 7.33 | 7: iteration 113890/ 173500 | consumed samples: 29155840 | consumed tokens: 59711160320 | elapsed time per iteration (s): 0.13 | learning rate: 6.840E-05 | global batch size: 256 | lm loss: 4.504823E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1980.701 | TFLOPs: 7.37 | 7: iteration 113900/ 173500 | consumed samples: 29158400 | consumed tokens: 59716403200 | elapsed time per iteration (s): 0.13 | learning rate: 6.838E-05 | global batch size: 256 | lm loss: 4.508372E+00 | grad 
norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2024.604 | TFLOPs: 7.53 | 7: iteration 113910/ 173500 | consumed samples: 29160960 | consumed tokens: 59721646080 | elapsed time per iteration (s): 0.12 | learning rate: 6.837E-05 | global batch size: 256 | lm loss: 4.514812E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2175.439 | TFLOPs: 8.09 | 7: iteration 113920/ 173500 | consumed samples: 29163520 | consumed tokens: 59726888960 | elapsed time per iteration (s): 0.10 | learning rate: 6.835E-05 | global batch size: 256 | lm loss: 4.509251E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2648.052 | TFLOPs: 9.85 | 7: iteration 113930/ 173500 | consumed samples: 29166080 | consumed tokens: 59732131840 | elapsed time per iteration (s): 0.10 | learning rate: 6.834E-05 | global batch size: 256 | lm loss: 4.517093E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.409 | TFLOPs: 9.12 | 7: iteration 113940/ 173500 | consumed samples: 29168640 | consumed tokens: 59737374720 | elapsed time per iteration (s): 0.10 | learning rate: 6.833E-05 | global batch size: 256 | lm loss: 4.507396E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.647 | TFLOPs: 9.16 | 7: iteration 113950/ 173500 | consumed samples: 29171200 | consumed tokens: 59742617600 | elapsed time per iteration (s): 0.10 | learning rate: 6.831E-05 | global batch size: 256 | lm loss: 4.517475E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.241 | TFLOPs: 9.37 | 7: iteration 113960/ 173500 | consumed samples: 29173760 | consumed tokens: 59747860480 | elapsed time 
per iteration (s): 0.10 | learning rate: 6.830E-05 | global batch size: 256 | lm loss: 4.521443E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2520.551 | TFLOPs: 9.38 | 7: iteration 113970/ 173500 | consumed samples: 29176320 | consumed tokens: 59753103360 | elapsed time per iteration (s): 0.10 | learning rate: 6.828E-05 | global batch size: 256 | lm loss: 4.531086E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.341 | TFLOPs: 9.26 | 7: iteration 113980/ 173500 | consumed samples: 29178880 | consumed tokens: 59758346240 | elapsed time per iteration (s): 0.10 | learning rate: 6.827E-05 | global batch size: 256 | lm loss: 4.529771E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2489.956 | TFLOPs: 9.26 | 7: iteration 113990/ 173500 | consumed samples: 29181440 | consumed tokens: 59763589120 | elapsed time per iteration (s): 0.10 | learning rate: 6.825E-05 | global batch size: 256 | lm loss: 4.518433E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.417 | TFLOPs: 9.56 | 0: [2023-03-17 03:00:36,732] [INFO] [logging.py:68:log_dist] [Rank 0] step=114000, skipped=0, lr=[6.823796836261315e-05, 6.823796836261315e-05, 6.823796836261315e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 114000/ 173500 | consumed samples: 29184000 | consumed tokens: 59768832000 | elapsed time per iteration (s): 0.10 | learning rate: 6.824E-05 | global batch size: 256 | lm loss: 4.513991E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.661 | TFLOPs: 9.16 | 0: steps: 114000 loss: 4.5170 iter time (s): 0.090 samples/sec: 2840.596 7: 
------------------------------------------------------------------------------------------------- 7: validation loss at iteration 114000 | lm loss value: 4.414803E+00 | lm loss PPL: 8.266552E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 114000 to checkpoints_14m91b100m 0: [2023-03-17 03:00:36,803] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step114000 is begin to save! 0: [2023-03-17 03:00:36,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:00:36,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:00:36,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:00:36,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:00:36,836] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:00:36,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:00:36,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:00:36,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:00:36,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 03:00:36,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:00:36,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:00:36,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:00:36,846] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step114000/mp_rank_00_model_states.pt 0: [2023-03-17 03:00:36,846] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:00:36,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:00:36,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:00:36,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:00:36,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:00:36,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2023-03-17 03:00:36,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:00:36,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:00:36,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 
4: [2023-03-17 03:00:36,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:00:36,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:00:36,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2023-03-17 03:00:36,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:00:36,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:00:36,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:00:36,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:00:36,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:00:36,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:00:36,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2023-03-17 03:00:36,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:00:36,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:00:36,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2023-03-17 03:00:36,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:00:36,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2023-03-17 03:00:36,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:00:36,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:00:36,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:00:36,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:00:36,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:00:36,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:00:36,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:00:36,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:00:36,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:00:36,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:00:36,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:00:36,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:00:36,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:00:36,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 03:00:36,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 1: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2023-03-17 03:00:36,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2023-03-17 03:00:36,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:00:36,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2023-03-17 03:00:36,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:00:36,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:00:36,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2023-03-17 03:00:36,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:00:36,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:00:36,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2023-03-17 03:00:36,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:00:36,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:00:36,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 2: [2023-03-17 03:00:36,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 
0: [2023-03-17 03:00:36,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:00:36,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2023-03-17 03:00:36,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:00:36,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:00:36,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 4: [2023-03-17 03:00:36,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:00:36,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:00:36,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:00:36,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2023-03-17 03:00:36,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:00:36,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:00:36,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 
6: [2023-03-17 03:00:36,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:00:36,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:00:36,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:00:36,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:00:36,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:00:36,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:00:36,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:00:36,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 03:00:36,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:00:36,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:00:36,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:00:36,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 03:00:36,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 
4: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2023-03-17 03:00:36,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step114000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:00:36,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: successfully saved checkpoint at iteration 114000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.59 7: iteration 114010/ 173500 | consumed samples: 29186560 | consumed tokens: 59774074880 | elapsed time per iteration (s): 0.11 | learning rate: 6.822E-05 | global batch size: 256 | lm loss: 4.509189E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2260.353 | TFLOPs: 8.41 | 7: iteration 114020/ 173500 | consumed samples: 29189120 | consumed tokens: 59779317760 | elapsed time per iteration (s): 0.10 | learning rate: 6.821E-05 | global batch size: 256 | lm loss: 4.512924E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.870 | TFLOPs: 9.48 | 7: iteration 114030/ 173500 | consumed samples: 29191680 | consumed tokens: 59784560640 | elapsed time per iteration (s): 0.11 | learning rate: 6.819E-05 | global batch size: 256 | lm loss: 4.515365E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2391.235 | TFLOPs: 8.89 | 7: iteration 114040/ 173500 | consumed samples: 29194240 | consumed tokens: 59789803520 | elapsed time per iteration (s): 0.10 | 
learning rate: 6.818E-05 | global batch size: 256 | lm loss: 4.514808E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2506.254 | TFLOPs: 9.32 | 7: iteration 114050/ 173500 | consumed samples: 29196800 | consumed tokens: 59795046400 | elapsed time per iteration (s): 0.10 | learning rate: 6.817E-05 | global batch size: 256 | lm loss: 4.517444E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2480.908 | TFLOPs: 9.23 | 7: iteration 114060/ 173500 | consumed samples: 29199360 | consumed tokens: 59800289280 | elapsed time per iteration (s): 0.10 | learning rate: 6.815E-05 | global batch size: 256 | lm loss: 4.517701E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.270 | TFLOPs: 9.26 | 7: iteration 114070/ 173500 | consumed samples: 29201920 | consumed tokens: 59805532160 | elapsed time per iteration (s): 0.11 | learning rate: 6.814E-05 | global batch size: 256 | lm loss: 4.501199E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2331.414 | TFLOPs: 8.67 | 7: iteration 114080/ 173500 | consumed samples: 29204480 | consumed tokens: 59810775040 | elapsed time per iteration (s): 0.11 | learning rate: 6.812E-05 | global batch size: 256 | lm loss: 4.523735E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.283 | TFLOPs: 8.95 | 7: iteration 114090/ 173500 | consumed samples: 29207040 | consumed tokens: 59816017920 | elapsed time per iteration (s): 0.31 | learning rate: 6.811E-05 | global batch size: 256 | lm loss: 4.506773E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 835.903 | TFLOPs: 3.11 | 7: iteration 114100/ 
173500 | consumed samples: 29209600 | consumed tokens: 59821260800 | elapsed time per iteration (s): 0.08 | learning rate: 6.809E-05 | global batch size: 256 | lm loss: 4.514569E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.240 | TFLOPs: 11.82 | 7: iteration 114110/ 173500 | consumed samples: 29212160 | consumed tokens: 59826503680 | elapsed time per iteration (s): 0.09 | learning rate: 6.808E-05 | global batch size: 256 | lm loss: 4.518859E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2836.261 | TFLOPs: 10.55 | 7: iteration 114120/ 173500 | consumed samples: 29214720 | consumed tokens: 59831746560 | elapsed time per iteration (s): 0.08 | learning rate: 6.806E-05 | global batch size: 256 | lm loss: 4.516936E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.159 | TFLOPs: 11.65 | 7: iteration 114130/ 173500 | consumed samples: 29217280 | consumed tokens: 59836989440 | elapsed time per iteration (s): 0.08 | learning rate: 6.805E-05 | global batch size: 256 | lm loss: 4.520871E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.885 | TFLOPs: 11.94 | 7: iteration 114140/ 173500 | consumed samples: 29219840 | consumed tokens: 59842232320 | elapsed time per iteration (s): 0.08 | learning rate: 6.803E-05 | global batch size: 256 | lm loss: 4.505781E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.967 | TFLOPs: 11.99 | 7: iteration 114150/ 173500 | consumed samples: 29222400 | consumed tokens: 59847475200 | elapsed time per iteration (s): 0.08 | learning rate: 6.802E-05 | global batch size: 256 | lm loss: 4.511929E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3214.439 | TFLOPs: 11.96 | 7: iteration 114160/ 173500 | consumed samples: 29224960 | consumed tokens: 59852718080 | elapsed time per iteration (s): 0.08 | learning rate: 6.800E-05 | global batch size: 256 | lm loss: 4.510303E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.826 | TFLOPs: 11.99 | 7: iteration 114170/ 173500 | consumed samples: 29227520 | consumed tokens: 59857960960 | elapsed time per iteration (s): 0.08 | learning rate: 6.799E-05 | global batch size: 256 | lm loss: 4.514385E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.737 | TFLOPs: 11.76 | 7: iteration 114180/ 173500 | consumed samples: 29230080 | consumed tokens: 59863203840 | elapsed time per iteration (s): 0.08 | learning rate: 6.798E-05 | global batch size: 256 | lm loss: 4.511757E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.991 | TFLOPs: 11.57 | 7: iteration 114190/ 173500 | consumed samples: 29232640 | consumed tokens: 59868446720 | elapsed time per iteration (s): 0.08 | learning rate: 6.796E-05 | global batch size: 256 | lm loss: 4.516219E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3067.026 | TFLOPs: 11.41 | 7: iteration 114200/ 173500 | consumed samples: 29235200 | consumed tokens: 59873689600 | elapsed time per iteration (s): 0.10 | learning rate: 6.795E-05 | global batch size: 256 | lm loss: 4.514714E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.506 | TFLOPs: 9.64 | 7: iteration 114210/ 173500 | consumed samples: 29237760 | consumed tokens: 59878932480 | elapsed time per iteration (s): 0.11 | learning rate: 
6.793E-05 | global batch size: 256 | lm loss: 4.497490E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2434.311 | TFLOPs: 9.05 | 7: iteration 114220/ 173500 | consumed samples: 29240320 | consumed tokens: 59884175360 | elapsed time per iteration (s): 0.10 | learning rate: 6.792E-05 | global batch size: 256 | lm loss: 4.509988E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2470.872 | TFLOPs: 9.19 | 7: iteration 114230/ 173500 | consumed samples: 29242880 | consumed tokens: 59889418240 | elapsed time per iteration (s): 0.11 | learning rate: 6.790E-05 | global batch size: 256 | lm loss: 4.517339E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2413.819 | TFLOPs: 8.98 | 7: iteration 114240/ 173500 | consumed samples: 29245440 | consumed tokens: 59894661120 | elapsed time per iteration (s): 0.11 | learning rate: 6.789E-05 | global batch size: 256 | lm loss: 4.513839E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2366.318 | TFLOPs: 8.80 | 7: iteration 114250/ 173500 | consumed samples: 29248000 | consumed tokens: 59899904000 | elapsed time per iteration (s): 0.12 | learning rate: 6.787E-05 | global batch size: 256 | lm loss: 4.513215E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2193.500 | TFLOPs: 8.16 | 7: iteration 114260/ 173500 | consumed samples: 29250560 | consumed tokens: 59905146880 | elapsed time per iteration (s): 0.09 | learning rate: 6.786E-05 | global batch size: 256 | lm loss: 4.509621E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2747.167 | TFLOPs: 10.22 | 7: iteration 114270/ 173500 | 
consumed samples: 29253120 | consumed tokens: 59910389760 | elapsed time per iteration (s): 0.08 | learning rate: 6.784E-05 | global batch size: 256 | lm loss: 4.513889E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.160 | TFLOPs: 11.32 | 7: iteration 114280/ 173500 | consumed samples: 29255680 | consumed tokens: 59915632640 | elapsed time per iteration (s): 0.08 | learning rate: 6.783E-05 | global batch size: 256 | lm loss: 4.514709E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.456 | TFLOPs: 11.25 | 7: iteration 114290/ 173500 | consumed samples: 29258240 | consumed tokens: 59920875520 | elapsed time per iteration (s): 0.09 | learning rate: 6.782E-05 | global batch size: 256 | lm loss: 4.519450E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3005.859 | TFLOPs: 11.18 | 7: iteration 114300/ 173500 | consumed samples: 29260800 | consumed tokens: 59926118400 | elapsed time per iteration (s): 0.11 | learning rate: 6.780E-05 | global batch size: 256 | lm loss: 4.510030E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2385.177 | TFLOPs: 8.87 | 7: iteration 114310/ 173500 | consumed samples: 29263360 | consumed tokens: 59931361280 | elapsed time per iteration (s): 0.11 | learning rate: 6.779E-05 | global batch size: 256 | lm loss: 4.511146E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2303.527 | TFLOPs: 8.57 | 7: iteration 114320/ 173500 | consumed samples: 29265920 | consumed tokens: 59936604160 | elapsed time per iteration (s): 0.09 | learning rate: 6.777E-05 | global batch size: 256 | lm loss: 4.510201E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 2836.735 | TFLOPs: 10.55 | 7: iteration 114330/ 173500 | consumed samples: 29268480 | consumed tokens: 59941847040 | elapsed time per iteration (s): 0.08 | learning rate: 6.776E-05 | global batch size: 256 | lm loss: 4.512792E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.001 | TFLOPs: 11.34 | 7: iteration 114340/ 173500 | consumed samples: 29271040 | consumed tokens: 59947089920 | elapsed time per iteration (s): 0.09 | learning rate: 6.774E-05 | global batch size: 256 | lm loss: 4.518845E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3006.866 | TFLOPs: 11.18 | 7: iteration 114350/ 173500 | consumed samples: 29273600 | consumed tokens: 59952332800 | elapsed time per iteration (s): 0.08 | learning rate: 6.773E-05 | global batch size: 256 | lm loss: 4.509068E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.540 | TFLOPs: 11.80 | 7: iteration 114360/ 173500 | consumed samples: 29276160 | consumed tokens: 59957575680 | elapsed time per iteration (s): 0.08 | learning rate: 6.771E-05 | global batch size: 256 | lm loss: 4.510312E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.349 | TFLOPs: 11.54 | 7: iteration 114370/ 173500 | consumed samples: 29278720 | consumed tokens: 59962818560 | elapsed time per iteration (s): 0.08 | learning rate: 6.770E-05 | global batch size: 256 | lm loss: 4.506767E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.579 | TFLOPs: 11.80 | 7: iteration 114380/ 173500 | consumed samples: 29281280 | consumed tokens: 59968061440 | elapsed time per iteration (s): 0.08 | learning rate: 6.768E-05 | 
global batch size: 256 | lm loss: 4.516879E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.407 | TFLOPs: 11.68 | 7: iteration 114390/ 173500 | consumed samples: 29283840 | consumed tokens: 59973304320 | elapsed time per iteration (s): 0.09 | learning rate: 6.767E-05 | global batch size: 256 | lm loss: 4.522446E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2969.510 | TFLOPs: 11.05 | 7: iteration 114400/ 173500 | consumed samples: 29286400 | consumed tokens: 59978547200 | elapsed time per iteration (s): 0.09 | learning rate: 6.766E-05 | global batch size: 256 | lm loss: 4.517506E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2732.978 | TFLOPs: 10.17 | 7: iteration 114410/ 173500 | consumed samples: 29288960 | consumed tokens: 59983790080 | elapsed time per iteration (s): 0.10 | learning rate: 6.764E-05 | global batch size: 256 | lm loss: 4.522985E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2658.904 | TFLOPs: 9.89 | 7: iteration 114420/ 173500 | consumed samples: 29291520 | consumed tokens: 59989032960 | elapsed time per iteration (s): 0.08 | learning rate: 6.763E-05 | global batch size: 256 | lm loss: 4.508847E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.018 | TFLOPs: 11.57 | 7: iteration 114430/ 173500 | consumed samples: 29294080 | consumed tokens: 59994275840 | elapsed time per iteration (s): 0.08 | learning rate: 6.761E-05 | global batch size: 256 | lm loss: 4.524385E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.520 | TFLOPs: 11.99 | 7: iteration 114440/ 173500 | consumed 
samples: 29296640 | consumed tokens: 59999518720 | elapsed time per iteration (s): 0.09 | learning rate: 6.760E-05 | global batch size: 256 | lm loss: 4.512865E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2863.600 | TFLOPs: 10.65 | 7: iteration 114450/ 173500 | consumed samples: 29299200 | consumed tokens: 60004761600 | elapsed time per iteration (s): 0.12 | learning rate: 6.758E-05 | global batch size: 256 | lm loss: 4.507048E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2198.009 | TFLOPs: 8.18 | 7: iteration 114460/ 173500 | consumed samples: 29301760 | consumed tokens: 60010004480 | elapsed time per iteration (s): 0.09 | learning rate: 6.757E-05 | global batch size: 256 | lm loss: 4.506933E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2951.393 | TFLOPs: 10.98 | 7: iteration 114470/ 173500 | consumed samples: 29304320 | consumed tokens: 60015247360 | elapsed time per iteration (s): 0.08 | learning rate: 6.755E-05 | global batch size: 256 | lm loss: 4.511391E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.888 | TFLOPs: 11.56 | 7: iteration 114480/ 173500 | consumed samples: 29306880 | consumed tokens: 60020490240 | elapsed time per iteration (s): 0.10 | learning rate: 6.754E-05 | global batch size: 256 | lm loss: 4.527816E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2532.795 | TFLOPs: 9.42 | 7: iteration 114490/ 173500 | consumed samples: 29309440 | consumed tokens: 60025733120 | elapsed time per iteration (s): 0.09 | learning rate: 6.753E-05 | global batch size: 256 | lm loss: 4.507607E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2701.236 | TFLOPs: 10.05 | 7: iteration 114500/ 173500 | consumed samples: 29312000 | consumed tokens: 60030976000 | elapsed time per iteration (s): 0.09 | learning rate: 6.751E-05 | global batch size: 256 | lm loss: 4.502047E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2881.425 | TFLOPs: 10.72 | 7: iteration 114510/ 173500 | consumed samples: 29314560 | consumed tokens: 60036218880 | elapsed time per iteration (s): 0.09 | learning rate: 6.750E-05 | global batch size: 256 | lm loss: 4.520967E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2991.968 | TFLOPs: 11.13 | 7: iteration 114520/ 173500 | consumed samples: 29317120 | consumed tokens: 60041461760 | elapsed time per iteration (s): 0.09 | learning rate: 6.748E-05 | global batch size: 256 | lm loss: 4.522417E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2816.971 | TFLOPs: 10.48 | 7: iteration 114530/ 173500 | consumed samples: 29319680 | consumed tokens: 60046704640 | elapsed time per iteration (s): 0.08 | learning rate: 6.747E-05 | global batch size: 256 | lm loss: 4.505748E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.112 | TFLOPs: 11.89 | 7: iteration 114540/ 173500 | consumed samples: 29322240 | consumed tokens: 60051947520 | elapsed time per iteration (s): 0.08 | learning rate: 6.745E-05 | global batch size: 256 | lm loss: 4.514936E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.711 | TFLOPs: 11.94 | 7: iteration 114550/ 173500 | consumed samples: 29324800 | consumed tokens: 60057190400 | elapsed time per iteration (s): 0.08 | learning rate: 6.744E-05 | global 
batch size: 256 | lm loss: 4.504268E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.062 | TFLOPs: 12.00 | 7: iteration 114560/ 173500 | consumed samples: 29327360 | consumed tokens: 60062433280 | elapsed time per iteration (s): 0.08 | learning rate: 6.742E-05 | global batch size: 256 | lm loss: 4.511107E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.815 | TFLOPs: 12.01 | 7: iteration 114570/ 173500 | consumed samples: 29329920 | consumed tokens: 60067676160 | elapsed time per iteration (s): 0.10 | learning rate: 6.741E-05 | global batch size: 256 | lm loss: 4.518161E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2639.831 | TFLOPs: 9.82 | 7: iteration 114580/ 173500 | consumed samples: 29332480 | consumed tokens: 60072919040 | elapsed time per iteration (s): 0.09 | learning rate: 6.739E-05 | global batch size: 256 | lm loss: 4.521605E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.472 | TFLOPs: 10.17 | 7: iteration 114590/ 173500 | consumed samples: 29335040 | consumed tokens: 60078161920 | elapsed time per iteration (s): 0.10 | learning rate: 6.738E-05 | global batch size: 256 | lm loss: 4.512139E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2641.067 | TFLOPs: 9.82 | 7: iteration 114600/ 173500 | consumed samples: 29337600 | consumed tokens: 60083404800 | elapsed time per iteration (s): 0.08 | learning rate: 6.737E-05 | global batch size: 256 | lm loss: 4.507300E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.429 | TFLOPs: 11.86 | 7: iteration 114610/ 173500 | consumed samples: 
29340160 | consumed tokens: 60088647680 | elapsed time per iteration (s): 0.08 | learning rate: 6.735E-05 | global batch size: 256 | lm loss: 4.525608E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.871 | TFLOPs: 11.91 | 7: iteration 114620/ 173500 | consumed samples: 29342720 | consumed tokens: 60093890560 | elapsed time per iteration (s): 0.08 | learning rate: 6.734E-05 | global batch size: 256 | lm loss: 4.498024E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.396 | TFLOPs: 11.97 | 7: iteration 114630/ 173500 | consumed samples: 29345280 | consumed tokens: 60099133440 | elapsed time per iteration (s): 0.08 | learning rate: 6.732E-05 | global batch size: 256 | lm loss: 4.509731E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.881 | TFLOPs: 11.95 | 7: iteration 114640/ 173500 | consumed samples: 29347840 | consumed tokens: 60104376320 | elapsed time per iteration (s): 0.09 | learning rate: 6.731E-05 | global batch size: 256 | lm loss: 4.517760E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.364 | TFLOPs: 10.84 | 7: iteration 114650/ 173500 | consumed samples: 29350400 | consumed tokens: 60109619200 | elapsed time per iteration (s): 0.08 | learning rate: 6.729E-05 | global batch size: 256 | lm loss: 4.508654E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.256 | TFLOPs: 11.98 | 7: iteration 114660/ 173500 | consumed samples: 29352960 | consumed tokens: 60114862080 | elapsed time per iteration (s): 0.08 | learning rate: 6.728E-05 | global batch size: 256 | lm loss: 4.511409E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3072.772 | TFLOPs: 11.43 | 7: iteration 114670/ 173500 | consumed samples: 29355520 | consumed tokens: 60120104960 | elapsed time per iteration (s): 0.08 | learning rate: 6.726E-05 | global batch size: 256 | lm loss: 4.505103E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.036 | TFLOPs: 11.92 | 7: iteration 114680/ 173500 | consumed samples: 29358080 | consumed tokens: 60125347840 | elapsed time per iteration (s): 0.08 | learning rate: 6.725E-05 | global batch size: 256 | lm loss: 4.497709E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.627 | TFLOPs: 11.97 | 7: iteration 114690/ 173500 | consumed samples: 29360640 | consumed tokens: 60130590720 | elapsed time per iteration (s): 0.08 | learning rate: 6.724E-05 | global batch size: 256 | lm loss: 4.511612E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.899 | TFLOPs: 11.37 | 7: iteration 114700/ 173500 | consumed samples: 29363200 | consumed tokens: 60135833600 | elapsed time per iteration (s): 0.08 | learning rate: 6.722E-05 | global batch size: 256 | lm loss: 4.527656E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.277 | TFLOPs: 11.96 | 7: iteration 114710/ 173500 | consumed samples: 29365760 | consumed tokens: 60141076480 | elapsed time per iteration (s): 0.08 | learning rate: 6.721E-05 | global batch size: 256 | lm loss: 4.513094E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.898 | TFLOPs: 11.95 | 7: iteration 114720/ 173500 | consumed samples: 29368320 | consumed tokens: 60146319360 | elapsed time per iteration (s): 0.08 | learning rate: 6.719E-05 | global batch 
size: 256 | lm loss: 4.527423E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.574 | TFLOPs: 11.76 | 7: iteration 114730/ 173500 | consumed samples: 29370880 | consumed tokens: 60151562240 | elapsed time per iteration (s): 0.10 | learning rate: 6.718E-05 | global batch size: 256 | lm loss: 4.514511E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2567.105 | TFLOPs: 9.55 | 7: iteration 114740/ 173500 | consumed samples: 29373440 | consumed tokens: 60156805120 | elapsed time per iteration (s): 0.08 | learning rate: 6.716E-05 | global batch size: 256 | lm loss: 4.521231E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3046.411 | TFLOPs: 11.33 | 7: iteration 114750/ 173500 | consumed samples: 29376000 | consumed tokens: 60162048000 | elapsed time per iteration (s): 0.08 | learning rate: 6.715E-05 | global batch size: 256 | lm loss: 4.508297E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.818 | TFLOPs: 11.92 | 7: iteration 114760/ 173500 | consumed samples: 29378560 | consumed tokens: 60167290880 | elapsed time per iteration (s): 0.11 | learning rate: 6.713E-05 | global batch size: 256 | lm loss: 4.512668E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2416.323 | TFLOPs: 8.99 | 7: iteration 114770/ 173500 | consumed samples: 29381120 | consumed tokens: 60172533760 | elapsed time per iteration (s): 0.11 | learning rate: 6.712E-05 | global batch size: 256 | lm loss: 4.524223E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.809 | TFLOPs: 8.60 | 7: iteration 114780/ 173500 | consumed samples: 29383680 | 
consumed tokens: 60177776640 | elapsed time per iteration (s): 0.11 | learning rate: 6.710E-05 | global batch size: 256 | lm loss: 4.508451E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2335.716 | TFLOPs: 8.69 | 7: iteration 114790/ 173500 | consumed samples: 29386240 | consumed tokens: 60183019520 | elapsed time per iteration (s): 0.11 | learning rate: 6.709E-05 | global batch size: 256 | lm loss: 4.521722E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2235.500 | TFLOPs: 8.32 | 7: iteration 114800/ 173500 | consumed samples: 29388800 | consumed tokens: 60188262400 | elapsed time per iteration (s): 0.11 | learning rate: 6.708E-05 | global batch size: 256 | lm loss: 4.517990E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2377.085 | TFLOPs: 8.84 | 7: iteration 114810/ 173500 | consumed samples: 29391360 | consumed tokens: 60193505280 | elapsed time per iteration (s): 0.09 | learning rate: 6.706E-05 | global batch size: 256 | lm loss: 4.512565E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2733.675 | TFLOPs: 10.17 | 7: iteration 114820/ 173500 | consumed samples: 29393920 | consumed tokens: 60198748160 | elapsed time per iteration (s): 0.08 | learning rate: 6.705E-05 | global batch size: 256 | lm loss: 4.511046E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.613 | TFLOPs: 11.99 | 7: iteration 114830/ 173500 | consumed samples: 29396480 | consumed tokens: 60203991040 | elapsed time per iteration (s): 0.08 | learning rate: 6.703E-05 | global batch size: 256 | lm loss: 4.527070E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 3219.691 | TFLOPs: 11.98 | 7: iteration 114840/ 173500 | consumed samples: 29399040 | consumed tokens: 60209233920 | elapsed time per iteration (s): 0.08 | learning rate: 6.702E-05 | global batch size: 256 | lm loss: 4.525135E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.739 | TFLOPs: 11.99 | 7: iteration 114850/ 173500 | consumed samples: 29401600 | consumed tokens: 60214476800 | elapsed time per iteration (s): 0.08 | learning rate: 6.700E-05 | global batch size: 256 | lm loss: 4.519007E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.741 | TFLOPs: 12.00 | 7: iteration 114860/ 173500 | consumed samples: 29404160 | consumed tokens: 60219719680 | elapsed time per iteration (s): 0.08 | learning rate: 6.699E-05 | global batch size: 256 | lm loss: 4.508241E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.489 | TFLOPs: 11.97 | 7: iteration 114870/ 173500 | consumed samples: 29406720 | consumed tokens: 60224962560 | elapsed time per iteration (s): 0.08 | learning rate: 6.697E-05 | global batch size: 256 | lm loss: 4.519427E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.405 | TFLOPs: 11.91 | 7: iteration 114880/ 173500 | consumed samples: 29409280 | consumed tokens: 60230205440 | elapsed time per iteration (s): 0.08 | learning rate: 6.696E-05 | global batch size: 256 | lm loss: 4.531368E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.163 | TFLOPs: 11.95 | 7: iteration 114890/ 173500 | consumed samples: 29411840 | consumed tokens: 60235448320 | elapsed time per iteration (s): 0.08 | learning rate: 6.695E-05 | global batch size: 256 | lm 
loss: 4.516520E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.261 | TFLOPs: 11.93 | 7: iteration 114900/ 173500 | consumed samples: 29414400 | consumed tokens: 60240691200 | elapsed time per iteration (s): 0.08 | learning rate: 6.693E-05 | global batch size: 256 | lm loss: 4.519756E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.153 | TFLOPs: 11.94 | 7: iteration 114910/ 173500 | consumed samples: 29416960 | consumed tokens: 60245934080 | elapsed time per iteration (s): 0.08 | learning rate: 6.692E-05 | global batch size: 256 | lm loss: 4.512547E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.234 | TFLOPs: 11.95 | 7: iteration 114920/ 173500 | consumed samples: 29419520 | consumed tokens: 60251176960 | elapsed time per iteration (s): 0.08 | learning rate: 6.690E-05 | global batch size: 256 | lm loss: 4.514614E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.930 | TFLOPs: 11.89 | 7: iteration 114930/ 173500 | consumed samples: 29422080 | consumed tokens: 60256419840 | elapsed time per iteration (s): 0.08 | learning rate: 6.689E-05 | global batch size: 256 | lm loss: 4.520998E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.426 | TFLOPs: 11.97 | 7: iteration 114940/ 173500 | consumed samples: 29424640 | consumed tokens: 60261662720 | elapsed time per iteration (s): 0.08 | learning rate: 6.687E-05 | global batch size: 256 | lm loss: 4.515228E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.790 | TFLOPs: 11.97 | 7: iteration 114950/ 173500 | consumed samples: 29427200 | consumed 
tokens: 60266905600 | elapsed time per iteration (s): 0.08 | learning rate: 6.686E-05 | global batch size: 256 | lm loss: 4.517806E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.890 | TFLOPs: 11.97 | 7: iteration 114960/ 173500 | consumed samples: 29429760 | consumed tokens: 60272148480 | elapsed time per iteration (s): 0.08 | learning rate: 6.684E-05 | global batch size: 256 | lm loss: 4.529409E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.906 | TFLOPs: 11.93 | 7: iteration 114970/ 173500 | consumed samples: 29432320 | consumed tokens: 60277391360 | elapsed time per iteration (s): 0.08 | learning rate: 6.683E-05 | global batch size: 256 | lm loss: 4.518902E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.344 | TFLOPs: 11.91 | 7: iteration 114980/ 173500 | consumed samples: 29434880 | consumed tokens: 60282634240 | elapsed time per iteration (s): 0.09 | learning rate: 6.682E-05 | global batch size: 256 | lm loss: 4.501620E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3009.181 | TFLOPs: 11.19 | 7: iteration 114990/ 173500 | consumed samples: 29437440 | consumed tokens: 60287877120 | elapsed time per iteration (s): 0.08 | learning rate: 6.680E-05 | global batch size: 256 | lm loss: 4.507329E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.808 | TFLOPs: 11.90 | 7: iteration 115000/ 173500 | consumed samples: 29440000 | consumed tokens: 60293120000 | elapsed time per iteration (s): 0.08 | learning rate: 6.679E-05 | global batch size: 256 | lm loss: 4.516165E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3220.897 | TFLOPs: 11.98 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 115000 | lm loss value: 4.397906E+00 | lm loss PPL: 8.128051E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 115000 to checkpoints_14m91b100m 0: [2023-03-17 03:02:07,945] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step115000 is begin to save! 0: [2023-03-17 03:02:07,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:02:07,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:02:07,975] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:02:07,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:02:07,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:02:07,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:02:07,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:02:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 03:02:07,985] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:02:07,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:02:07,988] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:02:07,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:02:07,989] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step115000/mp_rank_00_model_states.pt 0: [2023-03-17 03:02:07,989] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:02:07,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:02:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:02:08,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:02:08,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:02:08,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:02:08,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:02:08,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:02:08,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 03:02:08,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:02:08,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2023-03-17 03:02:08,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:02:08,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2023-03-17 03:02:08,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
0: [2023-03-17 03:02:08,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:02:08,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:02:08,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:02:08,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2023-03-17 03:02:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:02:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2023-03-17 03:02:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:02:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2023-03-17 03:02:08,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:02:08,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:02:08,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2023-03-17 03:02:08,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:02:08,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:02:08,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:02:08,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:02:08,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:02:08,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 7: [2023-03-17 03:02:08,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 03:02:08,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:02:08,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:02:08,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:02:08,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:02:08,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:02:08,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 0: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:02:08,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:02:08,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:02:08,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2023-03-17 03:02:08,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2023-03-17 03:02:08,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2023-03-17 03:02:08,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:02:08,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:02:08,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:02:08,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:02:08,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:02:08,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:02:08,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 4: [2023-03-17 03:02:08,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:02:08,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
3: [2023-03-17 03:02:08,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:02:08,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 03:02:08,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 03:02:08,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 6: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2023-03-17 03:02:08,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:02:08,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:02:08,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:02:08,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
3: [2023-03-17 03:02:08,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:02:08,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:02:08,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:02:08,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:02:08,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:02:08,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2023-03-17 03:02:08,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:02:08,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2023-03-17 03:02:08,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:02:08,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
5: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 0: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
7: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
3: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 4: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:02:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2023-03-17 03:02:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2023-03-17 03:02:08,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:02:08,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step115000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:02:08,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
0: successfully saved checkpoint at iteration 115000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 88.71 7: iteration 115010/ 173500 | consumed samples: 29442560 | consumed tokens: 60298362880 | elapsed time per iteration (s): 0.09 | learning rate: 6.677E-05 | global batch size: 256 | lm loss: 4.510047E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2791.549 | TFLOPs: 10.38 | 7: iteration 115020/ 173500 | consumed samples: 29445120 | consumed tokens: 60303605760 | elapsed time per iteration (s): 0.08 | learning rate: 6.676E-05 | global batch size: 256 | lm loss: 4.522720E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.832 | TFLOPs: 11.95 | 7: iteration 115030/ 173500 | consumed samples: 29447680 | consumed tokens: 60308848640 | elapsed time per iteration (s): 0.12 | learning rate: 6.674E-05 | global batch size: 256 | lm loss: 4.510990E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2146.541 | TFLOPs: 7.98 | 7: iteration 115040/ 173500 | consumed samples: 29450240 | consumed tokens: 60314091520 | elapsed time per iteration (s): 0.08 | learning rate: 6.673E-05 | global batch size: 256 | lm loss: 4.509794E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.956 | TFLOPs: 11.72 | 7: iteration 115050/ 173500 | consumed samples: 29452800 | consumed tokens: 60319334400 | elapsed time per iteration (s): 0.10 | learning rate: 6.671E-05 | global batch size: 256 | lm loss: 4.519836E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2641.209 | TFLOPs: 9.82 | 7: iteration 115060/ 173500 | consumed samples: 29455360 | consumed tokens: 60324577280 | elapsed time per iteration (s): 
0.10 | learning rate: 6.670E-05 | global batch size: 256 | lm loss: 4.511799E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2441.497 | TFLOPs: 9.08 | 7: iteration 115070/ 173500 | consumed samples: 29457920 | consumed tokens: 60329820160 | elapsed time per iteration (s): 0.10 | learning rate: 6.669E-05 | global batch size: 256 | lm loss: 4.505056E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2443.868 | TFLOPs: 9.09 | 7: iteration 115080/ 173500 | consumed samples: 29460480 | consumed tokens: 60335063040 | elapsed time per iteration (s): 0.10 | learning rate: 6.667E-05 | global batch size: 256 | lm loss: 4.509037E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.597 | TFLOPs: 9.33 | 7: iteration 115090/ 173500 | consumed samples: 29463040 | consumed tokens: 60340305920 | elapsed time per iteration (s): 0.09 | learning rate: 6.666E-05 | global batch size: 256 | lm loss: 4.512912E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2756.311 | TFLOPs: 10.25 | 7: iteration 115100/ 173500 | consumed samples: 29465600 | consumed tokens: 60345548800 | elapsed time per iteration (s): 0.08 | learning rate: 6.664E-05 | global batch size: 256 | lm loss: 4.514358E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.650 | TFLOPs: 11.99 | 7: iteration 115110/ 173500 | consumed samples: 29468160 | consumed tokens: 60350791680 | elapsed time per iteration (s): 0.08 | learning rate: 6.663E-05 | global batch size: 256 | lm loss: 4.510558E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.305 | TFLOPs: 11.77 | 7: iteration 
115120/ 173500 | consumed samples: 29470720 | consumed tokens: 60356034560 | elapsed time per iteration (s): 0.08 | learning rate: 6.661E-05 | global batch size: 256 | lm loss: 4.532789E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.455 | TFLOPs: 11.83 | 7: iteration 115130/ 173500 | consumed samples: 29473280 | consumed tokens: 60361277440 | elapsed time per iteration (s): 0.08 | learning rate: 6.660E-05 | global batch size: 256 | lm loss: 4.509070E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.568 | TFLOPs: 11.83 | 7: iteration 115140/ 173500 | consumed samples: 29475840 | consumed tokens: 60366520320 | elapsed time per iteration (s): 0.08 | learning rate: 6.658E-05 | global batch size: 256 | lm loss: 4.501443E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.525 | TFLOPs: 11.86 | 7: iteration 115150/ 173500 | consumed samples: 29478400 | consumed tokens: 60371763200 | elapsed time per iteration (s): 0.08 | learning rate: 6.657E-05 | global batch size: 256 | lm loss: 4.520506E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.595 | TFLOPs: 11.59 | 7: iteration 115160/ 173500 | consumed samples: 29480960 | consumed tokens: 60377006080 | elapsed time per iteration (s): 0.08 | learning rate: 6.656E-05 | global batch size: 256 | lm loss: 4.506603E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.762 | TFLOPs: 11.79 | 7: iteration 115170/ 173500 | consumed samples: 29483520 | consumed tokens: 60382248960 | elapsed time per iteration (s): 0.08 | learning rate: 6.654E-05 | global batch size: 256 | lm loss: 4.509982E+00 | grad norm: 0.366 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.400 | TFLOPs: 11.93 | 7: iteration 115180/ 173500 | consumed samples: 29486080 | consumed tokens: 60387491840 | elapsed time per iteration (s): 0.08 | learning rate: 6.653E-05 | global batch size: 256 | lm loss: 4.526588E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.797 | TFLOPs: 11.98 | 7: iteration 115190/ 173500 | consumed samples: 29488640 | consumed tokens: 60392734720 | elapsed time per iteration (s): 0.08 | learning rate: 6.651E-05 | global batch size: 256 | lm loss: 4.520102E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.883 | TFLOPs: 11.26 | 7: iteration 115200/ 173500 | consumed samples: 29491200 | consumed tokens: 60397977600 | elapsed time per iteration (s): 0.08 | learning rate: 6.650E-05 | global batch size: 256 | lm loss: 4.524575E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.123 | TFLOPs: 11.97 | 7: iteration 115210/ 173500 | consumed samples: 29493760 | consumed tokens: 60403220480 | elapsed time per iteration (s): 0.08 | learning rate: 6.648E-05 | global batch size: 256 | lm loss: 4.517754E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.983 | TFLOPs: 11.21 | 7: iteration 115220/ 173500 | consumed samples: 29496320 | consumed tokens: 60408463360 | elapsed time per iteration (s): 0.08 | learning rate: 6.647E-05 | global batch size: 256 | lm loss: 4.502886E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.748 | TFLOPs: 11.97 | 7: iteration 115230/ 173500 | consumed samples: 29498880 | consumed tokens: 60413706240 | elapsed time per iteration (s): 0.08 | learning 
rate: 6.646E-05 | global batch size: 256 | lm loss: 4.503650E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.315 | TFLOPs: 11.65 | 7: iteration 115240/ 173500 | consumed samples: 29501440 | consumed tokens: 60418949120 | elapsed time per iteration (s): 0.09 | learning rate: 6.644E-05 | global batch size: 256 | lm loss: 4.508706E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2983.153 | TFLOPs: 11.10 | 7: iteration 115250/ 173500 | consumed samples: 29504000 | consumed tokens: 60424192000 | elapsed time per iteration (s): 0.09 | learning rate: 6.643E-05 | global batch size: 256 | lm loss: 4.524915E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.844 | TFLOPs: 11.19 | 7: iteration 115260/ 173500 | consumed samples: 29506560 | consumed tokens: 60429434880 | elapsed time per iteration (s): 0.08 | learning rate: 6.641E-05 | global batch size: 256 | lm loss: 4.530538E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.266 | TFLOPs: 11.56 | 7: iteration 115270/ 173500 | consumed samples: 29509120 | consumed tokens: 60434677760 | elapsed time per iteration (s): 0.08 | learning rate: 6.640E-05 | global batch size: 256 | lm loss: 4.525298E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.540 | TFLOPs: 11.95 | 7: iteration 115280/ 173500 | consumed samples: 29511680 | consumed tokens: 60439920640 | elapsed time per iteration (s): 0.08 | learning rate: 6.638E-05 | global batch size: 256 | lm loss: 4.522161E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.726 | TFLOPs: 11.99 | 7: iteration 115290/ 
173500 | consumed samples: 29514240 | consumed tokens: 60445163520 | elapsed time per iteration (s): 0.08 | learning rate: 6.637E-05 | global batch size: 256 | lm loss: 4.510007E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.173 | TFLOPs: 11.94 | 7: iteration 115300/ 173500 | consumed samples: 29516800 | consumed tokens: 60450406400 | elapsed time per iteration (s): 0.09 | learning rate: 6.635E-05 | global batch size: 256 | lm loss: 4.517588E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2767.005 | TFLOPs: 10.29 | 7: iteration 115310/ 173500 | consumed samples: 29519360 | consumed tokens: 60455649280 | elapsed time per iteration (s): 0.09 | learning rate: 6.634E-05 | global batch size: 256 | lm loss: 4.532677E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2812.864 | TFLOPs: 10.46 | 7: iteration 115320/ 173500 | consumed samples: 29521920 | consumed tokens: 60460892160 | elapsed time per iteration (s): 0.08 | learning rate: 6.633E-05 | global batch size: 256 | lm loss: 4.510452E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.148 | TFLOPs: 11.26 | 7: iteration 115330/ 173500 | consumed samples: 29524480 | consumed tokens: 60466135040 | elapsed time per iteration (s): 0.09 | learning rate: 6.631E-05 | global batch size: 256 | lm loss: 4.519157E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.246 | TFLOPs: 10.72 | 7: iteration 115340/ 173500 | consumed samples: 29527040 | consumed tokens: 60471377920 | elapsed time per iteration (s): 0.10 | learning rate: 6.630E-05 | global batch size: 256 | lm loss: 4.508294E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2560.694 | TFLOPs: 9.52 | 7: iteration 115350/ 173500 | consumed samples: 29529600 | consumed tokens: 60476620800 | elapsed time per iteration (s): 0.10 | learning rate: 6.628E-05 | global batch size: 256 | lm loss: 4.518456E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2663.669 | TFLOPs: 9.91 | 7: iteration 115360/ 173500 | consumed samples: 29532160 | consumed tokens: 60481863680 | elapsed time per iteration (s): 0.08 | learning rate: 6.627E-05 | global batch size: 256 | lm loss: 4.529826E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.682 | TFLOPs: 11.94 | 7: iteration 115370/ 173500 | consumed samples: 29534720 | consumed tokens: 60487106560 | elapsed time per iteration (s): 0.08 | learning rate: 6.625E-05 | global batch size: 256 | lm loss: 4.517734E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.840 | TFLOPs: 11.94 | 7: iteration 115380/ 173500 | consumed samples: 29537280 | consumed tokens: 60492349440 | elapsed time per iteration (s): 0.08 | learning rate: 6.624E-05 | global batch size: 256 | lm loss: 4.519851E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.578 | TFLOPs: 12.02 | 7: iteration 115390/ 173500 | consumed samples: 29539840 | consumed tokens: 60497592320 | elapsed time per iteration (s): 0.08 | learning rate: 6.622E-05 | global batch size: 256 | lm loss: 4.505783E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.907 | TFLOPs: 12.02 | 7: iteration 115400/ 173500 | consumed samples: 29542400 | consumed tokens: 60502835200 | elapsed time per iteration (s): 0.08 | learning rate: 
6.621E-05 | global batch size: 256 | lm loss: 4.496672E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.524 | TFLOPs: 12.01 | 7: iteration 115410/ 173500 | consumed samples: 29544960 | consumed tokens: 60508078080 | elapsed time per iteration (s): 0.08 | learning rate: 6.620E-05 | global batch size: 256 | lm loss: 4.519624E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.593 | TFLOPs: 11.98 | 7: iteration 115420/ 173500 | consumed samples: 29547520 | consumed tokens: 60513320960 | elapsed time per iteration (s): 0.09 | learning rate: 6.618E-05 | global batch size: 256 | lm loss: 4.521132E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.039 | TFLOPs: 10.10 | 7: iteration 115430/ 173500 | consumed samples: 29550080 | consumed tokens: 60518563840 | elapsed time per iteration (s): 0.08 | learning rate: 6.617E-05 | global batch size: 256 | lm loss: 4.516062E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.284 | TFLOPs: 11.34 | 7: iteration 115440/ 173500 | consumed samples: 29552640 | consumed tokens: 60523806720 | elapsed time per iteration (s): 0.11 | learning rate: 6.615E-05 | global batch size: 256 | lm loss: 4.507649E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2348.552 | TFLOPs: 8.74 | 7: iteration 115450/ 173500 | consumed samples: 29555200 | consumed tokens: 60529049600 | elapsed time per iteration (s): 0.11 | learning rate: 6.614E-05 | global batch size: 256 | lm loss: 4.521863E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2348.558 | TFLOPs: 8.74 | 7: iteration 115460/ 173500 | 
consumed samples: 29557760 | consumed tokens: 60534292480 | elapsed time per iteration (s): 0.11 | learning rate: 6.612E-05 | global batch size: 256 | lm loss: 4.503559E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.782 | TFLOPs: 8.62 | 7: iteration 115470/ 173500 | consumed samples: 29560320 | consumed tokens: 60539535360 | elapsed time per iteration (s): 0.13 | learning rate: 6.611E-05 | global batch size: 256 | lm loss: 4.520043E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1978.110 | TFLOPs: 7.36 | 7: iteration 115480/ 173500 | consumed samples: 29562880 | consumed tokens: 60544778240 | elapsed time per iteration (s): 0.11 | learning rate: 6.610E-05 | global batch size: 256 | lm loss: 4.510730E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2265.906 | TFLOPs: 8.43 | 7: iteration 115490/ 173500 | consumed samples: 29565440 | consumed tokens: 60550021120 | elapsed time per iteration (s): 0.11 | learning rate: 6.608E-05 | global batch size: 256 | lm loss: 4.512723E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.239 | TFLOPs: 8.85 | 7: iteration 115500/ 173500 | consumed samples: 29568000 | consumed tokens: 60555264000 | elapsed time per iteration (s): 0.11 | learning rate: 6.607E-05 | global batch size: 256 | lm loss: 4.497847E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2322.480 | TFLOPs: 8.64 | 7: iteration 115510/ 173500 | consumed samples: 29570560 | consumed tokens: 60560506880 | elapsed time per iteration (s): 0.13 | learning rate: 6.605E-05 | global batch size: 256 | lm loss: 4.514617E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2003.550 | TFLOPs: 7.45 | 7: iteration 115520/ 173500 | consumed samples: 29573120 | consumed tokens: 60565749760 | elapsed time per iteration (s): 0.11 | learning rate: 6.604E-05 | global batch size: 256 | lm loss: 4.516681E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.764 | TFLOPs: 9.05 | 7: iteration 115530/ 173500 | consumed samples: 29575680 | consumed tokens: 60570992640 | elapsed time per iteration (s): 0.10 | learning rate: 6.602E-05 | global batch size: 256 | lm loss: 4.525211E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2569.405 | TFLOPs: 9.56 | 7: iteration 115540/ 173500 | consumed samples: 29578240 | consumed tokens: 60576235520 | elapsed time per iteration (s): 0.11 | learning rate: 6.601E-05 | global batch size: 256 | lm loss: 4.520383E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.055 | TFLOPs: 8.88 | 7: iteration 115550/ 173500 | consumed samples: 29580800 | consumed tokens: 60581478400 | elapsed time per iteration (s): 0.11 | learning rate: 6.599E-05 | global batch size: 256 | lm loss: 4.513871E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.579 | TFLOPs: 8.75 | 7: iteration 115560/ 173500 | consumed samples: 29583360 | consumed tokens: 60586721280 | elapsed time per iteration (s): 0.11 | learning rate: 6.598E-05 | global batch size: 256 | lm loss: 4.519202E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.450 | TFLOPs: 8.72 | 7: iteration 115570/ 173500 | consumed samples: 29585920 | consumed tokens: 60591964160 | elapsed time per iteration (s): 0.11 | learning rate: 6.597E-05 | global batch 
size: 256 | lm loss: 4.529469E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2347.571 | TFLOPs: 8.73 | 7: iteration 115580/ 173500 | consumed samples: 29588480 | consumed tokens: 60597207040 | elapsed time per iteration (s): 0.11 | learning rate: 6.595E-05 | global batch size: 256 | lm loss: 4.518148E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2341.286 | TFLOPs: 8.71 | 7: iteration 115590/ 173500 | consumed samples: 29591040 | consumed tokens: 60602449920 | elapsed time per iteration (s): 0.11 | learning rate: 6.594E-05 | global batch size: 256 | lm loss: 4.508280E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.157 | TFLOPs: 8.72 | 7: iteration 115600/ 173500 | consumed samples: 29593600 | consumed tokens: 60607692800 | elapsed time per iteration (s): 0.11 | learning rate: 6.592E-05 | global batch size: 256 | lm loss: 4.503521E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.980 | TFLOPs: 8.63 | 7: iteration 115610/ 173500 | consumed samples: 29596160 | consumed tokens: 60612935680 | elapsed time per iteration (s): 0.11 | learning rate: 6.591E-05 | global batch size: 256 | lm loss: 4.506897E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.611 | TFLOPs: 8.59 | 7: iteration 115620/ 173500 | consumed samples: 29598720 | consumed tokens: 60618178560 | elapsed time per iteration (s): 0.11 | learning rate: 6.589E-05 | global batch size: 256 | lm loss: 4.520466E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.691 | TFLOPs: 8.62 | 7: iteration 115630/ 173500 | consumed samples: 29601280 | 
consumed tokens: 60623421440 | elapsed time per iteration (s): 0.11 | learning rate: 6.588E-05 | global batch size: 256 | lm loss: 4.526976E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.315 | TFLOPs: 8.82 | 7: iteration 115640/ 173500 | consumed samples: 29603840 | consumed tokens: 60628664320 | elapsed time per iteration (s): 0.10 | learning rate: 6.587E-05 | global batch size: 256 | lm loss: 4.533498E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2614.147 | TFLOPs: 9.72 | 7: iteration 115650/ 173500 | consumed samples: 29606400 | consumed tokens: 60633907200 | elapsed time per iteration (s): 0.09 | learning rate: 6.585E-05 | global batch size: 256 | lm loss: 4.523835E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2788.266 | TFLOPs: 10.37 | 7: iteration 115660/ 173500 | consumed samples: 29608960 | consumed tokens: 60639150080 | elapsed time per iteration (s): 0.08 | learning rate: 6.584E-05 | global batch size: 256 | lm loss: 4.508097E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.682 | TFLOPs: 11.54 | 7: iteration 115670/ 173500 | consumed samples: 29611520 | consumed tokens: 60644392960 | elapsed time per iteration (s): 0.09 | learning rate: 6.582E-05 | global batch size: 256 | lm loss: 4.520487E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.467 | TFLOPs: 11.13 | 7: iteration 115680/ 173500 | consumed samples: 29614080 | consumed tokens: 60649635840 | elapsed time per iteration (s): 0.08 | learning rate: 6.581E-05 | global batch size: 256 | lm loss: 4.524134E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 3197.359 | TFLOPs: 11.89 | 7: iteration 115690/ 173500 | consumed samples: 29616640 | consumed tokens: 60654878720 | elapsed time per iteration (s): 0.08 | learning rate: 6.579E-05 | global batch size: 256 | lm loss: 4.504560E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.906 | TFLOPs: 11.93 | 7: iteration 115700/ 173500 | consumed samples: 29619200 | consumed tokens: 60660121600 | elapsed time per iteration (s): 0.10 | learning rate: 6.578E-05 | global batch size: 256 | lm loss: 4.512687E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.499 | TFLOPs: 9.30 | 7: iteration 115710/ 173500 | consumed samples: 29621760 | consumed tokens: 60665364480 | elapsed time per iteration (s): 0.09 | learning rate: 6.577E-05 | global batch size: 256 | lm loss: 4.521633E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2988.481 | TFLOPs: 11.12 | 7: iteration 115720/ 173500 | consumed samples: 29624320 | consumed tokens: 60670607360 | elapsed time per iteration (s): 0.08 | learning rate: 6.575E-05 | global batch size: 256 | lm loss: 4.520876E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.992 | TFLOPs: 11.22 | 7: iteration 115730/ 173500 | consumed samples: 29626880 | consumed tokens: 60675850240 | elapsed time per iteration (s): 0.08 | learning rate: 6.574E-05 | global batch size: 256 | lm loss: 4.508706E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.330 | TFLOPs: 11.58 | 7: iteration 115740/ 173500 | consumed samples: 29629440 | consumed tokens: 60681093120 | elapsed time per iteration (s): 0.09 | learning rate: 6.572E-05 | global batch size: 256 | lm loss: 
4.525290E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2816.000 | TFLOPs: 10.47 | 7: iteration 115750/ 173500 | consumed samples: 29632000 | consumed tokens: 60686336000 | elapsed time per iteration (s): 0.08 | learning rate: 6.571E-05 | global batch size: 256 | lm loss: 4.503737E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.216 | TFLOPs: 11.94 | 7: iteration 115760/ 173500 | consumed samples: 29634560 | consumed tokens: 60691578880 | elapsed time per iteration (s): 0.09 | learning rate: 6.569E-05 | global batch size: 256 | lm loss: 4.509190E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2970.602 | TFLOPs: 11.05 | 7: iteration 115770/ 173500 | consumed samples: 29637120 | consumed tokens: 60696821760 | elapsed time per iteration (s): 0.09 | learning rate: 6.568E-05 | global batch size: 256 | lm loss: 4.507150E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2759.031 | TFLOPs: 10.26 | 7: iteration 115780/ 173500 | consumed samples: 29639680 | consumed tokens: 60702064640 | elapsed time per iteration (s): 0.08 | learning rate: 6.567E-05 | global batch size: 256 | lm loss: 4.511405E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.691 | TFLOPs: 11.90 | 7: iteration 115790/ 173500 | consumed samples: 29642240 | consumed tokens: 60707307520 | elapsed time per iteration (s): 0.08 | learning rate: 6.565E-05 | global batch size: 256 | lm loss: 4.521733E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.361 | TFLOPs: 12.00 | 7: iteration 115800/ 173500 | consumed samples: 29644800 | consumed tokens: 
60712550400 | elapsed time per iteration (s): 0.10 | learning rate: 6.564E-05 | global batch size: 256 | lm loss: 4.514578E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2674.178 | TFLOPs: 9.95 | 7: iteration 115810/ 173500 | consumed samples: 29647360 | consumed tokens: 60717793280 | elapsed time per iteration (s): 0.08 | learning rate: 6.562E-05 | global batch size: 256 | lm loss: 4.515586E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.238 | TFLOPs: 12.02 | 7: iteration 115820/ 173500 | consumed samples: 29649920 | consumed tokens: 60723036160 | elapsed time per iteration (s): 0.08 | learning rate: 6.561E-05 | global batch size: 256 | lm loss: 4.519759E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.405 | TFLOPs: 12.01 | 7: iteration 115830/ 173500 | consumed samples: 29652480 | consumed tokens: 60728279040 | elapsed time per iteration (s): 0.08 | learning rate: 6.559E-05 | global batch size: 256 | lm loss: 4.521524E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.868 | TFLOPs: 12.01 | 7: iteration 115840/ 173500 | consumed samples: 29655040 | consumed tokens: 60733521920 | elapsed time per iteration (s): 0.08 | learning rate: 6.558E-05 | global batch size: 256 | lm loss: 4.527597E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.944 | TFLOPs: 11.34 | 7: iteration 115850/ 173500 | consumed samples: 29657600 | consumed tokens: 60738764800 | elapsed time per iteration (s): 0.08 | learning rate: 6.556E-05 | global batch size: 256 | lm loss: 4.515354E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3153.167 | TFLOPs: 11.73 | 7: iteration 115860/ 173500 | consumed samples: 29660160 | consumed tokens: 60744007680 | elapsed time per iteration (s): 0.09 | learning rate: 6.555E-05 | global batch size: 256 | lm loss: 4.514551E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2892.364 | TFLOPs: 10.76 | 7: iteration 115870/ 173500 | consumed samples: 29662720 | consumed tokens: 60749250560 | elapsed time per iteration (s): 0.08 | learning rate: 6.554E-05 | global batch size: 256 | lm loss: 4.526555E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.072 | TFLOPs: 11.94 | 7: iteration 115880/ 173500 | consumed samples: 29665280 | consumed tokens: 60754493440 | elapsed time per iteration (s): 0.08 | learning rate: 6.552E-05 | global batch size: 256 | lm loss: 4.515240E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.173 | TFLOPs: 11.95 | 7: iteration 115890/ 173500 | consumed samples: 29667840 | consumed tokens: 60759736320 | elapsed time per iteration (s): 0.08 | learning rate: 6.551E-05 | global batch size: 256 | lm loss: 4.508385E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.855 | TFLOPs: 12.05 | 7: iteration 115900/ 173500 | consumed samples: 29670400 | consumed tokens: 60764979200 | elapsed time per iteration (s): 0.08 | learning rate: 6.549E-05 | global batch size: 256 | lm loss: 4.510533E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.415 | TFLOPs: 12.05 | 7: iteration 115910/ 173500 | consumed samples: 29672960 | consumed tokens: 60770222080 | elapsed time per iteration (s): 0.08 | learning rate: 6.548E-05 | global batch size: 256 | lm loss: 4.513209E+00 | 
grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.826 | TFLOPs: 11.94 | 7: iteration 115920/ 173500 | consumed samples: 29675520 | consumed tokens: 60775464960 | elapsed time per iteration (s): 0.09 | learning rate: 6.546E-05 | global batch size: 256 | lm loss: 4.526394E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2959.786 | TFLOPs: 11.01 | 7: iteration 115930/ 173500 | consumed samples: 29678080 | consumed tokens: 60780707840 | elapsed time per iteration (s): 0.08 | learning rate: 6.545E-05 | global batch size: 256 | lm loss: 4.515446E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.076 | TFLOPs: 11.65 | 7: iteration 115940/ 173500 | consumed samples: 29680640 | consumed tokens: 60785950720 | elapsed time per iteration (s): 0.08 | learning rate: 6.544E-05 | global batch size: 256 | lm loss: 4.510049E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.810 | TFLOPs: 11.96 | 7: iteration 115950/ 173500 | consumed samples: 29683200 | consumed tokens: 60791193600 | elapsed time per iteration (s): 0.10 | learning rate: 6.542E-05 | global batch size: 256 | lm loss: 4.490177E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2615.080 | TFLOPs: 9.73 | 7: iteration 115960/ 173500 | consumed samples: 29685760 | consumed tokens: 60796436480 | elapsed time per iteration (s): 0.08 | learning rate: 6.541E-05 | global batch size: 256 | lm loss: 4.513255E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.353 | TFLOPs: 11.97 | 7: iteration 115970/ 173500 | consumed samples: 29688320 | consumed tokens: 60801679360 | 
elapsed time per iteration (s): 0.08 | learning rate: 6.539E-05 | global batch size: 256 | lm loss: 4.502570E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.179 | TFLOPs: 11.87 | 7: iteration 115980/ 173500 | consumed samples: 29690880 | consumed tokens: 60806922240 | elapsed time per iteration (s): 0.08 | learning rate: 6.538E-05 | global batch size: 256 | lm loss: 4.505906E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.610 | TFLOPs: 11.87 | 7: iteration 115990/ 173500 | consumed samples: 29693440 | consumed tokens: 60812165120 | elapsed time per iteration (s): 0.10 | learning rate: 6.536E-05 | global batch size: 256 | lm loss: 4.523083E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2670.445 | TFLOPs: 9.93 | 0: [2023-03-17 03:03:38,457] [INFO] [logging.py:68:log_dist] [Rank 0] step=116000, skipped=0, lr=[6.535024808618106e-05, 6.535024808618106e-05, 6.535024808618106e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 116000/ 173500 | consumed samples: 29696000 | consumed tokens: 60817408000 | elapsed time per iteration (s): 0.09 | learning rate: 6.535E-05 | global batch size: 256 | lm loss: 4.525507E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.135 | TFLOPs: 11.14 | 0: steps: 116000 loss: 4.4810 iter time (s): 0.090 samples/sec: 2841.011 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 116000 | lm loss value: 4.387317E+00 | lm loss PPL: 8.042437E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 116000 to checkpoints_14m91b100m 0: [2023-03-17 
03:03:38,515] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step116000 is begin to save! 0: [2023-03-17 03:03:38,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:03:38,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:03:38,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:03:38,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:03:38,554] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:03:38,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:03:38,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:03:38,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:03:38,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:03:38,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:03:38,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:03:38,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:03:38,564] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step116000/mp_rank_00_model_states.pt 0: [2023-03-17 03:03:38,564] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:03:38,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:03:38,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:03:38,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:03:38,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:03:38,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 2: [2023-03-17 03:03:38,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:03:38,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:03:38,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 1: [2023-03-17 03:03:38,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:03:38,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:03:38,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: [2023-03-17 03:03:38,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 2: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:03:38,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:03:38,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 03:03:38,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 
0: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 1: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:03:38,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 7: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 3: [2023-03-17 03:03:38,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:03:38,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:03:38,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 5: [2023-03-17 03:03:38,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:03:38,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 2: [2023-03-17 03:03:38,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:03:38,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:03:38,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 6: [2023-03-17 03:03:38,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:03:38,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 1: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:03:38,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:03:38,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 5: [2023-03-17 03:03:38,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 7: [2023-03-17 03:03:38,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 3: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:03:38,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 2: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:03:38,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 7: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:03:38,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 6: [2023-03-17 03:03:38,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:03:38,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:03:38,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 4: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:03:38,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:03:38,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:03:38,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 4: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 4: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 5: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 1: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:03:38,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 3: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:03:38,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:03:38,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 4: [2023-03-17 03:03:38,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 6: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:03:38,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:03:38,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 2: [2023-03-17 03:03:38,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 1: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:03:38,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 5: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:03:38,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 7: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:03:38,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 3: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:03:38,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 4: [2023-03-17 03:03:38,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 4: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 6: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:03:38,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 2: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:03:38,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: [2023-03-17 03:03:38,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:03:38,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 5: [2023-03-17 03:03:38,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:03:38,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 1: [2023-03-17 03:03:38,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:03:38,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:03:38,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 7: [2023-03-17 03:03:38,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:03:38,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 3: [2023-03-17 03:03:38,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:03:38,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:03:38,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 4: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:03:38,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:03:38,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 
6: [2023-03-17 03:03:38,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 2: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:03:38,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 5: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:03:38,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 7: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 3: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 3: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 4: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 6: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 
3: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 3: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 2: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 
0: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 7: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 7: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 4: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:03:38,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:03:38,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 1: [2023-03-17 03:03:38,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:03:38,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:03:38,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:03:38,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 1: [2023-03-17 03:03:38,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step116000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:03:38,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step116000 is ready now! 0: successfully saved checkpoint at iteration 116000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 86.45 7: iteration 116010/ 173500 | consumed samples: 29698560 | consumed tokens: 60822650880 | elapsed time per iteration (s): 0.09 | learning rate: 6.534E-05 | global batch size: 256 | lm loss: 4.505318E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2750.322 | TFLOPs: 10.23 | 7: iteration 116020/ 173500 | consumed samples: 29701120 | consumed tokens: 60827893760 | elapsed time per iteration (s): 0.08 | learning rate: 6.532E-05 | global batch size: 256 | lm loss: 4.506396E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.256 | TFLOPs: 12.03 | 7: iteration 116030/ 173500 | consumed samples: 29703680 | consumed tokens: 60833136640 | elapsed time per iteration (s): 0.10 | learning rate: 6.531E-05 | global batch size: 256 | lm loss: 4.515448E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2484.464 | TFLOPs: 9.24 | 7: iteration 
116040/ 173500 | consumed samples: 29706240 | consumed tokens: 60838379520 | elapsed time per iteration (s): 0.09 | learning rate: 6.529E-05 | global batch size: 256 | lm loss: 4.513224E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.986 | TFLOPs: 10.92 | 7: iteration 116050/ 173500 | consumed samples: 29708800 | consumed tokens: 60843622400 | elapsed time per iteration (s): 0.08 | learning rate: 6.528E-05 | global batch size: 256 | lm loss: 4.513577E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.882 | TFLOPs: 11.77 | 7: iteration 116060/ 173500 | consumed samples: 29711360 | consumed tokens: 60848865280 | elapsed time per iteration (s): 0.08 | learning rate: 6.526E-05 | global batch size: 256 | lm loss: 4.503180E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.368 | TFLOPs: 11.99 | 7: iteration 116070/ 173500 | consumed samples: 29713920 | consumed tokens: 60854108160 | elapsed time per iteration (s): 0.08 | learning rate: 6.525E-05 | global batch size: 256 | lm loss: 4.524933E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.740 | TFLOPs: 11.88 | 7: iteration 116080/ 173500 | consumed samples: 29716480 | consumed tokens: 60859351040 | elapsed time per iteration (s): 0.08 | learning rate: 6.524E-05 | global batch size: 256 | lm loss: 4.512308E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.216 | TFLOPs: 11.92 | 7: iteration 116090/ 173500 | consumed samples: 29719040 | consumed tokens: 60864593920 | elapsed time per iteration (s): 0.08 | learning rate: 6.522E-05 | global batch size: 256 | lm loss: 4.513091E+00 | grad norm: 0.362 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.564 | TFLOPs: 11.79 | 7: iteration 116100/ 173500 | consumed samples: 29721600 | consumed tokens: 60869836800 | elapsed time per iteration (s): 0.08 | learning rate: 6.521E-05 | global batch size: 256 | lm loss: 4.526455E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.476 | TFLOPs: 12.03 | 7: iteration 116110/ 173500 | consumed samples: 29724160 | consumed tokens: 60875079680 | elapsed time per iteration (s): 0.08 | learning rate: 6.519E-05 | global batch size: 256 | lm loss: 4.510677E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.458 | TFLOPs: 12.00 | 7: iteration 116120/ 173500 | consumed samples: 29726720 | consumed tokens: 60880322560 | elapsed time per iteration (s): 0.10 | learning rate: 6.518E-05 | global batch size: 256 | lm loss: 4.522097E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.926 | TFLOPs: 9.27 | 7: iteration 116130/ 173500 | consumed samples: 29729280 | consumed tokens: 60885565440 | elapsed time per iteration (s): 0.09 | learning rate: 6.516E-05 | global batch size: 256 | lm loss: 4.516933E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2783.393 | TFLOPs: 10.35 | 7: iteration 116140/ 173500 | consumed samples: 29731840 | consumed tokens: 60890808320 | elapsed time per iteration (s): 0.10 | learning rate: 6.515E-05 | global batch size: 256 | lm loss: 4.504761E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2598.643 | TFLOPs: 9.67 | 7: iteration 116150/ 173500 | consumed samples: 29734400 | consumed tokens: 60896051200 | elapsed time per iteration (s): 0.08 | learning 
rate: 6.514E-05 | global batch size: 256 | lm loss: 4.515485E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.560 | TFLOPs: 11.83 | 7: iteration 116160/ 173500 | consumed samples: 29736960 | consumed tokens: 60901294080 | elapsed time per iteration (s): 0.08 | learning rate: 6.512E-05 | global batch size: 256 | lm loss: 4.515028E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.464 | TFLOPs: 11.31 | 7: iteration 116170/ 173500 | consumed samples: 29739520 | consumed tokens: 60906536960 | elapsed time per iteration (s): 0.08 | learning rate: 6.511E-05 | global batch size: 256 | lm loss: 4.523724E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.128 | TFLOPs: 11.37 | 7: iteration 116180/ 173500 | consumed samples: 29742080 | consumed tokens: 60911779840 | elapsed time per iteration (s): 0.09 | learning rate: 6.509E-05 | global batch size: 256 | lm loss: 4.499488E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2938.324 | TFLOPs: 10.93 | 7: iteration 116190/ 173500 | consumed samples: 29744640 | consumed tokens: 60917022720 | elapsed time per iteration (s): 0.08 | learning rate: 6.508E-05 | global batch size: 256 | lm loss: 4.526548E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.072 | TFLOPs: 11.55 | 7: iteration 116200/ 173500 | consumed samples: 29747200 | consumed tokens: 60922265600 | elapsed time per iteration (s): 0.10 | learning rate: 6.506E-05 | global batch size: 256 | lm loss: 4.517982E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.765 | TFLOPs: 9.16 | 7: iteration 116210/ 
173500 | consumed samples: 29749760 | consumed tokens: 60927508480 | elapsed time per iteration (s): 0.10 | learning rate: 6.505E-05 | global batch size: 256 | lm loss: 4.500249E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.691 | TFLOPs: 9.16 | 7: iteration 116220/ 173500 | consumed samples: 29752320 | consumed tokens: 60932751360 | elapsed time per iteration (s): 0.10 | learning rate: 6.504E-05 | global batch size: 256 | lm loss: 4.506068E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2470.953 | TFLOPs: 9.19 | 7: iteration 116230/ 173500 | consumed samples: 29754880 | consumed tokens: 60937994240 | elapsed time per iteration (s): 0.09 | learning rate: 6.502E-05 | global batch size: 256 | lm loss: 4.503833E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2976.806 | TFLOPs: 11.07 | 7: iteration 116240/ 173500 | consumed samples: 29757440 | consumed tokens: 60943237120 | elapsed time per iteration (s): 0.10 | learning rate: 6.501E-05 | global batch size: 256 | lm loss: 4.502307E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.747 | TFLOPs: 9.30 | 7: iteration 116250/ 173500 | consumed samples: 29760000 | consumed tokens: 60948480000 | elapsed time per iteration (s): 0.10 | learning rate: 6.499E-05 | global batch size: 256 | lm loss: 4.504859E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2463.202 | TFLOPs: 9.16 | 7: iteration 116260/ 173500 | consumed samples: 29762560 | consumed tokens: 60953722880 | elapsed time per iteration (s): 0.09 | learning rate: 6.498E-05 | global batch size: 256 | lm loss: 4.512548E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2836.419 | TFLOPs: 10.55 | 7: iteration 116270/ 173500 | consumed samples: 29765120 | consumed tokens: 60958965760 | elapsed time per iteration (s): 0.09 | learning rate: 6.496E-05 | global batch size: 256 | lm loss: 4.513425E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.304 | TFLOPs: 11.00 | 7: iteration 116280/ 173500 | consumed samples: 29767680 | consumed tokens: 60964208640 | elapsed time per iteration (s): 0.09 | learning rate: 6.495E-05 | global batch size: 256 | lm loss: 4.516672E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2857.193 | TFLOPs: 10.63 | 7: iteration 116290/ 173500 | consumed samples: 29770240 | consumed tokens: 60969451520 | elapsed time per iteration (s): 0.08 | learning rate: 6.494E-05 | global batch size: 256 | lm loss: 4.526284E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.172 | TFLOPs: 11.51 | 7: iteration 116300/ 173500 | consumed samples: 29772800 | consumed tokens: 60974694400 | elapsed time per iteration (s): 0.08 | learning rate: 6.492E-05 | global batch size: 256 | lm loss: 4.507829E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.537 | TFLOPs: 11.88 | 7: iteration 116310/ 173500 | consumed samples: 29775360 | consumed tokens: 60979937280 | elapsed time per iteration (s): 0.08 | learning rate: 6.491E-05 | global batch size: 256 | lm loss: 4.518792E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.272 | TFLOPs: 11.41 | 7: iteration 116320/ 173500 | consumed samples: 29777920 | consumed tokens: 60985180160 | elapsed time per iteration (s): 0.08 | learning rate: 
6.489E-05 | global batch size: 256 | lm loss: 4.502586E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.854 | TFLOPs: 11.86 | 7: iteration 116330/ 173500 | consumed samples: 29780480 | consumed tokens: 60990423040 | elapsed time per iteration (s): 0.08 | learning rate: 6.488E-05 | global batch size: 256 | lm loss: 4.515582E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.436 | TFLOPs: 11.93 | 7: iteration 116340/ 173500 | consumed samples: 29783040 | consumed tokens: 60995665920 | elapsed time per iteration (s): 0.09 | learning rate: 6.487E-05 | global batch size: 256 | lm loss: 4.529756E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.439 | TFLOPs: 10.88 | 7: iteration 116350/ 173500 | consumed samples: 29785600 | consumed tokens: 61000908800 | elapsed time per iteration (s): 0.08 | learning rate: 6.485E-05 | global batch size: 256 | lm loss: 4.524308E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.962 | TFLOPs: 11.44 | 7: iteration 116360/ 173500 | consumed samples: 29788160 | consumed tokens: 61006151680 | elapsed time per iteration (s): 0.08 | learning rate: 6.484E-05 | global batch size: 256 | lm loss: 4.511298E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.747 | TFLOPs: 11.90 | 7: iteration 116370/ 173500 | consumed samples: 29790720 | consumed tokens: 61011394560 | elapsed time per iteration (s): 0.09 | learning rate: 6.482E-05 | global batch size: 256 | lm loss: 4.517075E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2746.285 | TFLOPs: 10.21 | 7: iteration 116380/ 173500 | 
consumed samples: 29793280 | consumed tokens: 61016637440 | elapsed time per iteration (s): 0.10 | learning rate: 6.481E-05 | global batch size: 256 | lm loss: 4.513718E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2540.190 | TFLOPs: 9.45 | 7: iteration 116390/ 173500 | consumed samples: 29795840 | consumed tokens: 61021880320 | elapsed time per iteration (s): 0.10 | learning rate: 6.479E-05 | global batch size: 256 | lm loss: 4.525157E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.771 | TFLOPs: 9.34 | 7: iteration 116400/ 173500 | consumed samples: 29798400 | consumed tokens: 61027123200 | elapsed time per iteration (s): 0.10 | learning rate: 6.478E-05 | global batch size: 256 | lm loss: 4.515958E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.000 | TFLOPs: 9.26 | 7: iteration 116410/ 173500 | consumed samples: 29800960 | consumed tokens: 61032366080 | elapsed time per iteration (s): 0.11 | learning rate: 6.477E-05 | global batch size: 256 | lm loss: 4.509171E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.765 | TFLOPs: 8.44 | 7: iteration 116420/ 173500 | consumed samples: 29803520 | consumed tokens: 61037608960 | elapsed time per iteration (s): 0.08 | learning rate: 6.475E-05 | global batch size: 256 | lm loss: 4.517688E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.536 | TFLOPs: 11.89 | 7: iteration 116430/ 173500 | consumed samples: 29806080 | consumed tokens: 61042851840 | elapsed time per iteration (s): 0.11 | learning rate: 6.474E-05 | global batch size: 256 | lm loss: 4.520863E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 2426.565 | TFLOPs: 9.03 | 7: iteration 116440/ 173500 | consumed samples: 29808640 | consumed tokens: 61048094720 | elapsed time per iteration (s): 0.09 | learning rate: 6.472E-05 | global batch size: 256 | lm loss: 4.504929E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.081 | TFLOPs: 10.78 | 7: iteration 116450/ 173500 | consumed samples: 29811200 | consumed tokens: 61053337600 | elapsed time per iteration (s): 0.08 | learning rate: 6.471E-05 | global batch size: 256 | lm loss: 4.516443E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.809 | TFLOPs: 11.28 | 7: iteration 116460/ 173500 | consumed samples: 29813760 | consumed tokens: 61058580480 | elapsed time per iteration (s): 0.09 | learning rate: 6.469E-05 | global batch size: 256 | lm loss: 4.521069E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2934.259 | TFLOPs: 10.91 | 7: iteration 116470/ 173500 | consumed samples: 29816320 | consumed tokens: 61063823360 | elapsed time per iteration (s): 0.11 | learning rate: 6.468E-05 | global batch size: 256 | lm loss: 4.519862E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.024 | TFLOPs: 8.44 | 7: iteration 116480/ 173500 | consumed samples: 29818880 | consumed tokens: 61069066240 | elapsed time per iteration (s): 0.09 | learning rate: 6.467E-05 | global batch size: 256 | lm loss: 4.514934E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2820.265 | TFLOPs: 10.49 | 7: iteration 116490/ 173500 | consumed samples: 29821440 | consumed tokens: 61074309120 | elapsed time per iteration (s): 0.08 | learning rate: 6.465E-05 | global 
batch size: 256 | lm loss: 4.516484E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.500 | TFLOPs: 11.86 | 7: iteration 116500/ 173500 | consumed samples: 29824000 | consumed tokens: 61079552000 | elapsed time per iteration (s): 0.09 | learning rate: 6.464E-05 | global batch size: 256 | lm loss: 4.515787E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2906.211 | TFLOPs: 10.81 | 7: iteration 116510/ 173500 | consumed samples: 29826560 | consumed tokens: 61084794880 | elapsed time per iteration (s): 0.09 | learning rate: 6.462E-05 | global batch size: 256 | lm loss: 4.501225E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2911.727 | TFLOPs: 10.83 | 7: iteration 116520/ 173500 | consumed samples: 29829120 | consumed tokens: 61090037760 | elapsed time per iteration (s): 0.08 | learning rate: 6.461E-05 | global batch size: 256 | lm loss: 4.518014E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.396 | TFLOPs: 11.88 | 7: iteration 116530/ 173500 | consumed samples: 29831680 | consumed tokens: 61095280640 | elapsed time per iteration (s): 0.08 | learning rate: 6.459E-05 | global batch size: 256 | lm loss: 4.521481E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.056 | TFLOPs: 11.81 | 7: iteration 116540/ 173500 | consumed samples: 29834240 | consumed tokens: 61100523520 | elapsed time per iteration (s): 0.08 | learning rate: 6.458E-05 | global batch size: 256 | lm loss: 4.510656E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.145 | TFLOPs: 11.91 | 7: iteration 116550/ 173500 | consumed samples: 
29836800 | consumed tokens: 61105766400 | elapsed time per iteration (s): 0.10 | learning rate: 6.457E-05 | global batch size: 256 | lm loss: 4.506322E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2629.342 | TFLOPs: 9.78 | 7: iteration 116560/ 173500 | consumed samples: 29839360 | consumed tokens: 61111009280 | elapsed time per iteration (s): 0.09 | learning rate: 6.455E-05 | global batch size: 256 | lm loss: 4.508253E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2727.073 | TFLOPs: 10.14 | 7: iteration 116570/ 173500 | consumed samples: 29841920 | consumed tokens: 61116252160 | elapsed time per iteration (s): 0.08 | learning rate: 6.454E-05 | global batch size: 256 | lm loss: 4.510036E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.161 | TFLOPs: 11.55 | 7: iteration 116580/ 173500 | consumed samples: 29844480 | consumed tokens: 61121495040 | elapsed time per iteration (s): 0.08 | learning rate: 6.452E-05 | global batch size: 256 | lm loss: 4.510338E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.979 | TFLOPs: 11.85 | 7: iteration 116590/ 173500 | consumed samples: 29847040 | consumed tokens: 61126737920 | elapsed time per iteration (s): 0.09 | learning rate: 6.451E-05 | global batch size: 256 | lm loss: 4.511731E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.264 | TFLOPs: 11.19 | 7: iteration 116600/ 173500 | consumed samples: 29849600 | consumed tokens: 61131980800 | elapsed time per iteration (s): 0.12 | learning rate: 6.450E-05 | global batch size: 256 | lm loss: 4.516249E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2084.375 | TFLOPs: 7.75 | 7: iteration 116610/ 173500 | consumed samples: 29852160 | consumed tokens: 61137223680 | elapsed time per iteration (s): 0.09 | learning rate: 6.448E-05 | global batch size: 256 | lm loss: 4.510489E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2881.394 | TFLOPs: 10.72 | 7: iteration 116620/ 173500 | consumed samples: 29854720 | consumed tokens: 61142466560 | elapsed time per iteration (s): 0.08 | learning rate: 6.447E-05 | global batch size: 256 | lm loss: 4.531731E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.192 | TFLOPs: 11.36 | 7: iteration 116630/ 173500 | consumed samples: 29857280 | consumed tokens: 61147709440 | elapsed time per iteration (s): 0.10 | learning rate: 6.445E-05 | global batch size: 256 | lm loss: 4.516211E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.195 | TFLOPs: 9.70 | 7: iteration 116640/ 173500 | consumed samples: 29859840 | consumed tokens: 61152952320 | elapsed time per iteration (s): 0.08 | learning rate: 6.444E-05 | global batch size: 256 | lm loss: 4.505025E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.380 | TFLOPs: 11.27 | 7: iteration 116650/ 173500 | consumed samples: 29862400 | consumed tokens: 61158195200 | elapsed time per iteration (s): 0.09 | learning rate: 6.442E-05 | global batch size: 256 | lm loss: 4.518615E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2940.037 | TFLOPs: 10.94 | 7: iteration 116660/ 173500 | consumed samples: 29864960 | consumed tokens: 61163438080 | elapsed time per iteration (s): 0.11 | learning rate: 6.441E-05 | global batch size: 256 
| lm loss: 4.533487E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2266.714 | TFLOPs: 8.43 | 7: iteration 116670/ 173500 | consumed samples: 29867520 | consumed tokens: 61168680960 | elapsed time per iteration (s): 0.09 | learning rate: 6.440E-05 | global batch size: 256 | lm loss: 4.510561E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2772.449 | TFLOPs: 10.31 | 7: iteration 116680/ 173500 | consumed samples: 29870080 | consumed tokens: 61173923840 | elapsed time per iteration (s): 0.10 | learning rate: 6.438E-05 | global batch size: 256 | lm loss: 4.524553E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.383 | TFLOPs: 9.84 | 7: iteration 116690/ 173500 | consumed samples: 29872640 | consumed tokens: 61179166720 | elapsed time per iteration (s): 0.10 | learning rate: 6.437E-05 | global batch size: 256 | lm loss: 4.510719E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2522.737 | TFLOPs: 9.38 | 7: iteration 116700/ 173500 | consumed samples: 29875200 | consumed tokens: 61184409600 | elapsed time per iteration (s): 0.13 | learning rate: 6.435E-05 | global batch size: 256 | lm loss: 4.508712E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.258 | TFLOPs: 7.58 | 7: iteration 116710/ 173500 | consumed samples: 29877760 | consumed tokens: 61189652480 | elapsed time per iteration (s): 0.13 | learning rate: 6.434E-05 | global batch size: 256 | lm loss: 4.511958E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.766 | TFLOPs: 7.61 | 7: iteration 116720/ 173500 | consumed samples: 29880320 | consumed 
tokens: 61194895360 | elapsed time per iteration (s): 0.14 | learning rate: 6.433E-05 | global batch size: 256 | lm loss: 4.512247E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1793.701 | TFLOPs: 6.67 | 7: iteration 116730/ 173500 | consumed samples: 29882880 | consumed tokens: 61200138240 | elapsed time per iteration (s): 0.14 | learning rate: 6.431E-05 | global batch size: 256 | lm loss: 4.507870E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1883.279 | TFLOPs: 7.00 | 7: iteration 116740/ 173500 | consumed samples: 29885440 | consumed tokens: 61205381120 | elapsed time per iteration (s): 0.13 | learning rate: 6.430E-05 | global batch size: 256 | lm loss: 4.511903E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1962.952 | TFLOPs: 7.30 | 7: iteration 116750/ 173500 | consumed samples: 29888000 | consumed tokens: 61210624000 | elapsed time per iteration (s): 0.10 | learning rate: 6.428E-05 | global batch size: 256 | lm loss: 4.516492E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2565.608 | TFLOPs: 9.54 | 7: iteration 116760/ 173500 | consumed samples: 29890560 | consumed tokens: 61215866880 | elapsed time per iteration (s): 0.10 | learning rate: 6.427E-05 | global batch size: 256 | lm loss: 4.512112E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2563.757 | TFLOPs: 9.54 | 7: iteration 116770/ 173500 | consumed samples: 29893120 | consumed tokens: 61221109760 | elapsed time per iteration (s): 0.12 | learning rate: 6.425E-05 | global batch size: 256 | lm loss: 4.516089E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 2128.494 | TFLOPs: 7.92 | 7: iteration 116780/ 173500 | consumed samples: 29895680 | consumed tokens: 61226352640 | elapsed time per iteration (s): 0.10 | learning rate: 6.424E-05 | global batch size: 256 | lm loss: 4.519848E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.569 | TFLOPs: 9.60 | 7: iteration 116790/ 173500 | consumed samples: 29898240 | consumed tokens: 61231595520 | elapsed time per iteration (s): 0.11 | learning rate: 6.423E-05 | global batch size: 256 | lm loss: 4.497960E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2425.402 | TFLOPs: 9.02 | 7: iteration 116800/ 173500 | consumed samples: 29900800 | consumed tokens: 61236838400 | elapsed time per iteration (s): 0.12 | learning rate: 6.421E-05 | global batch size: 256 | lm loss: 4.506716E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2180.073 | TFLOPs: 8.11 | 7: iteration 116810/ 173500 | consumed samples: 29903360 | consumed tokens: 61242081280 | elapsed time per iteration (s): 0.11 | learning rate: 6.420E-05 | global batch size: 256 | lm loss: 4.516814E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2276.655 | TFLOPs: 8.47 | 7: iteration 116820/ 173500 | consumed samples: 29905920 | consumed tokens: 61247324160 | elapsed time per iteration (s): 0.11 | learning rate: 6.418E-05 | global batch size: 256 | lm loss: 4.520499E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2307.677 | TFLOPs: 8.58 | 7: iteration 116830/ 173500 | consumed samples: 29908480 | consumed tokens: 61252567040 | elapsed time per iteration (s): 0.09 | learning rate: 6.417E-05 | global batch size: 256 | lm loss: 4.508314E+00 | 
grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.104 | TFLOPs: 11.08 | 7: iteration 116840/ 173500 | consumed samples: 29911040 | consumed tokens: 61257809920 | elapsed time per iteration (s): 0.09 | learning rate: 6.415E-05 | global batch size: 256 | lm loss: 4.500566E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2902.286 | TFLOPs: 10.80 | 7: iteration 116850/ 173500 | consumed samples: 29913600 | consumed tokens: 61263052800 | elapsed time per iteration (s): 0.08 | learning rate: 6.414E-05 | global batch size: 256 | lm loss: 4.521477E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.972 | TFLOPs: 11.71 | 7: iteration 116860/ 173500 | consumed samples: 29916160 | consumed tokens: 61268295680 | elapsed time per iteration (s): 0.09 | learning rate: 6.413E-05 | global batch size: 256 | lm loss: 4.510394E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2770.875 | TFLOPs: 10.31 | 7: iteration 116870/ 173500 | consumed samples: 29918720 | consumed tokens: 61273538560 | elapsed time per iteration (s): 0.08 | learning rate: 6.411E-05 | global batch size: 256 | lm loss: 4.505814E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.613 | TFLOPs: 11.31 | 7: iteration 116880/ 173500 | consumed samples: 29921280 | consumed tokens: 61278781440 | elapsed time per iteration (s): 0.12 | learning rate: 6.410E-05 | global batch size: 256 | lm loss: 4.524490E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2106.315 | TFLOPs: 7.83 | 7: iteration 116890/ 173500 | consumed samples: 29923840 | consumed tokens: 61284024320 | 
elapsed time per iteration (s): 0.13 | learning rate: 6.408E-05 | global batch size: 256 | lm loss: 4.513437E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1970.848 | TFLOPs: 7.33 | 7: iteration 116900/ 173500 | consumed samples: 29926400 | consumed tokens: 61289267200 | elapsed time per iteration (s): 0.16 | learning rate: 6.407E-05 | global batch size: 256 | lm loss: 4.516323E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1608.058 | TFLOPs: 5.98 | 7: iteration 116910/ 173500 | consumed samples: 29928960 | consumed tokens: 61294510080 | elapsed time per iteration (s): 0.14 | learning rate: 6.406E-05 | global batch size: 256 | lm loss: 4.507675E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1883.278 | TFLOPs: 7.00 | 7: iteration 116920/ 173500 | consumed samples: 29931520 | consumed tokens: 61299752960 | elapsed time per iteration (s): 0.14 | learning rate: 6.404E-05 | global batch size: 256 | lm loss: 4.505940E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1891.393 | TFLOPs: 7.04 | 7: iteration 116930/ 173500 | consumed samples: 29934080 | consumed tokens: 61304995840 | elapsed time per iteration (s): 0.09 | learning rate: 6.403E-05 | global batch size: 256 | lm loss: 4.519994E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2836.343 | TFLOPs: 10.55 | 7: iteration 116940/ 173500 | consumed samples: 29936640 | consumed tokens: 61310238720 | elapsed time per iteration (s): 0.12 | learning rate: 6.401E-05 | global batch size: 256 | lm loss: 4.507775E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2145.886 
| TFLOPs: 7.98 | 7: iteration 116950/ 173500 | consumed samples: 29939200 | consumed tokens: 61315481600 | elapsed time per iteration (s): 0.13 | learning rate: 6.400E-05 | global batch size: 256 | lm loss: 4.523927E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2007.864 | TFLOPs: 7.47 | 7: iteration 116960/ 173500 | consumed samples: 29941760 | consumed tokens: 61320724480 | elapsed time per iteration (s): 0.09 | learning rate: 6.399E-05 | global batch size: 256 | lm loss: 4.504533E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2849.869 | TFLOPs: 10.60 | 7: iteration 116970/ 173500 | consumed samples: 29944320 | consumed tokens: 61325967360 | elapsed time per iteration (s): 0.08 | learning rate: 6.397E-05 | global batch size: 256 | lm loss: 4.519165E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.250 | TFLOPs: 11.85 | 7: iteration 116980/ 173500 | consumed samples: 29946880 | consumed tokens: 61331210240 | elapsed time per iteration (s): 0.08 | learning rate: 6.396E-05 | global batch size: 256 | lm loss: 4.521690E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.927 | TFLOPs: 11.39 | 7: iteration 116990/ 173500 | consumed samples: 29949440 | consumed tokens: 61336453120 | elapsed time per iteration (s): 0.10 | learning rate: 6.394E-05 | global batch size: 256 | lm loss: 4.508319E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.983 | TFLOPs: 9.48 | 7: iteration 117000/ 173500 | consumed samples: 29952000 | consumed tokens: 61341696000 | elapsed time per iteration (s): 0.11 | learning rate: 6.393E-05 | global batch size: 256 | lm loss: 4.521995E+00 | grad norm: 0.384 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2369.069 | TFLOPs: 8.81 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 117000 | lm loss value: 4.433918E+00 | lm loss PPL: 8.426091E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 117000 to checkpoints_14m91b100m 0: [2023-03-17 03:05:15,149] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step117000 is begin to save! 0: [2023-03-17 03:05:15,153] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:05:15,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:05:15,176] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:05:15,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:05:15,182] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:05:15,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:05:15,185] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:05:15,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 03:05:15,188] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:05:15,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:05:15,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:05:15,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:05:15,192] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step117000/mp_rank_00_model_states.pt 0: [2023-03-17 03:05:15,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:05:15,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:05:15,211] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:05:15,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:05:15,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:05:15,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 3: [2023-03-17 03:05:15,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:05:15,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:05:15,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:05:15,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:05:15,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:05:15,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:05:15,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 
1: [2023-03-17 03:05:15,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 0: [2023-03-17 03:05:15,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 6: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:05:15,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 0: [2023-03-17 03:05:15,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 1: [2023-03-17 03:05:15,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:05:15,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 3: [2023-03-17 03:05:15,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:05:15,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:05:15,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 03:05:15,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 2: [2023-03-17 03:05:15,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 7: [2023-03-17 03:05:15,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:05:15,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 6: [2023-03-17 03:05:15,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:05:15,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:05:15,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 3: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:05:15,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 0: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:05:15,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:05:15,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 1: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 7: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:05:15,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:05:15,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:05:15,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 2: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:05:15,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 
0: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:05:15,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 6: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:05:15,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 3: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:05:15,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 1: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:05:15,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 7: [2023-03-17 03:05:15,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:05:15,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 2: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:05:15,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 0: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:05:15,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 0: [2023-03-17 03:05:15,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 1: [2023-03-17 03:05:15,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:05:15,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 3: [2023-03-17 03:05:15,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:05:15,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:05:15,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 7: [2023-03-17 03:05:15,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:05:15,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 2: [2023-03-17 03:05:15,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:05:15,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:05:15,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:05:15,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 6: [2023-03-17 03:05:15,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:05:15,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:05:15,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 
6: [2023-03-17 03:05:15,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:05:15,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 1: [2023-03-17 03:05:15,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:05:15,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 3: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:05:15,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 7: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:05:15,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 0: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:05:15,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 03:05:15,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 6: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:05:15,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 2: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:05:15,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 3: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 1: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 0: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 7: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 6: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 6: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 2: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 2: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 2: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 4: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:05:15,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:05:15,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:05:15,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 3: [2023-03-17 03:05:15,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:05:15,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 1: [2023-03-17 03:05:15,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:05:15,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:05:15,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 5: [2023-03-17 03:05:15,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:05:15,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:05:15,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 5: [2023-03-17 03:05:15,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:05:15,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:05:15,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 5: [2023-03-17 03:05:15,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:05:15,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:05:15,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 5: [2023-03-17 03:05:15,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 5: [2023-03-17 03:05:15,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 
5: [2023-03-17 03:05:15,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step117000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:05:15,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step117000 is ready now! 0: successfully saved checkpoint at iteration 117000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 83.70 7: iteration 117010/ 173500 | consumed samples: 29954560 | consumed tokens: 61346938880 | elapsed time per iteration (s): 0.13 | learning rate: 6.391E-05 | global batch size: 256 | lm loss: 4.503835E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1900.223 | TFLOPs: 7.07 | 7: iteration 117020/ 173500 | consumed samples: 29957120 | consumed tokens: 61352181760 | elapsed time per iteration (s): 0.12 | learning rate: 6.390E-05 | global batch size: 256 | lm loss: 4.519648E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2128.547 | TFLOPs: 7.92 | 7: iteration 117030/ 173500 | consumed samples: 29959680 | consumed tokens: 61357424640 | elapsed time per iteration (s): 0.12 | learning rate: 6.389E-05 | global batch size: 256 | lm loss: 4.506484E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2191.023 | TFLOPs: 8.15 | 7: iteration 117040/ 173500 | consumed samples: 29962240 | consumed tokens: 61362667520 | elapsed time per iteration (s): 0.09 | learning rate: 6.387E-05 | global batch size: 256 | lm loss: 4.508305E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2857.604 | TFLOPs: 10.63 | 7: iteration 117050/ 173500 | consumed samples: 29964800 | consumed tokens: 61367910400 | elapsed time per iteration (s): 0.11 | learning rate: 6.386E-05 | 
global batch size: 256 | lm loss: 4.517996E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.272 | TFLOPs: 8.86 | 7: iteration 117060/ 173500 | consumed samples: 29967360 | consumed tokens: 61373153280 | elapsed time per iteration (s): 0.12 | learning rate: 6.384E-05 | global batch size: 256 | lm loss: 4.522496E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2123.745 | TFLOPs: 7.90 | 7: iteration 117070/ 173500 | consumed samples: 29969920 | consumed tokens: 61378396160 | elapsed time per iteration (s): 0.12 | learning rate: 6.383E-05 | global batch size: 256 | lm loss: 4.527164E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2064.451 | TFLOPs: 7.68 | 7: iteration 117080/ 173500 | consumed samples: 29972480 | consumed tokens: 61383639040 | elapsed time per iteration (s): 0.11 | learning rate: 6.382E-05 | global batch size: 256 | lm loss: 4.514228E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.699 | TFLOPs: 8.86 | 7: iteration 117090/ 173500 | consumed samples: 29975040 | consumed tokens: 61388881920 | elapsed time per iteration (s): 0.11 | learning rate: 6.380E-05 | global batch size: 256 | lm loss: 4.516022E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2243.032 | TFLOPs: 8.34 | 7: iteration 117100/ 173500 | consumed samples: 29977600 | consumed tokens: 61394124800 | elapsed time per iteration (s): 0.10 | learning rate: 6.379E-05 | global batch size: 256 | lm loss: 4.505976E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2528.134 | TFLOPs: 9.40 | 7: iteration 117110/ 173500 | consumed samples: 
29980160 | consumed tokens: 61399367680 | elapsed time per iteration (s): 0.10 | learning rate: 6.377E-05 | global batch size: 256 | lm loss: 4.508688E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.427 | TFLOPs: 9.30 | 7: iteration 117120/ 173500 | consumed samples: 29982720 | consumed tokens: 61404610560 | elapsed time per iteration (s): 0.10 | learning rate: 6.376E-05 | global batch size: 256 | lm loss: 4.524366E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2511.540 | TFLOPs: 9.34 | 7: iteration 117130/ 173500 | consumed samples: 29985280 | consumed tokens: 61409853440 | elapsed time per iteration (s): 0.09 | learning rate: 6.374E-05 | global batch size: 256 | lm loss: 4.523048E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.108 | TFLOPs: 10.68 | 7: iteration 117140/ 173500 | consumed samples: 29987840 | consumed tokens: 61415096320 | elapsed time per iteration (s): 0.08 | learning rate: 6.373E-05 | global batch size: 256 | lm loss: 4.514346E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.565 | TFLOPs: 11.27 | 7: iteration 117150/ 173500 | consumed samples: 29990400 | consumed tokens: 61420339200 | elapsed time per iteration (s): 0.10 | learning rate: 6.372E-05 | global batch size: 256 | lm loss: 4.503545E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.401 | TFLOPs: 10.00 | 7: iteration 117160/ 173500 | consumed samples: 29992960 | consumed tokens: 61425582080 | elapsed time per iteration (s): 0.11 | learning rate: 6.370E-05 | global batch size: 256 | lm loss: 4.516290E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2234.219 | TFLOPs: 8.31 | 7: iteration 117170/ 173500 | consumed samples: 29995520 | consumed tokens: 61430824960 | elapsed time per iteration (s): 0.10 | learning rate: 6.369E-05 | global batch size: 256 | lm loss: 4.501821E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2613.146 | TFLOPs: 9.72 | 7: iteration 117180/ 173500 | consumed samples: 29998080 | consumed tokens: 61436067840 | elapsed time per iteration (s): 0.08 | learning rate: 6.367E-05 | global batch size: 256 | lm loss: 4.502338E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.515 | TFLOPs: 11.89 | 7: iteration 117190/ 173500 | consumed samples: 30000640 | consumed tokens: 61441310720 | elapsed time per iteration (s): 0.08 | learning rate: 6.366E-05 | global batch size: 256 | lm loss: 4.517197E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.889 | TFLOPs: 11.85 | 7: iteration 117200/ 173500 | consumed samples: 30003200 | consumed tokens: 61446553600 | elapsed time per iteration (s): 0.11 | learning rate: 6.365E-05 | global batch size: 256 | lm loss: 4.510227E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2399.275 | TFLOPs: 8.92 | 7: iteration 117210/ 173500 | consumed samples: 30005760 | consumed tokens: 61451796480 | elapsed time per iteration (s): 0.10 | learning rate: 6.363E-05 | global batch size: 256 | lm loss: 4.511072E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2694.158 | TFLOPs: 10.02 | 7: iteration 117220/ 173500 | consumed samples: 30008320 | consumed tokens: 61457039360 | elapsed time per iteration (s): 0.10 | learning rate: 6.362E-05 | global batch size: 256 
| lm loss: 4.508189E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2476.181 | TFLOPs: 9.21 | 7: iteration 117230/ 173500 | consumed samples: 30010880 | consumed tokens: 61462282240 | elapsed time per iteration (s): 0.11 | learning rate: 6.360E-05 | global batch size: 256 | lm loss: 4.511639E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2430.139 | TFLOPs: 9.04 | 7: iteration 117240/ 173500 | consumed samples: 30013440 | consumed tokens: 61467525120 | elapsed time per iteration (s): 0.08 | learning rate: 6.359E-05 | global batch size: 256 | lm loss: 4.520083E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.660 | TFLOPs: 11.31 | 7: iteration 117250/ 173500 | consumed samples: 30016000 | consumed tokens: 61472768000 | elapsed time per iteration (s): 0.10 | learning rate: 6.358E-05 | global batch size: 256 | lm loss: 4.523341E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2572.385 | TFLOPs: 9.57 | 7: iteration 117260/ 173500 | consumed samples: 30018560 | consumed tokens: 61478010880 | elapsed time per iteration (s): 0.11 | learning rate: 6.356E-05 | global batch size: 256 | lm loss: 4.506102E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.734 | TFLOPs: 8.89 | 7: iteration 117270/ 173500 | consumed samples: 30021120 | consumed tokens: 61483253760 | elapsed time per iteration (s): 0.11 | learning rate: 6.355E-05 | global batch size: 256 | lm loss: 4.522258E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2326.315 | TFLOPs: 8.65 | 7: iteration 117280/ 173500 | consumed samples: 30023680 | consumed 
tokens: 61488496640 | elapsed time per iteration (s): 0.09 | learning rate: 6.353E-05 | global batch size: 256 | lm loss: 4.511359E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.015 | TFLOPs: 10.74 | 7: iteration 117290/ 173500 | consumed samples: 30026240 | consumed tokens: 61493739520 | elapsed time per iteration (s): 0.10 | learning rate: 6.352E-05 | global batch size: 256 | lm loss: 4.509958E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2606.383 | TFLOPs: 9.69 | 7: iteration 117300/ 173500 | consumed samples: 30028800 | consumed tokens: 61498982400 | elapsed time per iteration (s): 0.11 | learning rate: 6.351E-05 | global batch size: 256 | lm loss: 4.506141E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2381.087 | TFLOPs: 8.86 | 7: iteration 117310/ 173500 | consumed samples: 30031360 | consumed tokens: 61504225280 | elapsed time per iteration (s): 0.11 | learning rate: 6.349E-05 | global batch size: 256 | lm loss: 4.518480E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2295.269 | TFLOPs: 8.54 | 7: iteration 117320/ 173500 | consumed samples: 30033920 | consumed tokens: 61509468160 | elapsed time per iteration (s): 0.10 | learning rate: 6.348E-05 | global batch size: 256 | lm loss: 4.515356E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2440.358 | TFLOPs: 9.08 | 7: iteration 117330/ 173500 | consumed samples: 30036480 | consumed tokens: 61514711040 | elapsed time per iteration (s): 0.08 | learning rate: 6.346E-05 | global batch size: 256 | lm loss: 4.514385E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3111.016 | TFLOPs: 11.57 | 7: iteration 117340/ 173500 | consumed samples: 30039040 | consumed tokens: 61519953920 | elapsed time per iteration (s): 0.10 | learning rate: 6.345E-05 | global batch size: 256 | lm loss: 4.499128E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2472.739 | TFLOPs: 9.20 | 7: iteration 117350/ 173500 | consumed samples: 30041600 | consumed tokens: 61525196800 | elapsed time per iteration (s): 0.09 | learning rate: 6.343E-05 | global batch size: 256 | lm loss: 4.518875E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2871.883 | TFLOPs: 10.68 | 7: iteration 117360/ 173500 | consumed samples: 30044160 | consumed tokens: 61530439680 | elapsed time per iteration (s): 0.11 | learning rate: 6.342E-05 | global batch size: 256 | lm loss: 4.526187E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2256.180 | TFLOPs: 8.39 | 7: iteration 117370/ 173500 | consumed samples: 30046720 | consumed tokens: 61535682560 | elapsed time per iteration (s): 0.11 | learning rate: 6.341E-05 | global batch size: 256 | lm loss: 4.515938E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2422.069 | TFLOPs: 9.01 | 7: iteration 117380/ 173500 | consumed samples: 30049280 | consumed tokens: 61540925440 | elapsed time per iteration (s): 0.13 | learning rate: 6.339E-05 | global batch size: 256 | lm loss: 4.507077E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2009.897 | TFLOPs: 7.48 | 7: iteration 117390/ 173500 | consumed samples: 30051840 | consumed tokens: 61546168320 | elapsed time per iteration (s): 0.08 | learning rate: 6.338E-05 | global batch size: 256 | lm loss: 
4.515935E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3079.206 | TFLOPs: 11.45 | 7: iteration 117400/ 173500 | consumed samples: 30054400 | consumed tokens: 61551411200 | elapsed time per iteration (s): 0.10 | learning rate: 6.336E-05 | global batch size: 256 | lm loss: 4.512637E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2474.970 | TFLOPs: 9.21 | 7: iteration 117410/ 173500 | consumed samples: 30056960 | consumed tokens: 61556654080 | elapsed time per iteration (s): 0.09 | learning rate: 6.335E-05 | global batch size: 256 | lm loss: 4.523590E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.041 | TFLOPs: 10.92 | 7: iteration 117420/ 173500 | consumed samples: 30059520 | consumed tokens: 61561896960 | elapsed time per iteration (s): 0.10 | learning rate: 6.334E-05 | global batch size: 256 | lm loss: 4.501819E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2624.516 | TFLOPs: 9.76 | 7: iteration 117430/ 173500 | consumed samples: 30062080 | consumed tokens: 61567139840 | elapsed time per iteration (s): 0.09 | learning rate: 6.332E-05 | global batch size: 256 | lm loss: 4.525048E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.019 | TFLOPs: 10.68 | 7: iteration 117440/ 173500 | consumed samples: 30064640 | consumed tokens: 61572382720 | elapsed time per iteration (s): 0.09 | learning rate: 6.331E-05 | global batch size: 256 | lm loss: 4.516290E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2939.166 | TFLOPs: 10.93 | 7: iteration 117450/ 173500 | consumed samples: 30067200 | consumed tokens: 
61577625600 | elapsed time per iteration (s): 0.09 | learning rate: 6.329E-05 | global batch size: 256 | lm loss: 4.504613E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2836.907 | TFLOPs: 10.55 | 7: iteration 117460/ 173500 | consumed samples: 30069760 | consumed tokens: 61582868480 | elapsed time per iteration (s): 0.11 | learning rate: 6.328E-05 | global batch size: 256 | lm loss: 4.505888E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2354.021 | TFLOPs: 8.76 | 7: iteration 117470/ 173500 | consumed samples: 30072320 | consumed tokens: 61588111360 | elapsed time per iteration (s): 0.11 | learning rate: 6.327E-05 | global batch size: 256 | lm loss: 4.517641E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2364.815 | TFLOPs: 8.80 | 7: iteration 117480/ 173500 | consumed samples: 30074880 | consumed tokens: 61593354240 | elapsed time per iteration (s): 0.09 | learning rate: 6.325E-05 | global batch size: 256 | lm loss: 4.497727E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2865.723 | TFLOPs: 10.66 | 7: iteration 117490/ 173500 | consumed samples: 30077440 | consumed tokens: 61598597120 | elapsed time per iteration (s): 0.11 | learning rate: 6.324E-05 | global batch size: 256 | lm loss: 4.518752E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.065 | TFLOPs: 8.98 | 7: iteration 117500/ 173500 | consumed samples: 30080000 | consumed tokens: 61603840000 | elapsed time per iteration (s): 0.13 | learning rate: 6.322E-05 | global batch size: 256 | lm loss: 4.507987E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 1963.175 | TFLOPs: 7.30 | 7: iteration 117510/ 173500 | consumed samples: 30082560 | consumed tokens: 61609082880 | elapsed time per iteration (s): 0.11 | learning rate: 6.321E-05 | global batch size: 256 | lm loss: 4.527429E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.524 | TFLOPs: 8.44 | 7: iteration 117520/ 173500 | consumed samples: 30085120 | consumed tokens: 61614325760 | elapsed time per iteration (s): 0.11 | learning rate: 6.320E-05 | global batch size: 256 | lm loss: 4.510377E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2386.305 | TFLOPs: 8.88 | 7: iteration 117530/ 173500 | consumed samples: 30087680 | consumed tokens: 61619568640 | elapsed time per iteration (s): 0.11 | learning rate: 6.318E-05 | global batch size: 256 | lm loss: 4.498584E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2229.998 | TFLOPs: 8.29 | 7: iteration 117540/ 173500 | consumed samples: 30090240 | consumed tokens: 61624811520 | elapsed time per iteration (s): 0.09 | learning rate: 6.317E-05 | global batch size: 256 | lm loss: 4.513780E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.348 | TFLOPs: 10.58 | 7: iteration 117550/ 173500 | consumed samples: 30092800 | consumed tokens: 61630054400 | elapsed time per iteration (s): 0.09 | learning rate: 6.315E-05 | global batch size: 256 | lm loss: 4.516020E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.125 | TFLOPs: 10.23 | 7: iteration 117560/ 173500 | consumed samples: 30095360 | consumed tokens: 61635297280 | elapsed time per iteration (s): 0.12 | learning rate: 6.314E-05 | global batch size: 256 | lm loss: 4.518261E+00 | 
grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2205.889 | TFLOPs: 8.20 | 7: iteration 117570/ 173500 | consumed samples: 30097920 | consumed tokens: 61640540160 | elapsed time per iteration (s): 0.08 | learning rate: 6.313E-05 | global batch size: 256 | lm loss: 4.520790E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.729 | TFLOPs: 11.66 | 7: iteration 117580/ 173500 | consumed samples: 30100480 | consumed tokens: 61645783040 | elapsed time per iteration (s): 0.08 | learning rate: 6.311E-05 | global batch size: 256 | lm loss: 4.515012E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.603 | TFLOPs: 11.89 | 7: iteration 117590/ 173500 | consumed samples: 30103040 | consumed tokens: 61651025920 | elapsed time per iteration (s): 0.08 | learning rate: 6.310E-05 | global batch size: 256 | lm loss: 4.516453E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.741 | TFLOPs: 11.82 | 7: iteration 117600/ 173500 | consumed samples: 30105600 | consumed tokens: 61656268800 | elapsed time per iteration (s): 0.09 | learning rate: 6.308E-05 | global batch size: 256 | lm loss: 4.501359E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2755.554 | TFLOPs: 10.25 | 7: iteration 117610/ 173500 | consumed samples: 30108160 | consumed tokens: 61661511680 | elapsed time per iteration (s): 0.09 | learning rate: 6.307E-05 | global batch size: 256 | lm loss: 4.507279E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2912.769 | TFLOPs: 10.83 | 7: iteration 117620/ 173500 | consumed samples: 30110720 | consumed tokens: 61666754560 | 
elapsed time per iteration (s): 0.10 | learning rate: 6.305E-05 | global batch size: 256 | lm loss: 4.523333E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2459.585 | TFLOPs: 9.15 | 7: iteration 117630/ 173500 | consumed samples: 30113280 | consumed tokens: 61671997440 | elapsed time per iteration (s): 0.10 | learning rate: 6.304E-05 | global batch size: 256 | lm loss: 4.518033E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2448.468 | TFLOPs: 9.11 | 7: iteration 117640/ 173500 | consumed samples: 30115840 | consumed tokens: 61677240320 | elapsed time per iteration (s): 0.09 | learning rate: 6.303E-05 | global batch size: 256 | lm loss: 4.520681E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2796.229 | TFLOPs: 10.40 | 7: iteration 117650/ 173500 | consumed samples: 30118400 | consumed tokens: 61682483200 | elapsed time per iteration (s): 0.10 | learning rate: 6.301E-05 | global batch size: 256 | lm loss: 4.524174E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2454.720 | TFLOPs: 9.13 | 7: iteration 117660/ 173500 | consumed samples: 30120960 | consumed tokens: 61687726080 | elapsed time per iteration (s): 0.09 | learning rate: 6.300E-05 | global batch size: 256 | lm loss: 4.518404E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2993.957 | TFLOPs: 11.14 | 7: iteration 117670/ 173500 | consumed samples: 30123520 | consumed tokens: 61692968960 | elapsed time per iteration (s): 0.11 | learning rate: 6.298E-05 | global batch size: 256 | lm loss: 4.502781E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
2259.718 | TFLOPs: 8.41 | 7: iteration 117680/ 173500 | consumed samples: 30126080 | consumed tokens: 61698211840 | elapsed time per iteration (s): 0.11 | learning rate: 6.297E-05 | global batch size: 256 | lm loss: 4.509027E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.921 | TFLOPs: 8.72 | 7: iteration 117690/ 173500 | consumed samples: 30128640 | consumed tokens: 61703454720 | elapsed time per iteration (s): 0.10 | learning rate: 6.296E-05 | global batch size: 256 | lm loss: 4.508521E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.123 | TFLOPs: 9.56 | 7: iteration 117700/ 173500 | consumed samples: 30131200 | consumed tokens: 61708697600 | elapsed time per iteration (s): 0.09 | learning rate: 6.294E-05 | global batch size: 256 | lm loss: 4.515438E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.480 | TFLOPs: 10.64 | 7: iteration 117710/ 173500 | consumed samples: 30133760 | consumed tokens: 61713940480 | elapsed time per iteration (s): 0.08 | learning rate: 6.293E-05 | global batch size: 256 | lm loss: 4.518761E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.577 | TFLOPs: 11.22 | 7: iteration 117720/ 173500 | consumed samples: 30136320 | consumed tokens: 61719183360 | elapsed time per iteration (s): 0.11 | learning rate: 6.291E-05 | global batch size: 256 | lm loss: 4.514425E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2381.850 | TFLOPs: 8.86 | 7: iteration 117730/ 173500 | consumed samples: 30138880 | consumed tokens: 61724426240 | elapsed time per iteration (s): 0.11 | learning rate: 6.290E-05 | global batch size: 256 | lm loss: 4.517652E+00 | grad norm: 
0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2275.148 | TFLOPs: 8.46 | 7: iteration 117740/ 173500 | consumed samples: 30141440 | consumed tokens: 61729669120 | elapsed time per iteration (s): 0.10 | learning rate: 6.289E-05 | global batch size: 256 | lm loss: 4.509311E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2564.577 | TFLOPs: 9.54 | 7: iteration 117750/ 173500 | consumed samples: 30144000 | consumed tokens: 61734912000 | elapsed time per iteration (s): 0.09 | learning rate: 6.287E-05 | global batch size: 256 | lm loss: 4.498378E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2889.057 | TFLOPs: 10.75 | 7: iteration 117760/ 173500 | consumed samples: 30146560 | consumed tokens: 61740154880 | elapsed time per iteration (s): 0.13 | learning rate: 6.286E-05 | global batch size: 256 | lm loss: 4.511857E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1900.796 | TFLOPs: 7.07 | 7: iteration 117770/ 173500 | consumed samples: 30149120 | consumed tokens: 61745397760 | elapsed time per iteration (s): 0.17 | learning rate: 6.284E-05 | global batch size: 256 | lm loss: 4.498765E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1525.719 | TFLOPs: 5.68 | 7: iteration 117780/ 173500 | consumed samples: 30151680 | consumed tokens: 61750640640 | elapsed time per iteration (s): 0.12 | learning rate: 6.283E-05 | global batch size: 256 | lm loss: 4.511309E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2181.697 | TFLOPs: 8.11 | 7: iteration 117790/ 173500 | consumed samples: 30154240 | consumed tokens: 61755883520 | elapsed time per 
iteration (s): 0.10 | learning rate: 6.282E-05 | global batch size: 256 | lm loss: 4.503743E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2628.789 | TFLOPs: 9.78 | 7: iteration 117800/ 173500 | consumed samples: 30156800 | consumed tokens: 61761126400 | elapsed time per iteration (s): 0.11 | learning rate: 6.280E-05 | global batch size: 256 | lm loss: 4.512326E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2300.912 | TFLOPs: 8.56 | 7: iteration 117810/ 173500 | consumed samples: 30159360 | consumed tokens: 61766369280 | elapsed time per iteration (s): 0.12 | learning rate: 6.279E-05 | global batch size: 256 | lm loss: 4.515679E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2158.379 | TFLOPs: 8.03 | 7: iteration 117820/ 173500 | consumed samples: 30161920 | consumed tokens: 61771612160 | elapsed time per iteration (s): 0.11 | learning rate: 6.277E-05 | global batch size: 256 | lm loss: 4.515236E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.630 | TFLOPs: 8.91 | 7: iteration 117830/ 173500 | consumed samples: 30164480 | consumed tokens: 61776855040 | elapsed time per iteration (s): 0.12 | learning rate: 6.276E-05 | global batch size: 256 | lm loss: 4.508458E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2053.793 | TFLOPs: 7.64 | 7: iteration 117840/ 173500 | consumed samples: 30167040 | consumed tokens: 61782097920 | elapsed time per iteration (s): 0.08 | learning rate: 6.275E-05 | global batch size: 256 | lm loss: 4.518634E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.372 | TFLOPs: 11.49 | 
7: iteration 117850/ 173500 | consumed samples: 30169600 | consumed tokens: 61787340800 | elapsed time per iteration (s): 0.09 | learning rate: 6.273E-05 | global batch size: 256 | lm loss: 4.516772E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.901 | TFLOPs: 10.39 |
7: iteration 117860/ 173500 | consumed samples: 30172160 | consumed tokens: 61792583680 | elapsed time per iteration (s): 0.10 | learning rate: 6.272E-05 | global batch size: 256 | lm loss: 4.517740E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.009 | TFLOPs: 10.00 |
7: iteration 117870/ 173500 | consumed samples: 30174720 | consumed tokens: 61797826560 | elapsed time per iteration (s): 0.10 | learning rate: 6.270E-05 | global batch size: 256 | lm loss: 4.510047E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2667.081 | TFLOPs: 9.92 |
7: iteration 117880/ 173500 | consumed samples: 30177280 | consumed tokens: 61803069440 | elapsed time per iteration (s): 0.09 | learning rate: 6.269E-05 | global batch size: 256 | lm loss: 4.510709E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.298 | TFLOPs: 10.58 |
7: iteration 117890/ 173500 | consumed samples: 30179840 | consumed tokens: 61808312320 | elapsed time per iteration (s): 0.10 | learning rate: 6.268E-05 | global batch size: 256 | lm loss: 4.509942E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.880 | TFLOPs: 9.34 |
7: iteration 117900/ 173500 | consumed samples: 30182400 | consumed tokens: 61813555200 | elapsed time per iteration (s): 0.13 | learning rate: 6.266E-05 | global batch size: 256 | lm loss: 4.523004E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1904.954 | TFLOPs: 7.09 |
7: iteration 117910/ 173500 | consumed samples: 30184960 | consumed tokens: 61818798080 | elapsed time per iteration (s): 0.13 | learning rate: 6.265E-05 | global batch size: 256 | lm loss: 4.508318E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.271 | TFLOPs: 7.35 |
7: iteration 117920/ 173500 | consumed samples: 30187520 | consumed tokens: 61824040960 | elapsed time per iteration (s): 0.14 | learning rate: 6.263E-05 | global batch size: 256 | lm loss: 4.510508E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1893.339 | TFLOPs: 7.04 |
7: iteration 117930/ 173500 | consumed samples: 30190080 | consumed tokens: 61829283840 | elapsed time per iteration (s): 0.09 | learning rate: 6.262E-05 | global batch size: 256 | lm loss: 4.509150E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.917 | TFLOPs: 10.92 |
7: iteration 117940/ 173500 | consumed samples: 30192640 | consumed tokens: 61834526720 | elapsed time per iteration (s): 0.08 | learning rate: 6.261E-05 | global batch size: 256 | lm loss: 4.517296E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.447 | TFLOPs: 11.79 |
7: iteration 117950/ 173500 | consumed samples: 30195200 | consumed tokens: 61839769600 | elapsed time per iteration (s): 0.08 | learning rate: 6.259E-05 | global batch size: 256 | lm loss: 4.516777E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.346 | TFLOPs: 11.90 |
7: iteration 117960/ 173500 | consumed samples: 30197760 | consumed tokens: 61845012480 | elapsed time per iteration (s): 0.10 | learning rate: 6.258E-05 | global batch size: 256 | lm loss: 4.518345E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2604.561 | TFLOPs: 9.69 |
7: iteration 117970/ 173500 | consumed samples: 30200320 | consumed tokens: 61850255360 | elapsed time per iteration (s): 0.11 | learning rate: 6.256E-05 | global batch size: 256 | lm loss: 4.518565E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.203 | TFLOPs: 8.95 |
7: iteration 117980/ 173500 | consumed samples: 30202880 | consumed tokens: 61855498240 | elapsed time per iteration (s): 0.08 | learning rate: 6.255E-05 | global batch size: 256 | lm loss: 4.516681E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.479 | TFLOPs: 11.75 |
7: iteration 117990/ 173500 | consumed samples: 30205440 | consumed tokens: 61860741120 | elapsed time per iteration (s): 0.19 | learning rate: 6.254E-05 | global batch size: 256 | lm loss: 4.511937E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1323.691 | TFLOPs: 4.92 |
0: [2023-03-17 03:06:58,144] [INFO] [logging.py:68:log_dist] [Rank 0] step=118000, skipped=0, lr=[6.252226684525562e-05, 6.252226684525562e-05, 6.252226684525562e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 118000/ 173500 | consumed samples: 30208000 | consumed tokens: 61865984000 | elapsed time per iteration (s): 0.08 | learning rate: 6.252E-05 | global batch size: 256 | lm loss: 4.503071E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.937 | TFLOPs: 11.60 |
0: steps: 118000 loss: 4.4683 iter time (s): 0.099 samples/sec: 2583.560
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 118000 | lm loss value: 4.398012E+00 | lm loss PPL: 8.128912E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 118000 to checkpoints_14m91b100m
0: [2023-03-17 03:06:58,212] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step118000 is begin to save!
0: [2023-03-17 03:06:58,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/layer_01-model_00-model_states.pt...
0: [2023-03-17 03:06:58,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/layer_01-model_00-model_states.pt.
0: [2023-03-17 03:06:58,246] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/layer_03-model_00-model_states.pt...
0: [2023-03-17 03:06:58,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/layer_03-model_00-model_states.pt.
0: [2023-03-17 03:06:58,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/layer_04-model_00-model_states.pt...
0: [2023-03-17 03:06:58,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/layer_04-model_00-model_states.pt.
0: [2023-03-17 03:06:58,257] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/layer_05-model_00-model_states.pt...
0: [2023-03-17 03:06:58,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/layer_05-model_00-model_states.pt.
0: [2023-03-17 03:06:58,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/layer_06-model_00-model_states.pt...
0: [2023-03-17 03:06:58,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:06:58,272] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:06:58,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:06:58,278] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step118000/mp_rank_00_model_states.pt 0: [2023-03-17 03:06:58,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:06:58,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:06:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:06:58,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,302] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:06:58,302] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 2: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:06:58,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
1: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:06:58,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 5: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:06:58,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 6: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:06:58,303] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
7: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,303] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 7: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 1: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 7: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 0: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
4: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 3: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 5: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 2: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
0: [2023-03-17 03:06:58,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:06:58,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:06:58,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 1: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 6: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
4: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 4: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 2: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 2: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 5: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 7: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 7: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 0: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:06:58,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:06:58,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 4: [2023-03-17 03:06:58,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:06:58,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
4: [2023-03-17 03:06:58,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 1: [2023-03-17 03:06:58,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:06:58,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:06:58,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 6: [2023-03-17 03:06:58,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:06:58,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 5: [2023-03-17 03:06:58,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:06:58,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:06:58,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 03:06:58,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
7: [2023-03-17 03:06:58,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 3: [2023-03-17 03:06:58,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:06:58,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 1: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:06:58,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 4: [2023-03-17 03:06:58,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 6: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
6: [2023-03-17 03:06:58,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 7: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:06:58,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 5: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:06:58,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 5: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:06:58,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:06:58,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 2: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:06:58,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 0: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:06:58,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 4: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 03:06:58,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 1: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 3: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:06:58,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 6: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 7: [2023-03-17 03:06:58,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 0: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:06:58,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:06:58,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 0: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 2: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:06:58,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 4: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 3: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:06:58,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:06:58,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 3: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 1: [2023-03-17 03:06:58,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 6: [2023-03-17 03:06:58,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
5: [2023-03-17 03:06:58,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 6: [2023-03-17 03:06:58,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 3: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:06:58,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2023-03-17 03:06:58,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 3: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 7: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 7: [2023-03-17 03:06:58,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 0: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:06:58,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:06:58,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
4: [2023-03-17 03:06:58,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:06:58,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:06:58,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 2: [2023-03-17 03:06:58,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:06:58,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:06:58,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2023-03-17 03:06:58,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 1: [2023-03-17 03:06:58,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 2: [2023-03-17 03:06:58,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:06:58,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:06:58,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 
0: [2023-03-17 03:06:58,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:06:58,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step118000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:06:58,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step118000 is ready now! 0: successfully saved checkpoint at iteration 118000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 105.88 7: iteration 118010/ 173500 | consumed samples: 30210560 | consumed tokens: 61871226880 | elapsed time per iteration (s): 0.10 | learning rate: 6.251E-05 | global batch size: 256 | lm loss: 4.516790E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.782 | TFLOPs: 9.92 | 7: iteration 118020/ 173500 | consumed samples: 30213120 | consumed tokens: 61876469760 | elapsed time per iteration (s): 0.08 | learning rate: 6.249E-05 | global batch size: 256 | lm loss: 4.515242E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.141 | TFLOPs: 12.06 | 7: iteration 118030/ 173500 | consumed samples: 30215680 | consumed tokens: 61881712640 | elapsed time per iteration (s): 0.10 | learning rate: 6.248E-05 | global batch size: 256 | lm loss: 4.518950E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2615.464 | TFLOPs: 9.73 | 7: iteration 118040/ 173500 | consumed samples: 30218240 | consumed tokens: 61886955520 | elapsed time per iteration (s): 0.09 | learning rate: 6.247E-05 | global batch size: 256 | lm loss: 4.513287E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2757.002 | 
TFLOPs: 10.25 | 7: iteration 118050/ 173500 | consumed samples: 30220800 | consumed tokens: 61892198400 | elapsed time per iteration (s): 0.08 | learning rate: 6.245E-05 | global batch size: 256 | lm loss: 4.510912E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.095 | TFLOPs: 11.24 | 7: iteration 118060/ 173500 | consumed samples: 30223360 | consumed tokens: 61897441280 | elapsed time per iteration (s): 0.09 | learning rate: 6.244E-05 | global batch size: 256 | lm loss: 4.506643E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.252 | TFLOPs: 11.06 | 7: iteration 118070/ 173500 | consumed samples: 30225920 | consumed tokens: 61902684160 | elapsed time per iteration (s): 0.10 | learning rate: 6.242E-05 | global batch size: 256 | lm loss: 4.513514E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2453.656 | TFLOPs: 9.13 | 7: iteration 118080/ 173500 | consumed samples: 30228480 | consumed tokens: 61907927040 | elapsed time per iteration (s): 0.10 | learning rate: 6.241E-05 | global batch size: 256 | lm loss: 4.498361E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.646 | TFLOPs: 9.52 | 7: iteration 118090/ 173500 | consumed samples: 30231040 | consumed tokens: 61913169920 | elapsed time per iteration (s): 0.08 | learning rate: 6.240E-05 | global batch size: 256 | lm loss: 4.507553E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3017.241 | TFLOPs: 11.22 | 7: iteration 118100/ 173500 | consumed samples: 30233600 | consumed tokens: 61918412800 | elapsed time per iteration (s): 0.08 | learning rate: 6.238E-05 | global batch size: 256 | lm loss: 4.513429E+00 | grad norm: 0.343 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.007 | TFLOPs: 11.66 | 7: iteration 118110/ 173500 | consumed samples: 30236160 | consumed tokens: 61923655680 | elapsed time per iteration (s): 0.08 | learning rate: 6.237E-05 | global batch size: 256 | lm loss: 4.514454E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.367 | TFLOPs: 11.66 | 7: iteration 118120/ 173500 | consumed samples: 30238720 | consumed tokens: 61928898560 | elapsed time per iteration (s): 0.10 | learning rate: 6.235E-05 | global batch size: 256 | lm loss: 4.519708E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2638.753 | TFLOPs: 9.82 | 7: iteration 118130/ 173500 | consumed samples: 30241280 | consumed tokens: 61934141440 | elapsed time per iteration (s): 0.11 | learning rate: 6.234E-05 | global batch size: 256 | lm loss: 4.520945E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2335.628 | TFLOPs: 8.69 | 7: iteration 118140/ 173500 | consumed samples: 30243840 | consumed tokens: 61939384320 | elapsed time per iteration (s): 0.12 | learning rate: 6.233E-05 | global batch size: 256 | lm loss: 4.512043E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2155.155 | TFLOPs: 8.02 | 7: iteration 118150/ 173500 | consumed samples: 30246400 | consumed tokens: 61944627200 | elapsed time per iteration (s): 0.12 | learning rate: 6.231E-05 | global batch size: 256 | lm loss: 4.505651E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2215.163 | TFLOPs: 8.24 | 7: iteration 118160/ 173500 | consumed samples: 30248960 | consumed tokens: 61949870080 | elapsed time per 
iteration (s): 0.09 | learning rate: 6.230E-05 | global batch size: 256 | lm loss: 4.508236E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.814 | TFLOPs: 10.04 | 7: iteration 118170/ 173500 | consumed samples: 30251520 | consumed tokens: 61955112960 | elapsed time per iteration (s): 0.12 | learning rate: 6.228E-05 | global batch size: 256 | lm loss: 4.495546E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2154.628 | TFLOPs: 8.01 | 7: iteration 118180/ 173500 | consumed samples: 30254080 | consumed tokens: 61960355840 | elapsed time per iteration (s): 0.08 | learning rate: 6.227E-05 | global batch size: 256 | lm loss: 4.519688E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.833 | TFLOPs: 11.34 | 7: iteration 118190/ 173500 | consumed samples: 30256640 | consumed tokens: 61965598720 | elapsed time per iteration (s): 0.10 | learning rate: 6.226E-05 | global batch size: 256 | lm loss: 4.520710E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2571.136 | TFLOPs: 9.56 | 7: iteration 118200/ 173500 | consumed samples: 30259200 | consumed tokens: 61970841600 | elapsed time per iteration (s): 0.09 | learning rate: 6.224E-05 | global batch size: 256 | lm loss: 4.528923E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.530 | TFLOPs: 11.10 | 7: iteration 118210/ 173500 | consumed samples: 30261760 | consumed tokens: 61976084480 | elapsed time per iteration (s): 0.10 | learning rate: 6.223E-05 | global batch size: 256 | lm loss: 4.506244E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2691.440 | TFLOPs: 
10.01 | 7: iteration 118220/ 173500 | consumed samples: 30264320 | consumed tokens: 61981327360 | elapsed time per iteration (s): 0.10 | learning rate: 6.221E-05 | global batch size: 256 | lm loss: 4.515250E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2673.321 | TFLOPs: 9.94 | 7: iteration 118230/ 173500 | consumed samples: 30266880 | consumed tokens: 61986570240 | elapsed time per iteration (s): 0.10 | learning rate: 6.220E-05 | global batch size: 256 | lm loss: 4.513056E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2653.679 | TFLOPs: 9.87 | 7: iteration 118240/ 173500 | consumed samples: 30269440 | consumed tokens: 61991813120 | elapsed time per iteration (s): 0.08 | learning rate: 6.219E-05 | global batch size: 256 | lm loss: 4.511871E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.882 | TFLOPs: 11.36 | 7: iteration 118250/ 173500 | consumed samples: 30272000 | consumed tokens: 61997056000 | elapsed time per iteration (s): 0.11 | learning rate: 6.217E-05 | global batch size: 256 | lm loss: 4.521894E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2290.043 | TFLOPs: 8.52 | 7: iteration 118260/ 173500 | consumed samples: 30274560 | consumed tokens: 62002298880 | elapsed time per iteration (s): 0.09 | learning rate: 6.216E-05 | global batch size: 256 | lm loss: 4.523319E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2987.062 | TFLOPs: 11.11 | 7: iteration 118270/ 173500 | consumed samples: 30277120 | consumed tokens: 62007541760 | elapsed time per iteration (s): 0.10 | learning rate: 6.215E-05 | global batch size: 256 | lm loss: 4.506217E+00 | grad norm: 0.345 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2654.150 | TFLOPs: 9.87 | 7: iteration 118280/ 173500 | consumed samples: 30279680 | consumed tokens: 62012784640 | elapsed time per iteration (s): 0.08 | learning rate: 6.213E-05 | global batch size: 256 | lm loss: 4.504754E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.124 | TFLOPs: 11.91 | 7: iteration 118290/ 173500 | consumed samples: 30282240 | consumed tokens: 62018027520 | elapsed time per iteration (s): 0.09 | learning rate: 6.212E-05 | global batch size: 256 | lm loss: 4.508154E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2781.711 | TFLOPs: 10.35 | 7: iteration 118300/ 173500 | consumed samples: 30284800 | consumed tokens: 62023270400 | elapsed time per iteration (s): 0.12 | learning rate: 6.210E-05 | global batch size: 256 | lm loss: 4.507618E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2099.558 | TFLOPs: 7.81 | 7: iteration 118310/ 173500 | consumed samples: 30287360 | consumed tokens: 62028513280 | elapsed time per iteration (s): 0.09 | learning rate: 6.209E-05 | global batch size: 256 | lm loss: 4.515987E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2752.994 | TFLOPs: 10.24 | 7: iteration 118320/ 173500 | consumed samples: 30289920 | consumed tokens: 62033756160 | elapsed time per iteration (s): 0.09 | learning rate: 6.208E-05 | global batch size: 256 | lm loss: 4.520892E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.398 | TFLOPs: 11.09 | 7: iteration 118330/ 173500 | consumed samples: 30292480 | consumed tokens: 62038999040 | elapsed time per iteration (s): 
0.08 | learning rate: 6.206E-05 | global batch size: 256 | lm loss: 4.509715E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.119 | TFLOPs: 11.98 | 7: iteration 118340/ 173500 | consumed samples: 30295040 | consumed tokens: 62044241920 | elapsed time per iteration (s): 0.09 | learning rate: 6.205E-05 | global batch size: 256 | lm loss: 4.507372E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2773.590 | TFLOPs: 10.32 | 7: iteration 118350/ 173500 | consumed samples: 30297600 | consumed tokens: 62049484800 | elapsed time per iteration (s): 0.09 | learning rate: 6.203E-05 | global batch size: 256 | lm loss: 4.514660E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.683 | TFLOPs: 11.20 | 7: iteration 118360/ 173500 | consumed samples: 30300160 | consumed tokens: 62054727680 | elapsed time per iteration (s): 0.09 | learning rate: 6.202E-05 | global batch size: 256 | lm loss: 4.511720E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2736.487 | TFLOPs: 10.18 | 7: iteration 118370/ 173500 | consumed samples: 30302720 | consumed tokens: 62059970560 | elapsed time per iteration (s): 0.08 | learning rate: 6.201E-05 | global batch size: 256 | lm loss: 4.499799E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.405 | TFLOPs: 12.00 | 7: iteration 118380/ 173500 | consumed samples: 30305280 | consumed tokens: 62065213440 | elapsed time per iteration (s): 0.10 | learning rate: 6.199E-05 | global batch size: 256 | lm loss: 4.507529E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2626.160 | TFLOPs: 9.77 | 7: 
iteration 118390/ 173500 | consumed samples: 30307840 | consumed tokens: 62070456320 | elapsed time per iteration (s): 0.09 | learning rate: 6.198E-05 | global batch size: 256 | lm loss: 4.509244E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2966.689 | TFLOPs: 11.03 | 7: iteration 118400/ 173500 | consumed samples: 30310400 | consumed tokens: 62075699200 | elapsed time per iteration (s): 0.11 | learning rate: 6.196E-05 | global batch size: 256 | lm loss: 4.514190E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2378.336 | TFLOPs: 8.85 | 7: iteration 118410/ 173500 | consumed samples: 30312960 | consumed tokens: 62080942080 | elapsed time per iteration (s): 0.11 | learning rate: 6.195E-05 | global batch size: 256 | lm loss: 4.516275E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2385.881 | TFLOPs: 8.87 | 7: iteration 118420/ 173500 | consumed samples: 30315520 | consumed tokens: 62086184960 | elapsed time per iteration (s): 0.09 | learning rate: 6.194E-05 | global batch size: 256 | lm loss: 4.508132E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2734.456 | TFLOPs: 10.17 | 7: iteration 118430/ 173500 | consumed samples: 30318080 | consumed tokens: 62091427840 | elapsed time per iteration (s): 0.10 | learning rate: 6.192E-05 | global batch size: 256 | lm loss: 4.517546E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2533.221 | TFLOPs: 9.42 | 7: iteration 118440/ 173500 | consumed samples: 30320640 | consumed tokens: 62096670720 | elapsed time per iteration (s): 0.08 | learning rate: 6.191E-05 | global batch size: 256 | lm loss: 4.520569E+00 | grad norm: 0.369 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.564 | TFLOPs: 11.96 | 7: iteration 118450/ 173500 | consumed samples: 30323200 | consumed tokens: 62101913600 | elapsed time per iteration (s): 0.08 | learning rate: 6.189E-05 | global batch size: 256 | lm loss: 4.523714E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.901 | TFLOPs: 11.69 | 7: iteration 118460/ 173500 | consumed samples: 30325760 | consumed tokens: 62107156480 | elapsed time per iteration (s): 0.08 | learning rate: 6.188E-05 | global batch size: 256 | lm loss: 4.504372E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.359 | TFLOPs: 11.31 | 7: iteration 118470/ 173500 | consumed samples: 30328320 | consumed tokens: 62112399360 | elapsed time per iteration (s): 0.08 | learning rate: 6.187E-05 | global batch size: 256 | lm loss: 4.502943E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.050 | TFLOPs: 12.01 | 7: iteration 118480/ 173500 | consumed samples: 30330880 | consumed tokens: 62117642240 | elapsed time per iteration (s): 0.08 | learning rate: 6.185E-05 | global batch size: 256 | lm loss: 4.527845E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.910 | TFLOPs: 12.01 | 7: iteration 118490/ 173500 | consumed samples: 30333440 | consumed tokens: 62122885120 | elapsed time per iteration (s): 0.10 | learning rate: 6.184E-05 | global batch size: 256 | lm loss: 4.529507E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2543.468 | TFLOPs: 9.46 | 7: iteration 118500/ 173500 | consumed samples: 30336000 | consumed tokens: 62128128000 | elapsed time per iteration (s): 0.12 | 
learning rate: 6.183E-05 | global batch size: 256 | lm loss: 4.519365E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2103.740 | TFLOPs: 7.82 | 7: iteration 118510/ 173500 | consumed samples: 30338560 | consumed tokens: 62133370880 | elapsed time per iteration (s): 0.10 | learning rate: 6.181E-05 | global batch size: 256 | lm loss: 4.517144E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2600.974 | TFLOPs: 9.67 | 7: iteration 118520/ 173500 | consumed samples: 30341120 | consumed tokens: 62138613760 | elapsed time per iteration (s): 0.08 | learning rate: 6.180E-05 | global batch size: 256 | lm loss: 4.509994E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.809 | TFLOPs: 11.86 | 7: iteration 118530/ 173500 | consumed samples: 30343680 | consumed tokens: 62143856640 | elapsed time per iteration (s): 0.08 | learning rate: 6.178E-05 | global batch size: 256 | lm loss: 4.514863E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.928 | TFLOPs: 11.60 | 7: iteration 118540/ 173500 | consumed samples: 30346240 | consumed tokens: 62149099520 | elapsed time per iteration (s): 0.09 | learning rate: 6.177E-05 | global batch size: 256 | lm loss: 4.508087E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.455 | TFLOPs: 10.49 | 7: iteration 118550/ 173500 | consumed samples: 30348800 | consumed tokens: 62154342400 | elapsed time per iteration (s): 0.10 | learning rate: 6.176E-05 | global batch size: 256 | lm loss: 4.514298E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2654.196 | TFLOPs: 9.87 | 7: iteration 
118560/ 173500 | consumed samples: 30351360 | consumed tokens: 62159585280 | elapsed time per iteration (s): 0.10 | learning rate: 6.174E-05 | global batch size: 256 | lm loss: 4.510850E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.733 | TFLOPs: 9.74 | 7: iteration 118570/ 173500 | consumed samples: 30353920 | consumed tokens: 62164828160 | elapsed time per iteration (s): 0.09 | learning rate: 6.173E-05 | global batch size: 256 | lm loss: 4.532550E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3003.638 | TFLOPs: 11.17 | 7: iteration 118580/ 173500 | consumed samples: 30356480 | consumed tokens: 62170071040 | elapsed time per iteration (s): 0.09 | learning rate: 6.171E-05 | global batch size: 256 | lm loss: 4.497615E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2907.580 | TFLOPs: 10.81 | 7: iteration 118590/ 173500 | consumed samples: 30359040 | consumed tokens: 62175313920 | elapsed time per iteration (s): 0.08 | learning rate: 6.170E-05 | global batch size: 256 | lm loss: 4.511624E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3012.227 | TFLOPs: 11.20 | 7: iteration 118600/ 173500 | consumed samples: 30361600 | consumed tokens: 62180556800 | elapsed time per iteration (s): 0.08 | learning rate: 6.169E-05 | global batch size: 256 | lm loss: 4.510589E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.317 | TFLOPs: 11.49 | 7: iteration 118610/ 173500 | consumed samples: 30364160 | consumed tokens: 62185799680 | elapsed time per iteration (s): 0.10 | learning rate: 6.167E-05 | global batch size: 256 | lm loss: 4.519227E+00 | grad norm: 0.364 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2658.381 | TFLOPs: 9.89 | 7: iteration 118620/ 173500 | consumed samples: 30366720 | consumed tokens: 62191042560 | elapsed time per iteration (s): 0.12 | learning rate: 6.166E-05 | global batch size: 256 | lm loss: 4.496719E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2179.550 | TFLOPs: 8.11 | 7: iteration 118630/ 173500 | consumed samples: 30369280 | consumed tokens: 62196285440 | elapsed time per iteration (s): 0.10 | learning rate: 6.164E-05 | global batch size: 256 | lm loss: 4.519516E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2625.763 | TFLOPs: 9.77 | 7: iteration 118640/ 173500 | consumed samples: 30371840 | consumed tokens: 62201528320 | elapsed time per iteration (s): 0.10 | learning rate: 6.163E-05 | global batch size: 256 | lm loss: 4.517644E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2493.673 | TFLOPs: 9.28 | 7: iteration 118650/ 173500 | consumed samples: 30374400 | consumed tokens: 62206771200 | elapsed time per iteration (s): 0.10 | learning rate: 6.162E-05 | global batch size: 256 | lm loss: 4.527673E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2589.820 | TFLOPs: 9.63 | 7: iteration 118660/ 173500 | consumed samples: 30376960 | consumed tokens: 62212014080 | elapsed time per iteration (s): 0.09 | learning rate: 6.160E-05 | global batch size: 256 | lm loss: 4.514608E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2798.469 | TFLOPs: 10.41 | 7: iteration 118670/ 173500 | consumed samples: 30379520 | consumed tokens: 62217256960 | elapsed time per iteration (s): 0.09 | learning rate: 
6.159E-05 | global batch size: 256 | lm loss: 4.508154E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2867.205 | TFLOPs: 10.66 | 7: iteration 118680/ 173500 | consumed samples: 30382080 | consumed tokens: 62222499840 | elapsed time per iteration (s): 0.08 | learning rate: 6.158E-05 | global batch size: 256 | lm loss: 4.530495E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.756 | TFLOPs: 11.33 | 7: iteration 118690/ 173500 | consumed samples: 30384640 | consumed tokens: 62227742720 | elapsed time per iteration (s): 0.10 | learning rate: 6.156E-05 | global batch size: 256 | lm loss: 4.524904E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2605.053 | TFLOPs: 9.69 | 7: iteration 118700/ 173500 | consumed samples: 30387200 | consumed tokens: 62232985600 | elapsed time per iteration (s): 0.09 | learning rate: 6.155E-05 | global batch size: 256 | lm loss: 4.508244E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2822.860 | TFLOPs: 10.50 | 7: iteration 118710/ 173500 | consumed samples: 30389760 | consumed tokens: 62238228480 | elapsed time per iteration (s): 0.10 | learning rate: 6.153E-05 | global batch size: 256 | lm loss: 4.531994E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2476.447 | TFLOPs: 9.21 | 7: iteration 118720/ 173500 | consumed samples: 30392320 | consumed tokens: 62243471360 | elapsed time per iteration (s): 0.10 | learning rate: 6.152E-05 | global batch size: 256 | lm loss: 4.531295E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.646 | TFLOPs: 9.16 | 7: iteration 118730/ 173500 | 
consumed samples: 30394880 | consumed tokens: 62248714240 | elapsed time per iteration (s): 0.10 | learning rate: 6.151E-05 | global batch size: 256 | lm loss: 4.509295E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.200 | TFLOPs: 9.72 | 7: iteration 118740/ 173500 | consumed samples: 30397440 | consumed tokens: 62253957120 | elapsed time per iteration (s): 0.10 | learning rate: 6.149E-05 | global batch size: 256 | lm loss: 4.505772E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2637.910 | TFLOPs: 9.81 | 7: iteration 118750/ 173500 | consumed samples: 30400000 | consumed tokens: 62259200000 | elapsed time per iteration (s): 0.10 | learning rate: 6.148E-05 | global batch size: 256 | lm loss: 4.510756E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2439.410 | TFLOPs: 9.07 | 7: iteration 118760/ 173500 | consumed samples: 30402560 | consumed tokens: 62264442880 | elapsed time per iteration (s): 0.11 | learning rate: 6.146E-05 | global batch size: 256 | lm loss: 4.514454E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2420.042 | TFLOPs: 9.00 | 7: iteration 118770/ 173500 | consumed samples: 30405120 | consumed tokens: 62269685760 | elapsed time per iteration (s): 0.12 | learning rate: 6.145E-05 | global batch size: 256 | lm loss: 4.509692E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2225.822 | TFLOPs: 8.28 | 7: iteration 118780/ 173500 | consumed samples: 30407680 | consumed tokens: 62274928640 | elapsed time per iteration (s): 0.09 | learning rate: 6.144E-05 | global batch size: 256 | lm loss: 4.515515E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2802.484 | TFLOPs: 10.42 | 7: iteration 118790/ 173500 | consumed samples: 30410240 | consumed tokens: 62280171520 | elapsed time per iteration (s): 0.08 | learning rate: 6.142E-05 | global batch size: 256 | lm loss: 4.521074E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.380 | TFLOPs: 11.34 | 7: iteration 118800/ 173500 | consumed samples: 30412800 | consumed tokens: 62285414400 | elapsed time per iteration (s): 0.09 | learning rate: 6.141E-05 | global batch size: 256 | lm loss: 4.522499E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2886.910 | TFLOPs: 10.74 | 7: iteration 118810/ 173500 | consumed samples: 30415360 | consumed tokens: 62290657280 | elapsed time per iteration (s): 0.08 | learning rate: 6.139E-05 | global batch size: 256 | lm loss: 4.503930E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.745 | TFLOPs: 11.85 | 7: iteration 118820/ 173500 | consumed samples: 30417920 | consumed tokens: 62295900160 | elapsed time per iteration (s): 0.08 | learning rate: 6.138E-05 | global batch size: 256 | lm loss: 4.516496E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3061.707 | TFLOPs: 11.39 | 7: iteration 118830/ 173500 | consumed samples: 30420480 | consumed tokens: 62301143040 | elapsed time per iteration (s): 0.10 | learning rate: 6.137E-05 | global batch size: 256 | lm loss: 4.515177E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2498.884 | TFLOPs: 9.29 | 7: iteration 118840/ 173500 | consumed samples: 30423040 | consumed tokens: 62306385920 | elapsed time per iteration (s): 0.12 | learning rate: 6.135E-05 | global 
batch size: 256 | lm loss: 4.509793E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2200.519 | TFLOPs: 8.18 | 7: iteration 118850/ 173500 | consumed samples: 30425600 | consumed tokens: 62311628800 | elapsed time per iteration (s): 0.10 | learning rate: 6.134E-05 | global batch size: 256 | lm loss: 4.520657E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2508.815 | TFLOPs: 9.33 | 7: iteration 118860/ 173500 | consumed samples: 30428160 | consumed tokens: 62316871680 | elapsed time per iteration (s): 0.09 | learning rate: 6.133E-05 | global batch size: 256 | lm loss: 4.507367E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.740 | TFLOPs: 11.02 | 7: iteration 118870/ 173500 | consumed samples: 30430720 | consumed tokens: 62322114560 | elapsed time per iteration (s): 0.09 | learning rate: 6.131E-05 | global batch size: 256 | lm loss: 4.506540E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2831.807 | TFLOPs: 10.53 | 7: iteration 118880/ 173500 | consumed samples: 30433280 | consumed tokens: 62327357440 | elapsed time per iteration (s): 0.10 | learning rate: 6.130E-05 | global batch size: 256 | lm loss: 4.507970E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2579.117 | TFLOPs: 9.59 | 7: iteration 118890/ 173500 | consumed samples: 30435840 | consumed tokens: 62332600320 | elapsed time per iteration (s): 0.10 | learning rate: 6.128E-05 | global batch size: 256 | lm loss: 4.509314E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2584.746 | TFLOPs: 9.61 | 7: iteration 118900/ 173500 | consumed samples: 
30438400 | consumed tokens: 62337843200 | elapsed time per iteration (s): 0.09 | learning rate: 6.127E-05 | global batch size: 256 | lm loss: 4.512297E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2827.084 | TFLOPs: 10.52 | 7: iteration 118910/ 173500 | consumed samples: 30440960 | consumed tokens: 62343086080 | elapsed time per iteration (s): 0.08 | learning rate: 6.126E-05 | global batch size: 256 | lm loss: 4.517265E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3035.277 | TFLOPs: 11.29 | 7: iteration 118920/ 173500 | consumed samples: 30443520 | consumed tokens: 62348328960 | elapsed time per iteration (s): 0.08 | learning rate: 6.124E-05 | global batch size: 256 | lm loss: 4.514922E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.590 | TFLOPs: 11.28 | 7: iteration 118930/ 173500 | consumed samples: 30446080 | consumed tokens: 62353571840 | elapsed time per iteration (s): 0.09 | learning rate: 6.123E-05 | global batch size: 256 | lm loss: 4.512547E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2831.698 | TFLOPs: 10.53 | 7: iteration 118940/ 173500 | consumed samples: 30448640 | consumed tokens: 62358814720 | elapsed time per iteration (s): 0.13 | learning rate: 6.121E-05 | global batch size: 256 | lm loss: 4.518821E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2013.972 | TFLOPs: 7.49 | 7: iteration 118950/ 173500 | consumed samples: 30451200 | consumed tokens: 62364057600 | elapsed time per iteration (s): 0.09 | learning rate: 6.120E-05 | global batch size: 256 | lm loss: 4.520477E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2969.294 | TFLOPs: 11.04 | 7: iteration 118960/ 173500 | consumed samples: 30453760 | consumed tokens: 62369300480 | elapsed time per iteration (s): 0.08 | learning rate: 6.119E-05 | global batch size: 256 | lm loss: 4.513929E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.678 | TFLOPs: 12.04 | 7: iteration 118970/ 173500 | consumed samples: 30456320 | consumed tokens: 62374543360 | elapsed time per iteration (s): 0.08 | learning rate: 6.117E-05 | global batch size: 256 | lm loss: 4.531847E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.089 | TFLOPs: 12.00 | 7: iteration 118980/ 173500 | consumed samples: 30458880 | consumed tokens: 62379786240 | elapsed time per iteration (s): 0.08 | learning rate: 6.116E-05 | global batch size: 256 | lm loss: 4.511935E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.548 | TFLOPs: 11.87 | 7: iteration 118990/ 173500 | consumed samples: 30461440 | consumed tokens: 62385029120 | elapsed time per iteration (s): 0.08 | learning rate: 6.115E-05 | global batch size: 256 | lm loss: 4.524177E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.395 | TFLOPs: 11.44 | 7: iteration 119000/ 173500 | consumed samples: 30464000 | consumed tokens: 62390272000 | elapsed time per iteration (s): 0.08 | learning rate: 6.113E-05 | global batch size: 256 | lm loss: 4.511967E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.663 | TFLOPs: 11.61 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 119000 | lm loss value: 4.407117E+00 | lm 
loss PPL: 8.203261E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 119000 to checkpoints_14m91b100m 0: [2023-03-17 03:08:31,908] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step119000 is begin to save! 0: [2023-03-17 03:08:31,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:08:31,946] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:08:31,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:08:31,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:08:31,950] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:08:31,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:08:31,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:08:31,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:08:31,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:08:31,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 03:08:31,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:08:31,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:08:31,960] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step119000/mp_rank_00_model_states.pt 0: [2023-03-17 03:08:31,960] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:08:31,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:08:31,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:08:31,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:08:31,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 03:08:31,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 3: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:08:31,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 1: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:08:31,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 1: [2023-03-17 03:08:31,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:08:31,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 5: [2023-03-17 03:08:31,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:08:31,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 4: [2023-03-17 03:08:31,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:08:31,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 7: [2023-03-17 03:08:31,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:08:31,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 03:08:31,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:08:31,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:08:31,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 2: [2023-03-17 03:08:31,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:08:31,986] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 3: [2023-03-17 03:08:31,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:08:31,986] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:08:31,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:08:31,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:08:31,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 
4: [2023-03-17 03:08:31,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 5: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:08:31,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 2: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:08:31,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 1: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:08:31,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 7: [2023-03-17 03:08:31,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:08:31,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:08:31,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 03:08:31,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 4: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 3: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:08:31,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 03:08:31,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 3: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:08:31,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:08:31,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 5: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 1: [2023-03-17 03:08:31,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:08:31,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 1: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:08:31,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 7: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:08:31,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:08:31,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 
2: [2023-03-17 03:08:31,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 4: [2023-03-17 03:08:31,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:08:31,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 5: [2023-03-17 03:08:31,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:08:31,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:08:31,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:08:31,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 2: [2023-03-17 03:08:31,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:08:31,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:08:31,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 7: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 3: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:08:31,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 5: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:08:31,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 2: [2023-03-17 03:08:31,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:08:31,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 1: [2023-03-17 03:08:31,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:08:31,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:08:31,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 3: [2023-03-17 03:08:31,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:08:31,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 03:08:31,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 7: [2023-03-17 03:08:31,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:08:31,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 5: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 4: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 2: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:08:31,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 03:08:31,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:08:31,993] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 1: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 1: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 3: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 0: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 3: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 6: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 4: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 4: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 5: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 4: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 7: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 7: [2023-03-17 03:08:31,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step119000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 7: [2023-03-17 03:08:31,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step119000 is ready now! 
0: successfully saved checkpoint at iteration 119000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 89.65 7: iteration 119010/ 173500 | consumed samples: 30466560 | consumed tokens: 62395514880 | elapsed time per iteration (s): 0.09 | learning rate: 6.112E-05 | global batch size: 256 | lm loss: 4.510616E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2761.552 | TFLOPs: 10.27 | 7: iteration 119020/ 173500 | consumed samples: 30469120 | consumed tokens: 62400757760 | elapsed time per iteration (s): 0.08 | learning rate: 6.110E-05 | global batch size: 256 | lm loss: 4.504136E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.803 | TFLOPs: 11.68 | 7: iteration 119030/ 173500 | consumed samples: 30471680 | consumed tokens: 62406000640 | elapsed time per iteration (s): 0.08 | learning rate: 6.109E-05 | global batch size: 256 | lm loss: 4.505150E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.209 | TFLOPs: 11.56 | 7: iteration 119040/ 173500 | consumed samples: 30474240 | consumed tokens: 62411243520 | elapsed time per iteration (s): 0.09 | learning rate: 6.108E-05 | global batch size: 256 | lm loss: 4.514807E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.112 | TFLOPs: 11.15 | 7: iteration 119050/ 173500 | consumed samples: 30476800 | consumed tokens: 62416486400 | elapsed time per iteration (s): 0.08 | learning rate: 6.106E-05 | global batch size: 256 | lm loss: 4.511734E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.177 | TFLOPs: 11.55 | 7: iteration 119060/ 173500 | consumed samples: 30479360 | consumed tokens: 62421729280 | elapsed time per iteration (s): 
0.09 | learning rate: 6.105E-05 | global batch size: 256 | lm loss: 4.508255E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.806 | TFLOPs: 10.54 | 7: iteration 119070/ 173500 | consumed samples: 30481920 | consumed tokens: 62426972160 | elapsed time per iteration (s): 0.09 | learning rate: 6.104E-05 | global batch size: 256 | lm loss: 4.521632E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.819 | TFLOPs: 10.58 | 7: iteration 119080/ 173500 | consumed samples: 30484480 | consumed tokens: 62432215040 | elapsed time per iteration (s): 0.11 | learning rate: 6.102E-05 | global batch size: 256 | lm loss: 4.501214E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2409.417 | TFLOPs: 8.96 | 7: iteration 119090/ 173500 | consumed samples: 30487040 | consumed tokens: 62437457920 | elapsed time per iteration (s): 0.10 | learning rate: 6.101E-05 | global batch size: 256 | lm loss: 4.503079E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2641.274 | TFLOPs: 9.82 | 7: iteration 119100/ 173500 | consumed samples: 30489600 | consumed tokens: 62442700800 | elapsed time per iteration (s): 0.13 | learning rate: 6.099E-05 | global batch size: 256 | lm loss: 4.512557E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1928.301 | TFLOPs: 7.17 | 7: iteration 119110/ 173500 | consumed samples: 30492160 | consumed tokens: 62447943680 | elapsed time per iteration (s): 0.13 | learning rate: 6.098E-05 | global batch size: 256 | lm loss: 4.504759E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.316 | TFLOPs: 7.46 | 7: iteration 
119120/ 173500 | consumed samples: 30494720 | consumed tokens: 62453186560 | elapsed time per iteration (s): 0.13 | learning rate: 6.097E-05 | global batch size: 256 | lm loss: 4.516747E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.286 | TFLOPs: 7.28 | 7: iteration 119130/ 173500 | consumed samples: 30497280 | consumed tokens: 62458429440 | elapsed time per iteration (s): 0.13 | learning rate: 6.095E-05 | global batch size: 256 | lm loss: 4.506781E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1969.367 | TFLOPs: 7.33 | 7: iteration 119140/ 173500 | consumed samples: 30499840 | consumed tokens: 62463672320 | elapsed time per iteration (s): 0.10 | learning rate: 6.094E-05 | global batch size: 256 | lm loss: 4.525225E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.138 | TFLOPs: 9.70 | 7: iteration 119150/ 173500 | consumed samples: 30502400 | consumed tokens: 62468915200 | elapsed time per iteration (s): 0.10 | learning rate: 6.092E-05 | global batch size: 256 | lm loss: 4.500368E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2634.550 | TFLOPs: 9.80 | 7: iteration 119160/ 173500 | consumed samples: 30504960 | consumed tokens: 62474158080 | elapsed time per iteration (s): 0.08 | learning rate: 6.091E-05 | global batch size: 256 | lm loss: 4.507779E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.356 | TFLOPs: 11.53 | 7: iteration 119170/ 173500 | consumed samples: 30507520 | consumed tokens: 62479400960 | elapsed time per iteration (s): 0.08 | learning rate: 6.090E-05 | global batch size: 256 | lm loss: 4.513458E+00 | grad norm: 0.371 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.149 | TFLOPs: 11.78 | 7: iteration 119180/ 173500 | consumed samples: 30510080 | consumed tokens: 62484643840 | elapsed time per iteration (s): 0.11 | learning rate: 6.088E-05 | global batch size: 256 | lm loss: 4.503044E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.490 | TFLOPs: 8.91 | 7: iteration 119190/ 173500 | consumed samples: 30512640 | consumed tokens: 62489886720 | elapsed time per iteration (s): 0.13 | learning rate: 6.087E-05 | global batch size: 256 | lm loss: 4.515467E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2017.322 | TFLOPs: 7.50 | 7: iteration 119200/ 173500 | consumed samples: 30515200 | consumed tokens: 62495129600 | elapsed time per iteration (s): 0.09 | learning rate: 6.086E-05 | global batch size: 256 | lm loss: 4.508605E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.883 | TFLOPs: 10.03 | 7: iteration 119210/ 173500 | consumed samples: 30517760 | consumed tokens: 62500372480 | elapsed time per iteration (s): 0.09 | learning rate: 6.084E-05 | global batch size: 256 | lm loss: 4.521034E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2888.918 | TFLOPs: 10.75 | 7: iteration 119220/ 173500 | consumed samples: 30520320 | consumed tokens: 62505615360 | elapsed time per iteration (s): 0.10 | learning rate: 6.083E-05 | global batch size: 256 | lm loss: 4.513131E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2552.329 | TFLOPs: 9.49 | 7: iteration 119230/ 173500 | consumed samples: 30522880 | consumed tokens: 62510858240 | elapsed time per iteration (s): 0.09 | learning 
rate: 6.081E-05 | global batch size: 256 | lm loss: 4.508854E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.162 | TFLOPs: 10.58 | 7: iteration 119240/ 173500 | consumed samples: 30525440 | consumed tokens: 62516101120 | elapsed time per iteration (s): 0.12 | learning rate: 6.080E-05 | global batch size: 256 | lm loss: 4.508844E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2085.215 | TFLOPs: 7.76 | 7: iteration 119250/ 173500 | consumed samples: 30528000 | consumed tokens: 62521344000 | elapsed time per iteration (s): 0.09 | learning rate: 6.079E-05 | global batch size: 256 | lm loss: 4.503555E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.826 | TFLOPs: 10.51 | 7: iteration 119260/ 173500 | consumed samples: 30530560 | consumed tokens: 62526586880 | elapsed time per iteration (s): 0.10 | learning rate: 6.077E-05 | global batch size: 256 | lm loss: 4.503898E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.348 | TFLOPs: 9.72 | 7: iteration 119270/ 173500 | consumed samples: 30533120 | consumed tokens: 62531829760 | elapsed time per iteration (s): 0.11 | learning rate: 6.076E-05 | global batch size: 256 | lm loss: 4.521940E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2419.364 | TFLOPs: 9.00 | 7: iteration 119280/ 173500 | consumed samples: 30535680 | consumed tokens: 62537072640 | elapsed time per iteration (s): 0.12 | learning rate: 6.075E-05 | global batch size: 256 | lm loss: 4.513377E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2187.009 | TFLOPs: 8.13 | 7: iteration 119290/ 173500 | 
consumed samples: 30538240 | consumed tokens: 62542315520 | elapsed time per iteration (s): 0.11 | learning rate: 6.073E-05 | global batch size: 256 | lm loss: 4.527122E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2266.584 | TFLOPs: 8.43 | 7: iteration 119300/ 173500 | consumed samples: 30540800 | consumed tokens: 62547558400 | elapsed time per iteration (s): 0.10 | learning rate: 6.072E-05 | global batch size: 256 | lm loss: 4.516338E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2656.411 | TFLOPs: 9.88 | 7: iteration 119310/ 173500 | consumed samples: 30543360 | consumed tokens: 62552801280 | elapsed time per iteration (s): 0.09 | learning rate: 6.070E-05 | global batch size: 256 | lm loss: 4.517566E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.013 | TFLOPs: 10.48 | 7: iteration 119320/ 173500 | consumed samples: 30545920 | consumed tokens: 62558044160 | elapsed time per iteration (s): 0.12 | learning rate: 6.069E-05 | global batch size: 256 | lm loss: 4.512263E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2079.597 | TFLOPs: 7.74 | 7: iteration 119330/ 173500 | consumed samples: 30548480 | consumed tokens: 62563287040 | elapsed time per iteration (s): 0.13 | learning rate: 6.068E-05 | global batch size: 256 | lm loss: 4.514106E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2012.138 | TFLOPs: 7.48 | 7: iteration 119340/ 173500 | consumed samples: 30551040 | consumed tokens: 62568529920 | elapsed time per iteration (s): 0.09 | learning rate: 6.066E-05 | global batch size: 256 | lm loss: 4.513533E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 2774.352 | TFLOPs: 10.32 | 7: iteration 119350/ 173500 | consumed samples: 30553600 | consumed tokens: 62573772800 | elapsed time per iteration (s): 0.08 | learning rate: 6.065E-05 | global batch size: 256 | lm loss: 4.511020E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.373 | TFLOPs: 12.01 | 7: iteration 119360/ 173500 | consumed samples: 30556160 | consumed tokens: 62579015680 | elapsed time per iteration (s): 0.09 | learning rate: 6.064E-05 | global batch size: 256 | lm loss: 4.513367E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2870.165 | TFLOPs: 10.68 | 7: iteration 119370/ 173500 | consumed samples: 30558720 | consumed tokens: 62584258560 | elapsed time per iteration (s): 0.09 | learning rate: 6.062E-05 | global batch size: 256 | lm loss: 4.514164E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.339 | TFLOPs: 10.23 | 7: iteration 119380/ 173500 | consumed samples: 30561280 | consumed tokens: 62589501440 | elapsed time per iteration (s): 0.08 | learning rate: 6.061E-05 | global batch size: 256 | lm loss: 4.506294E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.032 | TFLOPs: 12.00 | 7: iteration 119390/ 173500 | consumed samples: 30563840 | consumed tokens: 62594744320 | elapsed time per iteration (s): 0.10 | learning rate: 6.059E-05 | global batch size: 256 | lm loss: 4.506641E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2668.453 | TFLOPs: 9.93 | 7: iteration 119400/ 173500 | consumed samples: 30566400 | consumed tokens: 62599987200 | elapsed time per iteration (s): 0.09 | learning rate: 6.058E-05 | 
global batch size: 256 | lm loss: 4.504410E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.141 | TFLOPs: 10.97 | 7: iteration 119410/ 173500 | consumed samples: 30568960 | consumed tokens: 62605230080 | elapsed time per iteration (s): 0.08 | learning rate: 6.057E-05 | global batch size: 256 | lm loss: 4.513997E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.491 | TFLOPs: 12.04 | 7: iteration 119420/ 173500 | consumed samples: 30571520 | consumed tokens: 62610472960 | elapsed time per iteration (s): 0.10 | learning rate: 6.055E-05 | global batch size: 256 | lm loss: 4.525213E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.863 | TFLOPs: 9.80 | 7: iteration 119430/ 173500 | consumed samples: 30574080 | consumed tokens: 62615715840 | elapsed time per iteration (s): 0.09 | learning rate: 6.054E-05 | global batch size: 256 | lm loss: 4.527174E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2732.583 | TFLOPs: 10.16 | 7: iteration 119440/ 173500 | consumed samples: 30576640 | consumed tokens: 62620958720 | elapsed time per iteration (s): 0.12 | learning rate: 6.053E-05 | global batch size: 256 | lm loss: 4.515898E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2206.792 | TFLOPs: 8.21 | 7: iteration 119450/ 173500 | consumed samples: 30579200 | consumed tokens: 62626201600 | elapsed time per iteration (s): 0.11 | learning rate: 6.051E-05 | global batch size: 256 | lm loss: 4.517600E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2231.522 | TFLOPs: 8.30 | 7: iteration 119460/ 173500 | consumed 
samples: 30581760 | consumed tokens: 62631444480 | elapsed time per iteration (s): 0.08 | learning rate: 6.050E-05 | global batch size: 256 | lm loss: 4.508953E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.344 | TFLOPs: 11.55 | 7: iteration 119470/ 173500 | consumed samples: 30584320 | consumed tokens: 62636687360 | elapsed time per iteration (s): 0.08 | learning rate: 6.048E-05 | global batch size: 256 | lm loss: 4.511368E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.186 | TFLOPs: 11.97 | 7: iteration 119480/ 173500 | consumed samples: 30586880 | consumed tokens: 62641930240 | elapsed time per iteration (s): 0.08 | learning rate: 6.047E-05 | global batch size: 256 | lm loss: 4.510965E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.412 | TFLOPs: 11.97 | 7: iteration 119490/ 173500 | consumed samples: 30589440 | consumed tokens: 62647173120 | elapsed time per iteration (s): 0.08 | learning rate: 6.046E-05 | global batch size: 256 | lm loss: 4.501793E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.871 | TFLOPs: 11.64 | 7: iteration 119500/ 173500 | consumed samples: 30592000 | consumed tokens: 62652416000 | elapsed time per iteration (s): 0.11 | learning rate: 6.044E-05 | global batch size: 256 | lm loss: 4.500732E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.810 | TFLOPs: 8.88 | 7: iteration 119510/ 173500 | consumed samples: 30594560 | consumed tokens: 62657658880 | elapsed time per iteration (s): 0.10 | learning rate: 6.043E-05 | global batch size: 256 | lm loss: 4.517062E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2471.339 | TFLOPs: 9.19 | 7: iteration 119520/ 173500 | consumed samples: 30597120 | consumed tokens: 62662901760 | elapsed time per iteration (s): 0.10 | learning rate: 6.042E-05 | global batch size: 256 | lm loss: 4.517377E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2571.017 | TFLOPs: 9.56 | 7: iteration 119530/ 173500 | consumed samples: 30599680 | consumed tokens: 62668144640 | elapsed time per iteration (s): 0.12 | learning rate: 6.040E-05 | global batch size: 256 | lm loss: 4.522759E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.739 | TFLOPs: 7.75 | 7: iteration 119540/ 173500 | consumed samples: 30602240 | consumed tokens: 62673387520 | elapsed time per iteration (s): 0.12 | learning rate: 6.039E-05 | global batch size: 256 | lm loss: 4.507712E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2103.158 | TFLOPs: 7.82 | 7: iteration 119550/ 173500 | consumed samples: 30604800 | consumed tokens: 62678630400 | elapsed time per iteration (s): 0.12 | learning rate: 6.037E-05 | global batch size: 256 | lm loss: 4.523671E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2190.375 | TFLOPs: 8.15 | 7: iteration 119560/ 173500 | consumed samples: 30607360 | consumed tokens: 62683873280 | elapsed time per iteration (s): 0.10 | learning rate: 6.036E-05 | global batch size: 256 | lm loss: 4.514110E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2641.925 | TFLOPs: 9.83 | 7: iteration 119570/ 173500 | consumed samples: 30609920 | consumed tokens: 62689116160 | elapsed time per iteration (s): 0.08 | learning rate: 6.035E-05 | global batch 
size: 256 | lm loss: 4.519008E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.765 | TFLOPs: 11.96 | 7: iteration 119580/ 173500 | consumed samples: 30612480 | consumed tokens: 62694359040 | elapsed time per iteration (s): 0.08 | learning rate: 6.033E-05 | global batch size: 256 | lm loss: 4.510140E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.740 | TFLOPs: 11.59 | 7: iteration 119590/ 173500 | consumed samples: 30615040 | consumed tokens: 62699601920 | elapsed time per iteration (s): 0.11 | learning rate: 6.032E-05 | global batch size: 256 | lm loss: 4.504460E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.113 | TFLOPs: 9.02 | 7: iteration 119600/ 173500 | consumed samples: 30617600 | consumed tokens: 62704844800 | elapsed time per iteration (s): 0.11 | learning rate: 6.031E-05 | global batch size: 256 | lm loss: 4.512882E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.512 | TFLOPs: 8.88 | 7: iteration 119610/ 173500 | consumed samples: 30620160 | consumed tokens: 62710087680 | elapsed time per iteration (s): 0.11 | learning rate: 6.029E-05 | global batch size: 256 | lm loss: 4.518034E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2391.821 | TFLOPs: 8.90 | 7: iteration 119620/ 173500 | consumed samples: 30622720 | consumed tokens: 62715330560 | elapsed time per iteration (s): 0.09 | learning rate: 6.028E-05 | global batch size: 256 | lm loss: 4.516090E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2807.685 | TFLOPs: 10.44 | 7: iteration 119630/ 173500 | consumed samples: 30625280 | 
consumed tokens: 62720573440 | elapsed time per iteration (s): 0.08 | learning rate: 6.026E-05 | global batch size: 256 | lm loss: 4.510550E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.431 | TFLOPs: 11.69 | 7: iteration 119640/ 173500 | consumed samples: 30627840 | consumed tokens: 62725816320 | elapsed time per iteration (s): 0.09 | learning rate: 6.025E-05 | global batch size: 256 | lm loss: 4.507671E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2729.175 | TFLOPs: 10.15 | 7: iteration 119650/ 173500 | consumed samples: 30630400 | consumed tokens: 62731059200 | elapsed time per iteration (s): 0.11 | learning rate: 6.024E-05 | global batch size: 256 | lm loss: 4.522006E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.136 | TFLOPs: 8.86 | 7: iteration 119660/ 173500 | consumed samples: 30632960 | consumed tokens: 62736302080 | elapsed time per iteration (s): 0.09 | learning rate: 6.022E-05 | global batch size: 256 | lm loss: 4.506175E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.752 | TFLOPs: 10.03 | 7: iteration 119670/ 173500 | consumed samples: 30635520 | consumed tokens: 62741544960 | elapsed time per iteration (s): 0.09 | learning rate: 6.021E-05 | global batch size: 256 | lm loss: 4.515086E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.781 | TFLOPs: 10.54 | 7: iteration 119680/ 173500 | consumed samples: 30638080 | consumed tokens: 62746787840 | elapsed time per iteration (s): 0.10 | learning rate: 6.020E-05 | global batch size: 256 | lm loss: 4.502181E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2584.909 | TFLOPs: 9.61 | 7: iteration 119690/ 173500 | consumed samples: 30640640 | consumed tokens: 62752030720 | elapsed time per iteration (s): 0.09 | learning rate: 6.018E-05 | global batch size: 256 | lm loss: 4.506181E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2805.595 | TFLOPs: 10.44 | 7: iteration 119700/ 173500 | consumed samples: 30643200 | consumed tokens: 62757273600 | elapsed time per iteration (s): 0.08 | learning rate: 6.017E-05 | global batch size: 256 | lm loss: 4.514854E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.769 | TFLOPs: 12.02 | 7: iteration 119710/ 173500 | consumed samples: 30645760 | consumed tokens: 62762516480 | elapsed time per iteration (s): 0.08 | learning rate: 6.015E-05 | global batch size: 256 | lm loss: 4.505341E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.740 | TFLOPs: 12.01 | 7: iteration 119720/ 173500 | consumed samples: 30648320 | consumed tokens: 62767759360 | elapsed time per iteration (s): 0.08 | learning rate: 6.014E-05 | global batch size: 256 | lm loss: 4.523701E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.902 | TFLOPs: 11.66 | 7: iteration 119730/ 173500 | consumed samples: 30650880 | consumed tokens: 62773002240 | elapsed time per iteration (s): 0.11 | learning rate: 6.013E-05 | global batch size: 256 | lm loss: 4.515084E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.421 | TFLOPs: 8.82 | 7: iteration 119740/ 173500 | consumed samples: 30653440 | consumed tokens: 62778245120 | elapsed time per iteration (s): 0.08 | learning rate: 6.011E-05 | global batch size: 256 
| lm loss: 4.511660E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3082.588 | TFLOPs: 11.47 | 7: iteration 119750/ 173500 | consumed samples: 30656000 | consumed tokens: 62783488000 | elapsed time per iteration (s): 0.11 | learning rate: 6.010E-05 | global batch size: 256 | lm loss: 4.510358E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2430.481 | TFLOPs: 9.04 | 7: iteration 119760/ 173500 | consumed samples: 30658560 | consumed tokens: 62788730880 | elapsed time per iteration (s): 0.12 | learning rate: 6.009E-05 | global batch size: 256 | lm loss: 4.511676E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2172.495 | TFLOPs: 8.08 | 7: iteration 119770/ 173500 | consumed samples: 30661120 | consumed tokens: 62793973760 | elapsed time per iteration (s): 0.10 | learning rate: 6.007E-05 | global batch size: 256 | lm loss: 4.517672E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2555.277 | TFLOPs: 9.50 | 7: iteration 119780/ 173500 | consumed samples: 30663680 | consumed tokens: 62799216640 | elapsed time per iteration (s): 0.09 | learning rate: 6.006E-05 | global batch size: 256 | lm loss: 4.522186E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2848.190 | TFLOPs: 10.59 | 7: iteration 119790/ 173500 | consumed samples: 30666240 | consumed tokens: 62804459520 | elapsed time per iteration (s): 0.09 | learning rate: 6.004E-05 | global batch size: 256 | lm loss: 4.517076E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2769.085 | TFLOPs: 10.30 | 7: iteration 119800/ 173500 | consumed samples: 30668800 | consumed 
tokens: 62809702400 | elapsed time per iteration (s): 0.08 | learning rate: 6.003E-05 | global batch size: 256 | lm loss: 4.514718E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.578 | TFLOPs: 11.27 | 7: iteration 119810/ 173500 | consumed samples: 30671360 | consumed tokens: 62814945280 | elapsed time per iteration (s): 0.10 | learning rate: 6.002E-05 | global batch size: 256 | lm loss: 4.505295E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.414 | TFLOPs: 9.16 | 7: iteration 119820/ 173500 | consumed samples: 30673920 | consumed tokens: 62820188160 | elapsed time per iteration (s): 0.11 | learning rate: 6.000E-05 | global batch size: 256 | lm loss: 4.512714E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2381.865 | TFLOPs: 8.86 | 7: iteration 119830/ 173500 | consumed samples: 30676480 | consumed tokens: 62825431040 | elapsed time per iteration (s): 0.11 | learning rate: 5.999E-05 | global batch size: 256 | lm loss: 4.519755E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2326.730 | TFLOPs: 8.65 | 7: iteration 119840/ 173500 | consumed samples: 30679040 | consumed tokens: 62830673920 | elapsed time per iteration (s): 0.08 | learning rate: 5.998E-05 | global batch size: 256 | lm loss: 4.516360E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.582 | TFLOPs: 11.72 | 7: iteration 119850/ 173500 | consumed samples: 30681600 | consumed tokens: 62835916800 | elapsed time per iteration (s): 0.08 | learning rate: 5.996E-05 | global batch size: 256 | lm loss: 4.504680E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3224.713 | TFLOPs: 11.99 |
7: iteration 119860/ 173500 | consumed samples: 30684160 | consumed tokens: 62841159680 | elapsed time per iteration (s): 0.09 | learning rate: 5.995E-05 | global batch size: 256 | lm loss: 4.522157E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.214 | TFLOPs: 10.03 |
7: iteration 119870/ 173500 | consumed samples: 30686720 | consumed tokens: 62846402560 | elapsed time per iteration (s): 0.11 | learning rate: 5.994E-05 | global batch size: 256 | lm loss: 4.494723E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.120 | TFLOPs: 8.88 |
7: iteration 119880/ 173500 | consumed samples: 30689280 | consumed tokens: 62851645440 | elapsed time per iteration (s): 0.10 | learning rate: 5.992E-05 | global batch size: 256 | lm loss: 4.520412E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2495.437 | TFLOPs: 9.28 |
7: iteration 119890/ 173500 | consumed samples: 30691840 | consumed tokens: 62856888320 | elapsed time per iteration (s): 0.08 | learning rate: 5.991E-05 | global batch size: 256 | lm loss: 4.514754E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.209 | TFLOPs: 11.99 |
7: iteration 119900/ 173500 | consumed samples: 30694400 | consumed tokens: 62862131200 | elapsed time per iteration (s): 0.08 | learning rate: 5.989E-05 | global batch size: 256 | lm loss: 4.513260E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.988 | TFLOPs: 11.33 |
7: iteration 119910/ 173500 | consumed samples: 30696960 | consumed tokens: 62867374080 | elapsed time per iteration (s): 0.10 | learning rate: 5.988E-05 | global batch size: 256 | lm loss: 4.503946E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2541.853 | TFLOPs: 9.45 |
7: iteration 119920/ 173500 | consumed samples: 30699520 | consumed tokens: 62872616960 | elapsed time per iteration (s): 0.10 | learning rate: 5.987E-05 | global batch size: 256 | lm loss: 4.511756E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.793 | TFLOPs: 9.16 |
7: iteration 119930/ 173500 | consumed samples: 30702080 | consumed tokens: 62877859840 | elapsed time per iteration (s): 0.09 | learning rate: 5.985E-05 | global batch size: 256 | lm loss: 4.521042E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.002 | TFLOPs: 10.17 |
7: iteration 119940/ 173500 | consumed samples: 30704640 | consumed tokens: 62883102720 | elapsed time per iteration (s): 0.09 | learning rate: 5.984E-05 | global batch size: 256 | lm loss: 4.498829E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.433 | TFLOPs: 10.09 |
7: iteration 119950/ 173500 | consumed samples: 30707200 | consumed tokens: 62888345600 | elapsed time per iteration (s): 0.08 | learning rate: 5.983E-05 | global batch size: 256 | lm loss: 4.516192E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.459 | TFLOPs: 12.06 |
7: iteration 119960/ 173500 | consumed samples: 30709760 | consumed tokens: 62893588480 | elapsed time per iteration (s): 0.09 | learning rate: 5.981E-05 | global batch size: 256 | lm loss: 4.515641E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2887.761 | TFLOPs: 10.74 |
7: iteration 119970/ 173500 | consumed samples: 30712320 | consumed tokens: 62898831360 | elapsed time per iteration (s): 0.10 | learning rate: 5.980E-05 | global batch size: 256 | lm loss: 4.502149E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2593.731 | TFLOPs: 9.65 |
7: iteration 119980/ 173500 | consumed samples: 30714880 | consumed tokens: 62904074240 | elapsed time per iteration (s): 0.13 | learning rate: 5.979E-05 | global batch size: 256 | lm loss: 4.511662E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2017.269 | TFLOPs: 7.50 |
7: iteration 119990/ 173500 | consumed samples: 30717440 | consumed tokens: 62909317120 | elapsed time per iteration (s): 0.09 | learning rate: 5.977E-05 | global batch size: 256 | lm loss: 4.521469E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.963 | TFLOPs: 10.21 |
0: [2023-03-17 03:10:09,373] [INFO] [logging.py:68:log_dist] [Rank 0] step=120000, skipped=0, lr=[5.975780833100023e-05, 5.975780833100023e-05, 5.975780833100023e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 120000/ 173500 | consumed samples: 30720000 | consumed tokens: 62914560000 | elapsed time per iteration (s): 0.08 | learning rate: 5.976E-05 | global batch size: 256 | lm loss: 4.519070E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.255 | TFLOPs: 12.03 |
0: steps: 120000 loss: 4.5612 iter time (s): 0.095 samples/sec: 2698.341
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 120000 | lm loss value: 4.428642E+00 | lm loss PPL: 8.381750E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 120000 to checkpoints_14m91b100m
0: [2023-03-17 03:10:09,431] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step120000 is begin to save!
0: [2023-03-17 03:10:09,434] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/layer_01-model_00-model_states.pt...
0: [2023-03-17 03:10:09,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/layer_01-model_00-model_states.pt.
0: [2023-03-17 03:10:09,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/layer_03-model_00-model_states.pt...
0: [2023-03-17 03:10:09,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/layer_03-model_00-model_states.pt.
0: [2023-03-17 03:10:09,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/layer_04-model_00-model_states.pt...
0: [2023-03-17 03:10:09,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/layer_04-model_00-model_states.pt.
0: [2023-03-17 03:10:09,467] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/layer_05-model_00-model_states.pt...
0: [2023-03-17 03:10:09,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/layer_05-model_00-model_states.pt.
0: [2023-03-17 03:10:09,470] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/layer_06-model_00-model_states.pt...
0: [2023-03-17 03:10:09,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/layer_06-model_00-model_states.pt.
0: [2023-03-17 03:10:09,473] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/layer_08-model_00-model_states.pt...
0: [2023-03-17 03:10:09,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:10:09,474] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step120000/mp_rank_00_model_states.pt 0: [2023-03-17 03:10:09,474] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:10:09,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:10:09,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:10:09,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:10:09,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 2: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:10:09,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 1: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:10:09,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 7: [2023-03-17 03:10:09,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 4: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:10:09,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 0: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 
4: [2023-03-17 03:10:09,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:10:09,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 2: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:10:09,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 3: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:10:09,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 7: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:10:09,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:10:09,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 0: [2023-03-17 03:10:09,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 4: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:10:09,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 
2: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:10:09,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 2: [2023-03-17 03:10:09,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 1: [2023-03-17 03:10:09,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:10:09,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 3: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:10:09,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 7: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:10:09,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 4: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:10:09,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 2: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:10:09,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 0: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:10:09,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 1: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:10:09,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 7: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:10:09,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 3: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:10:09,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 4: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:10:09,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:10:09,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 0: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:10:09,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 2: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:10:09,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:10:09,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 3: [2023-03-17 03:10:09,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 3: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 7: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:10:09,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:10:09,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 4: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:10:09,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 0: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:10:09,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 2: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:10:09,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 1: [2023-03-17 03:10:09,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:10:09,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:10:09,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 7: [2023-03-17 03:10:09,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:10:09,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:10:09,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:10:09,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 3: [2023-03-17 03:10:09,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 4: [2023-03-17 03:10:09,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 4: [2023-03-17 03:10:09,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 2: [2023-03-17 03:10:09,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:10:09,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 0: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 1: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 0: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 2: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 
1: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 5: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 1: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:10:09,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:10:09,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 
7: [2023-03-17 03:10:09,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:10:09,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:10:09,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 3: [2023-03-17 03:10:09,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:10:09,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 03:10:09,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 03:10:09,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 3: [2023-03-17 03:10:09,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 4: [2023-03-17 03:10:09,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:10:09,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:10:09,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:10:09,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:10:09,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:10:09,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:10:09,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 6: [2023-03-17 03:10:09,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:10:09,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 6: [2023-03-17 03:10:09,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 6: [2023-03-17 03:10:09,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step120000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 
6: [2023-03-17 03:10:09,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step120000 is ready now! 0: successfully saved checkpoint at iteration 120000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 90.32 7: iteration 120010/ 173500 | consumed samples: 30722560 | consumed tokens: 62919802880 | elapsed time per iteration (s): 0.11 | learning rate: 5.974E-05 | global batch size: 256 | lm loss: 4.491905E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2309.030 | TFLOPs: 8.59 | 7: iteration 120020/ 173500 | consumed samples: 30725120 | consumed tokens: 62925045760 | elapsed time per iteration (s): 0.13 | learning rate: 5.973E-05 | global batch size: 256 | lm loss: 4.525111E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.334 | TFLOPs: 7.37 | 7: iteration 120030/ 173500 | consumed samples: 30727680 | consumed tokens: 62930288640 | elapsed time per iteration (s): 0.12 | learning rate: 5.972E-05 | global batch size: 256 | lm loss: 4.516821E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2073.518 | TFLOPs: 7.71 | 7: iteration 120040/ 173500 | consumed samples: 30730240 | consumed tokens: 62935531520 | elapsed time per iteration (s): 0.08 | learning rate: 5.970E-05 | global batch size: 256 | lm loss: 4.509982E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.155 | TFLOPs: 11.68 | 7: iteration 120050/ 173500 | consumed samples: 30732800 | consumed tokens: 62940774400 | elapsed time per iteration (s): 0.11 | learning rate: 5.969E-05 | global batch size: 256 | lm loss: 4.509993E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2243.201 | TFLOPs: 8.34 | 
7: iteration 120060/ 173500 | consumed samples: 30735360 | consumed tokens: 62946017280 | elapsed time per iteration (s): 0.12 | learning rate: 5.968E-05 | global batch size: 256 | lm loss: 4.510867E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2072.956 | TFLOPs: 7.71 | 7: iteration 120070/ 173500 | consumed samples: 30737920 | consumed tokens: 62951260160 | elapsed time per iteration (s): 0.10 | learning rate: 5.966E-05 | global batch size: 256 | lm loss: 4.514182E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2467.745 | TFLOPs: 9.18 | 7: iteration 120080/ 173500 | consumed samples: 30740480 | consumed tokens: 62956503040 | elapsed time per iteration (s): 0.09 | learning rate: 5.965E-05 | global batch size: 256 | lm loss: 4.522678E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2931.208 | TFLOPs: 10.90 | 7: iteration 120090/ 173500 | consumed samples: 30743040 | consumed tokens: 62961745920 | elapsed time per iteration (s): 0.13 | learning rate: 5.963E-05 | global batch size: 256 | lm loss: 4.517233E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1959.361 | TFLOPs: 7.29 | 7: iteration 120100/ 173500 | consumed samples: 30745600 | consumed tokens: 62966988800 | elapsed time per iteration (s): 0.15 | learning rate: 5.962E-05 | global batch size: 256 | lm loss: 4.517452E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1681.521 | TFLOPs: 6.25 | 7: iteration 120110/ 173500 | consumed samples: 30748160 | consumed tokens: 62972231680 | elapsed time per iteration (s): 0.15 | learning rate: 5.961E-05 | global batch size: 256 | lm loss: 4.506142E+00 | grad norm: 0.363 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1758.763 | TFLOPs: 6.54 | 7: iteration 120120/ 173500 | consumed samples: 30750720 | consumed tokens: 62977474560 | elapsed time per iteration (s): 0.12 | learning rate: 5.959E-05 | global batch size: 256 | lm loss: 4.515656E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.242 | TFLOPs: 7.70 | 7: iteration 120130/ 173500 | consumed samples: 30753280 | consumed tokens: 62982717440 | elapsed time per iteration (s): 0.12 | learning rate: 5.958E-05 | global batch size: 256 | lm loss: 4.514206E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2057.773 | TFLOPs: 7.65 | 7: iteration 120140/ 173500 | consumed samples: 30755840 | consumed tokens: 62987960320 | elapsed time per iteration (s): 0.10 | learning rate: 5.957E-05 | global batch size: 256 | lm loss: 4.515689E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2440.359 | TFLOPs: 9.08 | 7: iteration 120150/ 173500 | consumed samples: 30758400 | consumed tokens: 62993203200 | elapsed time per iteration (s): 0.08 | learning rate: 5.955E-05 | global batch size: 256 | lm loss: 4.515341E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.974 | TFLOPs: 12.03 | 7: iteration 120160/ 173500 | consumed samples: 30760960 | consumed tokens: 62998446080 | elapsed time per iteration (s): 0.08 | learning rate: 5.954E-05 | global batch size: 256 | lm loss: 4.509542E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.647 | TFLOPs: 11.96 | 7: iteration 120170/ 173500 | consumed samples: 30763520 | consumed tokens: 63003688960 | elapsed time per iteration (s): 0.08 | 
learning rate: 5.953E-05 | global batch size: 256 | lm loss: 4.515942E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.890 | TFLOPs: 12.01 | 7: iteration 120180/ 173500 | consumed samples: 30766080 | consumed tokens: 63008931840 | elapsed time per iteration (s): 0.08 | learning rate: 5.951E-05 | global batch size: 256 | lm loss: 4.498722E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.524 | TFLOPs: 11.98 | 7: iteration 120190/ 173500 | consumed samples: 30768640 | consumed tokens: 63014174720 | elapsed time per iteration (s): 0.08 | learning rate: 5.950E-05 | global batch size: 256 | lm loss: 4.521620E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.048 | TFLOPs: 11.96 | 7: iteration 120200/ 173500 | consumed samples: 30771200 | consumed tokens: 63019417600 | elapsed time per iteration (s): 0.08 | learning rate: 5.948E-05 | global batch size: 256 | lm loss: 4.518761E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.568 | TFLOPs: 11.99 | 7: iteration 120210/ 173500 | consumed samples: 30773760 | consumed tokens: 63024660480 | elapsed time per iteration (s): 0.12 | learning rate: 5.947E-05 | global batch size: 256 | lm loss: 4.518247E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2179.608 | TFLOPs: 8.11 | 7: iteration 120220/ 173500 | consumed samples: 30776320 | consumed tokens: 63029903360 | elapsed time per iteration (s): 0.08 | learning rate: 5.946E-05 | global batch size: 256 | lm loss: 4.519291E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.248 | TFLOPs: 11.84 | 7: iteration 
120230/ 173500 | consumed samples: 30778880 | consumed tokens: 63035146240 | elapsed time per iteration (s): 0.13 | learning rate: 5.944E-05 | global batch size: 256 | lm loss: 4.510725E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.843 | TFLOPs: 7.37 | 7: iteration 120240/ 173500 | consumed samples: 30781440 | consumed tokens: 63040389120 | elapsed time per iteration (s): 0.08 | learning rate: 5.943E-05 | global batch size: 256 | lm loss: 4.516364E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.437 | TFLOPs: 11.28 | 7: iteration 120250/ 173500 | consumed samples: 30784000 | consumed tokens: 63045632000 | elapsed time per iteration (s): 0.08 | learning rate: 5.942E-05 | global batch size: 256 | lm loss: 4.516014E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.998 | TFLOPs: 11.94 | 7: iteration 120260/ 173500 | consumed samples: 30786560 | consumed tokens: 63050874880 | elapsed time per iteration (s): 0.08 | learning rate: 5.940E-05 | global batch size: 256 | lm loss: 4.512450E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.143 | TFLOPs: 11.91 | 7: iteration 120270/ 173500 | consumed samples: 30789120 | consumed tokens: 63056117760 | elapsed time per iteration (s): 0.09 | learning rate: 5.939E-05 | global batch size: 256 | lm loss: 4.514252E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2714.231 | TFLOPs: 10.10 | 7: iteration 120280/ 173500 | consumed samples: 30791680 | consumed tokens: 63061360640 | elapsed time per iteration (s): 0.11 | learning rate: 5.938E-05 | global batch size: 256 | lm loss: 4.499328E+00 | grad norm: 0.354 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2418.094 | TFLOPs: 8.99 | 7: iteration 120290/ 173500 | consumed samples: 30794240 | consumed tokens: 63066603520 | elapsed time per iteration (s): 0.11 | learning rate: 5.936E-05 | global batch size: 256 | lm loss: 4.510212E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2358.629 | TFLOPs: 8.77 | 7: iteration 120300/ 173500 | consumed samples: 30796800 | consumed tokens: 63071846400 | elapsed time per iteration (s): 0.08 | learning rate: 5.935E-05 | global batch size: 256 | lm loss: 4.516888E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.288 | TFLOPs: 11.86 | 7: iteration 120310/ 173500 | consumed samples: 30799360 | consumed tokens: 63077089280 | elapsed time per iteration (s): 0.08 | learning rate: 5.934E-05 | global batch size: 256 | lm loss: 4.508730E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.004 | TFLOPs: 11.87 | 7: iteration 120320/ 173500 | consumed samples: 30801920 | consumed tokens: 63082332160 | elapsed time per iteration (s): 0.11 | learning rate: 5.932E-05 | global batch size: 256 | lm loss: 4.522569E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2383.276 | TFLOPs: 8.86 | 7: iteration 120330/ 173500 | consumed samples: 30804480 | consumed tokens: 63087575040 | elapsed time per iteration (s): 0.10 | learning rate: 5.931E-05 | global batch size: 256 | lm loss: 4.509205E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2639.086 | TFLOPs: 9.82 | 7: iteration 120340/ 173500 | consumed samples: 30807040 | consumed tokens: 63092817920 | elapsed time per iteration (s): 0.08 | learning 
rate: 5.929E-05 | global batch size: 256 | lm loss: 4.524882E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.543 | TFLOPs: 11.89 | 7: iteration 120350/ 173500 | consumed samples: 30809600 | consumed tokens: 63098060800 | elapsed time per iteration (s): 0.08 | learning rate: 5.928E-05 | global batch size: 256 | lm loss: 4.513112E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.048 | TFLOPs: 11.95 | 7: iteration 120360/ 173500 | consumed samples: 30812160 | consumed tokens: 63103303680 | elapsed time per iteration (s): 0.08 | learning rate: 5.927E-05 | global batch size: 256 | lm loss: 4.504681E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.900 | TFLOPs: 11.68 | 7: iteration 120370/ 173500 | consumed samples: 30814720 | consumed tokens: 63108546560 | elapsed time per iteration (s): 0.09 | learning rate: 5.925E-05 | global batch size: 256 | lm loss: 4.511895E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.154 | TFLOPs: 11.16 | 7: iteration 120380/ 173500 | consumed samples: 30817280 | consumed tokens: 63113789440 | elapsed time per iteration (s): 0.08 | learning rate: 5.924E-05 | global batch size: 256 | lm loss: 4.501408E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.811 | TFLOPs: 11.88 | 7: iteration 120390/ 173500 | consumed samples: 30819840 | consumed tokens: 63119032320 | elapsed time per iteration (s): 0.12 | learning rate: 5.923E-05 | global batch size: 256 | lm loss: 4.516345E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2060.742 | TFLOPs: 7.67 | 7: iteration 120400/ 
173500 | consumed samples: 30822400 | consumed tokens: 63124275200 | elapsed time per iteration (s): 0.08 | learning rate: 5.921E-05 | global batch size: 256 | lm loss: 4.519101E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.768 | TFLOPs: 11.23 | 7: iteration 120410/ 173500 | consumed samples: 30824960 | consumed tokens: 63129518080 | elapsed time per iteration (s): 0.08 | learning rate: 5.920E-05 | global batch size: 256 | lm loss: 4.506040E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.351 | TFLOPs: 11.93 | 7: iteration 120420/ 173500 | consumed samples: 30827520 | consumed tokens: 63134760960 | elapsed time per iteration (s): 0.08 | learning rate: 5.919E-05 | global batch size: 256 | lm loss: 4.516341E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.381 | TFLOPs: 11.93 | 7: iteration 120430/ 173500 | consumed samples: 30830080 | consumed tokens: 63140003840 | elapsed time per iteration (s): 0.08 | learning rate: 5.917E-05 | global batch size: 256 | lm loss: 4.502186E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.793 | TFLOPs: 11.92 | 7: iteration 120440/ 173500 | consumed samples: 30832640 | consumed tokens: 63145246720 | elapsed time per iteration (s): 0.09 | learning rate: 5.916E-05 | global batch size: 256 | lm loss: 4.510395E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.811 | TFLOPs: 10.82 | 7: iteration 120450/ 173500 | consumed samples: 30835200 | consumed tokens: 63150489600 | elapsed time per iteration (s): 0.08 | learning rate: 5.914E-05 | global batch size: 256 | lm loss: 4.517229E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3192.024 | TFLOPs: 11.87 | 7: iteration 120460/ 173500 | consumed samples: 30837760 | consumed tokens: 63155732480 | elapsed time per iteration (s): 0.08 | learning rate: 5.913E-05 | global batch size: 256 | lm loss: 4.517228E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.420 | TFLOPs: 11.94 | 7: iteration 120470/ 173500 | consumed samples: 30840320 | consumed tokens: 63160975360 | elapsed time per iteration (s): 0.08 | learning rate: 5.912E-05 | global batch size: 256 | lm loss: 4.502242E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.257 | TFLOPs: 12.03 | 7: iteration 120480/ 173500 | consumed samples: 30842880 | consumed tokens: 63166218240 | elapsed time per iteration (s): 0.08 | learning rate: 5.910E-05 | global batch size: 256 | lm loss: 4.518415E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.411 | TFLOPs: 12.06 | 7: iteration 120490/ 173500 | consumed samples: 30845440 | consumed tokens: 63171461120 | elapsed time per iteration (s): 0.09 | learning rate: 5.909E-05 | global batch size: 256 | lm loss: 4.522074E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.551 | TFLOPs: 10.10 | 7: iteration 120500/ 173500 | consumed samples: 30848000 | consumed tokens: 63176704000 | elapsed time per iteration (s): 0.10 | learning rate: 5.908E-05 | global batch size: 256 | lm loss: 4.524944E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2623.186 | TFLOPs: 9.76 | 7: iteration 120510/ 173500 | consumed samples: 30850560 | consumed tokens: 63181946880 | elapsed time per iteration (s): 0.09 | learning rate: 
5.906E-05 | global batch size: 256 | lm loss: 4.532244E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2848.498 | TFLOPs: 10.60 | 7: iteration 120520/ 173500 | consumed samples: 30853120 | consumed tokens: 63187189760 | elapsed time per iteration (s): 0.09 | learning rate: 5.905E-05 | global batch size: 256 | lm loss: 4.520990E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.955 | TFLOPs: 10.61 | 7: iteration 120530/ 173500 | consumed samples: 30855680 | consumed tokens: 63192432640 | elapsed time per iteration (s): 0.08 | learning rate: 5.904E-05 | global batch size: 256 | lm loss: 4.526742E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.543 | TFLOPs: 11.56 | 7: iteration 120540/ 173500 | consumed samples: 30858240 | consumed tokens: 63197675520 | elapsed time per iteration (s): 0.11 | learning rate: 5.902E-05 | global batch size: 256 | lm loss: 4.514063E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.271 | TFLOPs: 8.88 | 7: iteration 120550/ 173500 | consumed samples: 30860800 | consumed tokens: 63202918400 | elapsed time per iteration (s): 0.13 | learning rate: 5.901E-05 | global batch size: 256 | lm loss: 4.513180E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.666 | TFLOPs: 7.61 | 7: iteration 120560/ 173500 | consumed samples: 30863360 | consumed tokens: 63208161280 | elapsed time per iteration (s): 0.13 | learning rate: 5.900E-05 | global batch size: 256 | lm loss: 4.506418E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1900.640 | TFLOPs: 7.07 | 7: iteration 120570/ 173500 | 
consumed samples: 30865920 | consumed tokens: 63213404160 | elapsed time per iteration (s): 0.14 | learning rate: 5.898E-05 | global batch size: 256 | lm loss: 4.523448E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1851.896 | TFLOPs: 6.89 | 7: iteration 120580/ 173500 | consumed samples: 30868480 | consumed tokens: 63218647040 | elapsed time per iteration (s): 0.12 | learning rate: 5.897E-05 | global batch size: 256 | lm loss: 4.514284E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2069.182 | TFLOPs: 7.70 | 7: iteration 120590/ 173500 | consumed samples: 30871040 | consumed tokens: 63223889920 | elapsed time per iteration (s): 0.13 | learning rate: 5.895E-05 | global batch size: 256 | lm loss: 4.507962E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2028.950 | TFLOPs: 7.55 | 7: iteration 120600/ 173500 | consumed samples: 30873600 | consumed tokens: 63229132800 | elapsed time per iteration (s): 0.12 | learning rate: 5.894E-05 | global batch size: 256 | lm loss: 4.515678E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2094.400 | TFLOPs: 7.79 | 7: iteration 120610/ 173500 | consumed samples: 30876160 | consumed tokens: 63234375680 | elapsed time per iteration (s): 0.11 | learning rate: 5.893E-05 | global batch size: 256 | lm loss: 4.505126E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2291.046 | TFLOPs: 8.52 | 7: iteration 120620/ 173500 | consumed samples: 30878720 | consumed tokens: 63239618560 | elapsed time per iteration (s): 0.10 | learning rate: 5.891E-05 | global batch size: 256 | lm loss: 4.511344E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2594.890 | TFLOPs: 9.65 | 7: iteration 120630/ 173500 | consumed samples: 30881280 | consumed tokens: 63244861440 | elapsed time per iteration (s): 0.08 | learning rate: 5.890E-05 | global batch size: 256 | lm loss: 4.511177E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.514 | TFLOPs: 11.87 | 7: iteration 120640/ 173500 | consumed samples: 30883840 | consumed tokens: 63250104320 | elapsed time per iteration (s): 0.08 | learning rate: 5.889E-05 | global batch size: 256 | lm loss: 4.497569E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.597 | TFLOPs: 11.91 | 7: iteration 120650/ 173500 | consumed samples: 30886400 | consumed tokens: 63255347200 | elapsed time per iteration (s): 0.09 | learning rate: 5.887E-05 | global batch size: 256 | lm loss: 4.522362E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.729 | TFLOPs: 11.02 | 7: iteration 120660/ 173500 | consumed samples: 30888960 | consumed tokens: 63260590080 | elapsed time per iteration (s): 0.12 | learning rate: 5.886E-05 | global batch size: 256 | lm loss: 4.502857E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.446 | TFLOPs: 7.78 | 7: iteration 120670/ 173500 | consumed samples: 30891520 | consumed tokens: 63265832960 | elapsed time per iteration (s): 0.09 | learning rate: 5.885E-05 | global batch size: 256 | lm loss: 4.526655E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.781 | TFLOPs: 11.06 | 7: iteration 120680/ 173500 | consumed samples: 30894080 | consumed tokens: 63271075840 | elapsed time per iteration (s): 0.08 | learning rate: 5.883E-05 | global 
batch size: 256 | lm loss: 4.496103E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.157 | TFLOPs: 11.48 | 7: iteration 120690/ 173500 | consumed samples: 30896640 | consumed tokens: 63276318720 | elapsed time per iteration (s): 0.08 | learning rate: 5.882E-05 | global batch size: 256 | lm loss: 4.510189E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.487 | TFLOPs: 11.90 | 7: iteration 120700/ 173500 | consumed samples: 30899200 | consumed tokens: 63281561600 | elapsed time per iteration (s): 0.08 | learning rate: 5.881E-05 | global batch size: 256 | lm loss: 4.515856E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.077 | TFLOPs: 11.92 | 7: iteration 120710/ 173500 | consumed samples: 30901760 | consumed tokens: 63286804480 | elapsed time per iteration (s): 0.08 | learning rate: 5.879E-05 | global batch size: 256 | lm loss: 4.525790E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.496 | TFLOPs: 11.88 | 7: iteration 120720/ 173500 | consumed samples: 30904320 | consumed tokens: 63292047360 | elapsed time per iteration (s): 0.09 | learning rate: 5.878E-05 | global batch size: 256 | lm loss: 4.511609E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2717.333 | TFLOPs: 10.11 | 7: iteration 120730/ 173500 | consumed samples: 30906880 | consumed tokens: 63297290240 | elapsed time per iteration (s): 0.08 | learning rate: 5.877E-05 | global batch size: 256 | lm loss: 4.505162E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.349 | TFLOPs: 11.98 | 7: iteration 120740/ 173500 | consumed samples: 
30909440 | consumed tokens: 63302533120 | elapsed time per iteration (s): 0.09 | learning rate: 5.875E-05 | global batch size: 256 | lm loss: 4.522512E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.844 | TFLOPs: 10.58 | 7: iteration 120750/ 173500 | consumed samples: 30912000 | consumed tokens: 63307776000 | elapsed time per iteration (s): 0.09 | learning rate: 5.874E-05 | global batch size: 256 | lm loss: 4.511123E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2816.988 | TFLOPs: 10.48 | 7: iteration 120760/ 173500 | consumed samples: 30914560 | consumed tokens: 63313018880 | elapsed time per iteration (s): 0.08 | learning rate: 5.872E-05 | global batch size: 256 | lm loss: 4.521485E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.653 | TFLOPs: 11.98 | 7: iteration 120770/ 173500 | consumed samples: 30917120 | consumed tokens: 63318261760 | elapsed time per iteration (s): 0.08 | learning rate: 5.871E-05 | global batch size: 256 | lm loss: 4.497460E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.781 | TFLOPs: 11.49 | 7: iteration 120780/ 173500 | consumed samples: 30919680 | consumed tokens: 63323504640 | elapsed time per iteration (s): 0.08 | learning rate: 5.870E-05 | global batch size: 256 | lm loss: 4.505965E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.327 | TFLOPs: 11.74 | 7: iteration 120790/ 173500 | consumed samples: 30922240 | consumed tokens: 63328747520 | elapsed time per iteration (s): 0.08 | learning rate: 5.868E-05 | global batch size: 256 | lm loss: 4.519312E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3237.323 | TFLOPs: 12.04 | 7: iteration 120800/ 173500 | consumed samples: 30924800 | consumed tokens: 63333990400 | elapsed time per iteration (s): 0.08 | learning rate: 5.867E-05 | global batch size: 256 | lm loss: 4.508656E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.228 | TFLOPs: 12.04 | 7: iteration 120810/ 173500 | consumed samples: 30927360 | consumed tokens: 63339233280 | elapsed time per iteration (s): 0.08 | learning rate: 5.866E-05 | global batch size: 256 | lm loss: 4.513965E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.201 | TFLOPs: 11.78 | 7: iteration 120820/ 173500 | consumed samples: 30929920 | consumed tokens: 63344476160 | elapsed time per iteration (s): 0.08 | learning rate: 5.864E-05 | global batch size: 256 | lm loss: 4.515572E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.374 | TFLOPs: 12.00 | 7: iteration 120830/ 173500 | consumed samples: 30932480 | consumed tokens: 63349719040 | elapsed time per iteration (s): 0.08 | learning rate: 5.863E-05 | global batch size: 256 | lm loss: 4.517765E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.338 | TFLOPs: 11.66 | 7: iteration 120840/ 173500 | consumed samples: 30935040 | consumed tokens: 63354961920 | elapsed time per iteration (s): 0.08 | learning rate: 5.862E-05 | global batch size: 256 | lm loss: 4.504276E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.741 | TFLOPs: 11.88 | 7: iteration 120850/ 173500 | consumed samples: 30937600 | consumed tokens: 63360204800 | elapsed time per iteration (s): 0.10 | learning rate: 5.860E-05 | global batch 
size: 256 | lm loss: 4.515634E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2464.924 | TFLOPs: 9.17 | 7: iteration 120860/ 173500 | consumed samples: 30940160 | consumed tokens: 63365447680 | elapsed time per iteration (s): 0.12 | learning rate: 5.859E-05 | global batch size: 256 | lm loss: 4.503027E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.022 | TFLOPs: 8.04 | 7: iteration 120870/ 173500 | consumed samples: 30942720 | consumed tokens: 63370690560 | elapsed time per iteration (s): 0.09 | learning rate: 5.858E-05 | global batch size: 256 | lm loss: 4.516527E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.423 | TFLOPs: 11.19 | 7: iteration 120880/ 173500 | consumed samples: 30945280 | consumed tokens: 63375933440 | elapsed time per iteration (s): 0.08 | learning rate: 5.856E-05 | global batch size: 256 | lm loss: 4.524602E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.480 | TFLOPs: 11.54 | 7: iteration 120890/ 173500 | consumed samples: 30947840 | consumed tokens: 63381176320 | elapsed time per iteration (s): 0.10 | learning rate: 5.855E-05 | global batch size: 256 | lm loss: 4.516476E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.889 | TFLOPs: 9.72 | 7: iteration 120900/ 173500 | consumed samples: 30950400 | consumed tokens: 63386419200 | elapsed time per iteration (s): 0.08 | learning rate: 5.854E-05 | global batch size: 256 | lm loss: 4.519102E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.574 | TFLOPs: 11.50 | 7: iteration 120910/ 173500 | consumed samples: 30952960 | 
consumed tokens: 63391662080 | elapsed time per iteration (s): 0.09 | learning rate: 5.852E-05 | global batch size: 256 | lm loss: 4.511470E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2952.504 | TFLOPs: 10.98 | 7: iteration 120920/ 173500 | consumed samples: 30955520 | consumed tokens: 63396904960 | elapsed time per iteration (s): 0.08 | learning rate: 5.851E-05 | global batch size: 256 | lm loss: 4.528396E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.787 | TFLOPs: 11.90 | 7: iteration 120930/ 173500 | consumed samples: 30958080 | consumed tokens: 63402147840 | elapsed time per iteration (s): 0.08 | learning rate: 5.850E-05 | global batch size: 256 | lm loss: 4.513795E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.949 | TFLOPs: 11.79 | 7: iteration 120940/ 173500 | consumed samples: 30960640 | consumed tokens: 63407390720 | elapsed time per iteration (s): 0.11 | learning rate: 5.848E-05 | global batch size: 256 | lm loss: 4.509150E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2368.453 | TFLOPs: 8.81 | 7: iteration 120950/ 173500 | consumed samples: 30963200 | consumed tokens: 63412633600 | elapsed time per iteration (s): 0.12 | learning rate: 5.847E-05 | global batch size: 256 | lm loss: 4.507531E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2184.125 | TFLOPs: 8.12 | 7: iteration 120960/ 173500 | consumed samples: 30965760 | consumed tokens: 63417876480 | elapsed time per iteration (s): 0.09 | learning rate: 5.845E-05 | global batch size: 256 | lm loss: 4.513340E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 2704.683 | TFLOPs: 10.06 |
7: iteration 120970/ 173500 | consumed samples: 30968320 | consumed tokens: 63423119360 | elapsed time per iteration (s): 0.10 | learning rate: 5.844E-05 | global batch size: 256 | lm loss: 4.504131E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2545.920 | TFLOPs: 9.47 |
7: iteration 120980/ 173500 | consumed samples: 30970880 | consumed tokens: 63428362240 | elapsed time per iteration (s): 0.13 | learning rate: 5.843E-05 | global batch size: 256 | lm loss: 4.521087E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.428 | TFLOPs: 7.58 |
7: iteration 120990/ 173500 | consumed samples: 30973440 | consumed tokens: 63433605120 | elapsed time per iteration (s): 0.13 | learning rate: 5.841E-05 | global batch size: 256 | lm loss: 4.518225E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.303 | TFLOPs: 7.28 |
7: iteration 121000/ 173500 | consumed samples: 30976000 | consumed tokens: 63438848000 | elapsed time per iteration (s): 0.11 | learning rate: 5.840E-05 | global batch size: 256 | lm loss: 4.512820E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2336.494 | TFLOPs: 8.69 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 121000 | lm loss value: 4.370322E+00 | lm loss PPL: 7.906911E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 121000 to checkpoints_14m91b100m
0: [2023-03-17 03:11:45,819] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step121000 is begin to save!
0: [2023-03-17 03:11:45,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:11:45,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:11:45,849] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:11:45,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:11:45,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:11:45,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:11:45,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:11:45,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:11:45,858] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:11:45,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:11:45,861] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:11:45,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 03:11:45,862] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step121000/mp_rank_00_model_states.pt 0: [2023-03-17 03:11:45,862] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:11:45,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:11:45,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:11:45,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:11:45,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:11:45,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 2: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:11:45,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 7: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:11:45,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 
0: [2023-03-17 03:11:45,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 7: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:11:45,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 0: [2023-03-17 03:11:45,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 2: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:11:45,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 03:11:45,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 6: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:11:45,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 3: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:11:45,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 1: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:11:45,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 6: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:11:45,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 0: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:11:45,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:11:45,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:11:45,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 1: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 7: [2023-03-17 03:11:45,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 3: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:11:45,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 03:11:45,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 2: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 
6: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:11:45,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 0: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:11:45,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 2: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:11:45,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 
1: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:11:45,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 0: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:11:45,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 3: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 3: [2023-03-17 03:11:45,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 7: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:11:45,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 6: [2023-03-17 03:11:45,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:11:45,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 1: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:11:45,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 
2: [2023-03-17 03:11:45,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 3: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:11:45,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 7: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:11:45,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 0: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:11:45,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 
6: [2023-03-17 03:11:45,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:11:45,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:11:45,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 2: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:11:45,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 3: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:11:45,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 03:11:45,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 3: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 7: [2023-03-17 03:11:45,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:11:45,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:11:45,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 6: [2023-03-17 03:11:45,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:11:45,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 0: [2023-03-17 03:11:45,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:11:45,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:11:45,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:11:45,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 2: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 2: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 1: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 1: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 7: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 
1: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 7: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 
3: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 6: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 3: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 3: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 5: [2023-03-17 03:11:45,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:11:45,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 4: [2023-03-17 03:11:45,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:11:45,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:11:45,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 4: [2023-03-17 03:11:45,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:11:45,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:11:45,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:11:45,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:11:45,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:11:45,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:11:45,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 4: [2023-03-17 03:11:45,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 4: [2023-03-17 03:11:45,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now! 4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:11:45,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now!
4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt.
4: [2023-03-17 03:11:45,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step121000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt
4: [2023-03-17 03:11:45,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step121000 is ready now!
0: successfully saved checkpoint at iteration 121000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 82.80
7: iteration 121010/ 173500 | consumed samples: 30978560 | consumed tokens: 63444090880 | elapsed time per iteration (s): 0.09 | learning rate: 5.839E-05 | global batch size: 256 | lm loss: 4.504399E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2770.449 | TFLOPs: 10.30 |
7: iteration 121020/ 173500 | consumed samples: 30981120 | consumed tokens: 63449333760 | elapsed time per iteration (s): 0.09 | learning rate: 5.837E-05 | global batch size: 256 | lm loss: 4.511352E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3003.360 | TFLOPs: 11.17 |
7: iteration 121030/ 173500 | consumed samples: 30983680 | consumed tokens: 63454576640 | elapsed time per iteration (s): 0.08 | learning rate: 5.836E-05 | global batch size: 256 | lm loss: 4.503577E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.121 | TFLOPs: 11.51 |
7: iteration 121040/ 173500 | consumed samples: 30986240 | consumed tokens: 63459819520 | elapsed time per iteration (s): 0.08 | learning rate: 5.835E-05 | global batch size: 256 | lm loss: 4.515246E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.493 | TFLOPs: 11.91 |
7: iteration 121050/ 173500 | consumed samples: 30988800 | consumed tokens: 63465062400 | elapsed time per iteration (s): 0.08 | learning rate: 5.833E-05 | global batch size: 256 | lm loss: 4.503132E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.397 | TFLOPs: 11.57 |
7: iteration 121060/ 173500 | consumed samples: 30991360 | consumed tokens: 63470305280 | elapsed time per iteration (s): 0.09 | learning rate: 5.832E-05 | global batch size: 256 | lm loss: 4.508995E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.306 | TFLOPs: 11.10 |
7: iteration 121070/ 173500 | consumed samples: 30993920 | consumed tokens: 63475548160 | elapsed time per iteration (s): 0.08 | learning rate: 5.831E-05 | global batch size: 256 | lm loss: 4.512741E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.610 | TFLOPs: 11.29 |
7: iteration 121080/ 173500 | consumed samples: 30996480 | consumed tokens: 63480791040 | elapsed time per iteration (s): 0.09 | learning rate: 5.829E-05 | global batch size: 256 | lm loss: 4.515607E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.674 | TFLOPs: 10.70 |
7: iteration 121090/ 173500 | consumed samples: 30999040 | consumed tokens: 63486033920 | elapsed time per iteration (s): 0.08 | learning rate: 5.828E-05 | global batch size: 256 | lm loss: 4.509406E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.765 | TFLOPs: 12.01 |
7: iteration 121100/ 173500 | consumed samples: 31001600 | consumed tokens: 63491276800 | elapsed time per iteration (s): 0.09 | learning rate: 5.827E-05 | global batch size: 256 | lm loss: 4.518419E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.284 | TFLOPs: 11.00 |
7: iteration 121110/ 173500 | consumed samples: 31004160 | consumed tokens: 63496519680 | elapsed time per iteration (s): 0.08 | learning rate: 5.825E-05 | global batch size: 256 | lm loss: 4.509736E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.248 | TFLOPs: 11.56 |
7: iteration 121120/ 173500 | consumed samples: 31006720 | consumed tokens: 63501762560 | elapsed time per iteration (s): 0.09 | learning rate: 5.824E-05 | global batch size: 256 | lm loss: 4.505105E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2729.570 | TFLOPs: 10.15 |
7: iteration 121130/ 173500 | consumed samples: 31009280 | consumed tokens: 63507005440 | elapsed time per iteration (s): 0.08 | learning rate: 5.823E-05 | global batch size: 256 | lm loss: 4.517521E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.821 | TFLOPs: 11.95 |
7: iteration 121140/ 173500 | consumed samples: 31011840 | consumed tokens: 63512248320 | elapsed time per iteration (s): 0.09 | learning rate: 5.821E-05 | global batch size: 256 | lm loss: 4.507991E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2929.776 | TFLOPs: 10.90 |
7: iteration 121150/ 173500 | consumed samples: 31014400 | consumed tokens: 63517491200 | elapsed time per iteration (s): 0.08 | learning
rate: 5.820E-05 | global batch size: 256 | lm loss: 4.505834E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.853 | TFLOPs: 12.02 | 7: iteration 121160/ 173500 | consumed samples: 31016960 | consumed tokens: 63522734080 | elapsed time per iteration (s): 0.08 | learning rate: 5.818E-05 | global batch size: 256 | lm loss: 4.520975E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.825 | TFLOPs: 11.91 | 7: iteration 121170/ 173500 | consumed samples: 31019520 | consumed tokens: 63527976960 | elapsed time per iteration (s): 0.08 | learning rate: 5.817E-05 | global batch size: 256 | lm loss: 4.513273E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.208 | TFLOPs: 11.96 | 7: iteration 121180/ 173500 | consumed samples: 31022080 | consumed tokens: 63533219840 | elapsed time per iteration (s): 0.08 | learning rate: 5.816E-05 | global batch size: 256 | lm loss: 4.516584E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.500 | TFLOPs: 12.05 | 7: iteration 121190/ 173500 | consumed samples: 31024640 | consumed tokens: 63538462720 | elapsed time per iteration (s): 0.08 | learning rate: 5.814E-05 | global batch size: 256 | lm loss: 4.515184E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.547 | TFLOPs: 12.03 | 7: iteration 121200/ 173500 | consumed samples: 31027200 | consumed tokens: 63543705600 | elapsed time per iteration (s): 0.08 | learning rate: 5.813E-05 | global batch size: 256 | lm loss: 4.511201E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.711 | TFLOPs: 12.05 | 7: iteration 121210/ 
173500 | consumed samples: 31029760 | consumed tokens: 63548948480 | elapsed time per iteration (s): 0.11 | learning rate: 5.812E-05 | global batch size: 256 | lm loss: 4.514745E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2328.699 | TFLOPs: 8.66 | 7: iteration 121220/ 173500 | consumed samples: 31032320 | consumed tokens: 63554191360 | elapsed time per iteration (s): 0.08 | learning rate: 5.810E-05 | global batch size: 256 | lm loss: 4.502052E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.370 | TFLOPs: 12.02 | 7: iteration 121230/ 173500 | consumed samples: 31034880 | consumed tokens: 63559434240 | elapsed time per iteration (s): 0.08 | learning rate: 5.809E-05 | global batch size: 256 | lm loss: 4.508292E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.214 | TFLOPs: 11.38 | 7: iteration 121240/ 173500 | consumed samples: 31037440 | consumed tokens: 63564677120 | elapsed time per iteration (s): 0.13 | learning rate: 5.808E-05 | global batch size: 256 | lm loss: 4.512664E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.494 | TFLOPs: 7.53 | 7: iteration 121250/ 173500 | consumed samples: 31040000 | consumed tokens: 63569920000 | elapsed time per iteration (s): 0.13 | learning rate: 5.806E-05 | global batch size: 256 | lm loss: 4.521121E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.136 | TFLOPs: 7.35 | 7: iteration 121260/ 173500 | consumed samples: 31042560 | consumed tokens: 63575162880 | elapsed time per iteration (s): 0.13 | learning rate: 5.805E-05 | global batch size: 256 | lm loss: 4.513731E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 1987.733 | TFLOPs: 7.39 | 7: iteration 121270/ 173500 | consumed samples: 31045120 | consumed tokens: 63580405760 | elapsed time per iteration (s): 0.11 | learning rate: 5.804E-05 | global batch size: 256 | lm loss: 4.507021E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2275.125 | TFLOPs: 8.46 | 7: iteration 121280/ 173500 | consumed samples: 31047680 | consumed tokens: 63585648640 | elapsed time per iteration (s): 0.12 | learning rate: 5.802E-05 | global batch size: 256 | lm loss: 4.516780E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2171.658 | TFLOPs: 8.08 | 7: iteration 121290/ 173500 | consumed samples: 31050240 | consumed tokens: 63590891520 | elapsed time per iteration (s): 0.12 | learning rate: 5.801E-05 | global batch size: 256 | lm loss: 4.521982E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2191.366 | TFLOPs: 8.15 | 7: iteration 121300/ 173500 | consumed samples: 31052800 | consumed tokens: 63596134400 | elapsed time per iteration (s): 0.09 | learning rate: 5.800E-05 | global batch size: 256 | lm loss: 4.508688E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.344 | TFLOPs: 10.39 | 7: iteration 121310/ 173500 | consumed samples: 31055360 | consumed tokens: 63601377280 | elapsed time per iteration (s): 0.09 | learning rate: 5.798E-05 | global batch size: 256 | lm loss: 4.507082E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2950.208 | TFLOPs: 10.97 | 7: iteration 121320/ 173500 | consumed samples: 31057920 | consumed tokens: 63606620160 | elapsed time per iteration (s): 0.11 | learning rate: 
5.797E-05 | global batch size: 256 | lm loss: 4.507222E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.795 | TFLOPs: 8.78 | 7: iteration 121330/ 173500 | consumed samples: 31060480 | consumed tokens: 63611863040 | elapsed time per iteration (s): 0.10 | learning rate: 5.796E-05 | global batch size: 256 | lm loss: 4.508967E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2444.834 | TFLOPs: 9.09 | 7: iteration 121340/ 173500 | consumed samples: 31063040 | consumed tokens: 63617105920 | elapsed time per iteration (s): 0.11 | learning rate: 5.794E-05 | global batch size: 256 | lm loss: 4.512949E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2378.977 | TFLOPs: 8.85 | 7: iteration 121350/ 173500 | consumed samples: 31065600 | consumed tokens: 63622348800 | elapsed time per iteration (s): 0.08 | learning rate: 5.793E-05 | global batch size: 256 | lm loss: 4.519247E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.995 | TFLOPs: 11.63 | 7: iteration 121360/ 173500 | consumed samples: 31068160 | consumed tokens: 63627591680 | elapsed time per iteration (s): 0.09 | learning rate: 5.792E-05 | global batch size: 256 | lm loss: 4.506908E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.742 | TFLOPs: 10.38 | 7: iteration 121370/ 173500 | consumed samples: 31070720 | consumed tokens: 63632834560 | elapsed time per iteration (s): 0.10 | learning rate: 5.790E-05 | global batch size: 256 | lm loss: 4.519315E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2462.809 | TFLOPs: 9.16 | 7: iteration 121380/ 173500 | 
consumed samples: 31073280 | consumed tokens: 63638077440 | elapsed time per iteration (s): 0.09 | learning rate: 5.789E-05 | global batch size: 256 | lm loss: 4.526847E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2807.339 | TFLOPs: 10.44 | 7: iteration 121390/ 173500 | consumed samples: 31075840 | consumed tokens: 63643320320 | elapsed time per iteration (s): 0.08 | learning rate: 5.788E-05 | global batch size: 256 | lm loss: 4.525679E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.127 | TFLOPs: 12.05 | 7: iteration 121400/ 173500 | consumed samples: 31078400 | consumed tokens: 63648563200 | elapsed time per iteration (s): 0.08 | learning rate: 5.786E-05 | global batch size: 256 | lm loss: 4.506966E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.895 | TFLOPs: 11.59 | 7: iteration 121410/ 173500 | consumed samples: 31080960 | consumed tokens: 63653806080 | elapsed time per iteration (s): 0.08 | learning rate: 5.785E-05 | global batch size: 256 | lm loss: 4.509499E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.983 | TFLOPs: 11.31 | 7: iteration 121420/ 173500 | consumed samples: 31083520 | consumed tokens: 63659048960 | elapsed time per iteration (s): 0.08 | learning rate: 5.784E-05 | global batch size: 256 | lm loss: 4.518351E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.345 | TFLOPs: 11.96 | 7: iteration 121430/ 173500 | consumed samples: 31086080 | consumed tokens: 63664291840 | elapsed time per iteration (s): 0.08 | learning rate: 5.782E-05 | global batch size: 256 | lm loss: 4.511455E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3229.327 | TFLOPs: 12.01 | 7: iteration 121440/ 173500 | consumed samples: 31088640 | consumed tokens: 63669534720 | elapsed time per iteration (s): 0.08 | learning rate: 5.781E-05 | global batch size: 256 | lm loss: 4.519166E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.881 | TFLOPs: 11.64 | 7: iteration 121450/ 173500 | consumed samples: 31091200 | consumed tokens: 63674777600 | elapsed time per iteration (s): 0.08 | learning rate: 5.780E-05 | global batch size: 256 | lm loss: 4.522430E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.244 | TFLOPs: 11.98 | 7: iteration 121460/ 173500 | consumed samples: 31093760 | consumed tokens: 63680020480 | elapsed time per iteration (s): 0.08 | learning rate: 5.778E-05 | global batch size: 256 | lm loss: 4.504963E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.498 | TFLOPs: 12.02 | 7: iteration 121470/ 173500 | consumed samples: 31096320 | consumed tokens: 63685263360 | elapsed time per iteration (s): 0.08 | learning rate: 5.777E-05 | global batch size: 256 | lm loss: 4.503032E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.761 | TFLOPs: 12.03 | 7: iteration 121480/ 173500 | consumed samples: 31098880 | consumed tokens: 63690506240 | elapsed time per iteration (s): 0.09 | learning rate: 5.776E-05 | global batch size: 256 | lm loss: 4.505927E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2958.209 | TFLOPs: 11.00 | 7: iteration 121490/ 173500 | consumed samples: 31101440 | consumed tokens: 63695749120 | elapsed time per iteration (s): 0.12 | learning rate: 
5.774E-05 | global batch size: 256 | lm loss: 4.519573E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2217.838 | TFLOPs: 8.25 | 7: iteration 121500/ 173500 | consumed samples: 31104000 | consumed tokens: 63700992000 | elapsed time per iteration (s): 0.09 | learning rate: 5.773E-05 | global batch size: 256 | lm loss: 4.511728E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.974 | TFLOPs: 10.21 | 7: iteration 121510/ 173500 | consumed samples: 31106560 | consumed tokens: 63706234880 | elapsed time per iteration (s): 0.08 | learning rate: 5.771E-05 | global batch size: 256 | lm loss: 4.510680E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.649 | TFLOPs: 11.96 | 7: iteration 121520/ 173500 | consumed samples: 31109120 | consumed tokens: 63711477760 | elapsed time per iteration (s): 0.08 | learning rate: 5.770E-05 | global batch size: 256 | lm loss: 4.523539E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.546 | TFLOPs: 12.06 | 7: iteration 121530/ 173500 | consumed samples: 31111680 | consumed tokens: 63716720640 | elapsed time per iteration (s): 0.10 | learning rate: 5.769E-05 | global batch size: 256 | lm loss: 4.514980E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2582.522 | TFLOPs: 9.61 | 7: iteration 121540/ 173500 | consumed samples: 31114240 | consumed tokens: 63721963520 | elapsed time per iteration (s): 0.10 | learning rate: 5.767E-05 | global batch size: 256 | lm loss: 4.523746E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2614.397 | TFLOPs: 9.72 | 7: iteration 121550/ 173500 | 
consumed samples: 31116800 | consumed tokens: 63727206400 | elapsed time per iteration (s): 0.08 | learning rate: 5.766E-05 | global batch size: 256 | lm loss: 4.508067E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.536 | TFLOPs: 12.04 | 7: iteration 121560/ 173500 | consumed samples: 31119360 | consumed tokens: 63732449280 | elapsed time per iteration (s): 0.08 | learning rate: 5.765E-05 | global batch size: 256 | lm loss: 4.504240E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.051 | TFLOPs: 11.94 | 7: iteration 121570/ 173500 | consumed samples: 31121920 | consumed tokens: 63737692160 | elapsed time per iteration (s): 0.08 | learning rate: 5.763E-05 | global batch size: 256 | lm loss: 4.524530E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.017 | TFLOPs: 11.98 | 7: iteration 121580/ 173500 | consumed samples: 31124480 | consumed tokens: 63742935040 | elapsed time per iteration (s): 0.08 | learning rate: 5.762E-05 | global batch size: 256 | lm loss: 4.516826E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.923 | TFLOPs: 11.95 | 7: iteration 121590/ 173500 | consumed samples: 31127040 | consumed tokens: 63748177920 | elapsed time per iteration (s): 0.08 | learning rate: 5.761E-05 | global batch size: 256 | lm loss: 4.516808E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.487 | TFLOPs: 12.02 | 7: iteration 121600/ 173500 | consumed samples: 31129600 | consumed tokens: 63753420800 | elapsed time per iteration (s): 0.08 | learning rate: 5.759E-05 | global batch size: 256 | lm loss: 4.504697E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3171.339 | TFLOPs: 11.80 | 7: iteration 121610/ 173500 | consumed samples: 31132160 | consumed tokens: 63758663680 | elapsed time per iteration (s): 0.08 | learning rate: 5.758E-05 | global batch size: 256 | lm loss: 4.515524E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.713 | TFLOPs: 11.93 | 7: iteration 121620/ 173500 | consumed samples: 31134720 | consumed tokens: 63763906560 | elapsed time per iteration (s): 0.08 | learning rate: 5.757E-05 | global batch size: 256 | lm loss: 4.509274E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.203 | TFLOPs: 12.01 | 7: iteration 121630/ 173500 | consumed samples: 31137280 | consumed tokens: 63769149440 | elapsed time per iteration (s): 0.08 | learning rate: 5.755E-05 | global batch size: 256 | lm loss: 4.514688E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.866 | TFLOPs: 12.05 | 7: iteration 121640/ 173500 | consumed samples: 31139840 | consumed tokens: 63774392320 | elapsed time per iteration (s): 0.08 | learning rate: 5.754E-05 | global batch size: 256 | lm loss: 4.522231E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.352 | TFLOPs: 12.00 | 7: iteration 121650/ 173500 | consumed samples: 31142400 | consumed tokens: 63779635200 | elapsed time per iteration (s): 0.09 | learning rate: 5.753E-05 | global batch size: 256 | lm loss: 4.510888E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2953.523 | TFLOPs: 10.99 | 7: iteration 121660/ 173500 | consumed samples: 31144960 | consumed tokens: 63784878080 | elapsed time per iteration (s): 0.11 | learning rate: 
5.751E-05 | global batch size: 256 | lm loss: 4.518259E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2417.796 | TFLOPs: 8.99 | 7: iteration 121670/ 173500 | consumed samples: 31147520 | consumed tokens: 63790120960 | elapsed time per iteration (s): 0.12 | learning rate: 5.750E-05 | global batch size: 256 | lm loss: 4.515890E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2078.030 | TFLOPs: 7.73 | 7: iteration 121680/ 173500 | consumed samples: 31150080 | consumed tokens: 63795363840 | elapsed time per iteration (s): 0.12 | learning rate: 5.749E-05 | global batch size: 256 | lm loss: 4.503445E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2065.292 | TFLOPs: 7.68 | 7: iteration 121690/ 173500 | consumed samples: 31152640 | consumed tokens: 63800606720 | elapsed time per iteration (s): 0.13 | learning rate: 5.747E-05 | global batch size: 256 | lm loss: 4.502448E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1932.016 | TFLOPs: 7.19 | 7: iteration 121700/ 173500 | consumed samples: 31155200 | consumed tokens: 63805849600 | elapsed time per iteration (s): 0.13 | learning rate: 5.746E-05 | global batch size: 256 | lm loss: 4.518386E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1911.852 | TFLOPs: 7.11 | 7: iteration 121710/ 173500 | consumed samples: 31157760 | consumed tokens: 63811092480 | elapsed time per iteration (s): 0.12 | learning rate: 5.745E-05 | global batch size: 256 | lm loss: 4.514850E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2112.429 | TFLOPs: 7.86 | 7: iteration 121720/ 173500 | 
consumed samples: 31160320 | consumed tokens: 63816335360 | elapsed time per iteration (s): 0.12 | learning rate: 5.743E-05 | global batch size: 256 | lm loss: 4.510076E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2154.828 | TFLOPs: 8.02 | 7: iteration 121730/ 173500 | consumed samples: 31162880 | consumed tokens: 63821578240 | elapsed time per iteration (s): 0.12 | learning rate: 5.742E-05 | global batch size: 256 | lm loss: 4.514284E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2073.047 | TFLOPs: 7.71 | 7: iteration 121740/ 173500 | consumed samples: 31165440 | consumed tokens: 63826821120 | elapsed time per iteration (s): 0.12 | learning rate: 5.741E-05 | global batch size: 256 | lm loss: 4.514244E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2182.157 | TFLOPs: 8.12 | 7: iteration 121750/ 173500 | consumed samples: 31168000 | consumed tokens: 63832064000 | elapsed time per iteration (s): 0.12 | learning rate: 5.739E-05 | global batch size: 256 | lm loss: 4.524029E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2112.238 | TFLOPs: 7.86 | 7: iteration 121760/ 173500 | consumed samples: 31170560 | consumed tokens: 63837306880 | elapsed time per iteration (s): 0.13 | learning rate: 5.738E-05 | global batch size: 256 | lm loss: 4.514846E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1968.783 | TFLOPs: 7.32 | 7: iteration 121770/ 173500 | consumed samples: 31173120 | consumed tokens: 63842549760 | elapsed time per iteration (s): 0.11 | learning rate: 5.737E-05 | global batch size: 256 | lm loss: 4.529498E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2332.716 | TFLOPs: 8.68 | 7: iteration 121780/ 173500 | consumed samples: 31175680 | consumed tokens: 63847792640 | elapsed time per iteration (s): 0.09 | learning rate: 5.735E-05 | global batch size: 256 | lm loss: 4.516172E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.334 | TFLOPs: 10.56 | 7: iteration 121790/ 173500 | consumed samples: 31178240 | consumed tokens: 63853035520 | elapsed time per iteration (s): 0.08 | learning rate: 5.734E-05 | global batch size: 256 | lm loss: 4.504411E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.188 | TFLOPs: 11.43 | 7: iteration 121800/ 173500 | consumed samples: 31180800 | consumed tokens: 63858278400 | elapsed time per iteration (s): 0.08 | learning rate: 5.733E-05 | global batch size: 256 | lm loss: 4.522594E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.640 | TFLOPs: 11.26 | 7: iteration 121810/ 173500 | consumed samples: 31183360 | consumed tokens: 63863521280 | elapsed time per iteration (s): 0.08 | learning rate: 5.731E-05 | global batch size: 256 | lm loss: 4.511471E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.965 | TFLOPs: 11.68 | 7: iteration 121820/ 173500 | consumed samples: 31185920 | consumed tokens: 63868764160 | elapsed time per iteration (s): 0.08 | learning rate: 5.730E-05 | global batch size: 256 | lm loss: 4.507842E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.887 | TFLOPs: 11.81 | 7: iteration 121830/ 173500 | consumed samples: 31188480 | consumed tokens: 63874007040 | elapsed time per iteration (s): 0.08 | learning rate: 5.729E-05 | global 
batch size: 256 | lm loss: 4.511189E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.579 | TFLOPs: 11.89 | 7: iteration 121840/ 173500 | consumed samples: 31191040 | consumed tokens: 63879249920 | elapsed time per iteration (s): 0.09 | learning rate: 5.727E-05 | global batch size: 256 | lm loss: 4.518000E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.790 | TFLOPs: 10.33 | 7: iteration 121850/ 173500 | consumed samples: 31193600 | consumed tokens: 63884492800 | elapsed time per iteration (s): 0.08 | learning rate: 5.726E-05 | global batch size: 256 | lm loss: 4.519334E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.211 | TFLOPs: 11.53 | 7: iteration 121860/ 173500 | consumed samples: 31196160 | consumed tokens: 63889735680 | elapsed time per iteration (s): 0.10 | learning rate: 5.725E-05 | global batch size: 256 | lm loss: 4.508070E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2684.495 | TFLOPs: 9.99 | 7: iteration 121870/ 173500 | consumed samples: 31198720 | consumed tokens: 63894978560 | elapsed time per iteration (s): 0.10 | learning rate: 5.723E-05 | global batch size: 256 | lm loss: 4.503865E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2678.864 | TFLOPs: 9.96 | 7: iteration 121880/ 173500 | consumed samples: 31201280 | consumed tokens: 63900221440 | elapsed time per iteration (s): 0.09 | learning rate: 5.722E-05 | global batch size: 256 | lm loss: 4.513719E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2763.978 | TFLOPs: 10.28 | 7: iteration 121890/ 173500 | consumed samples: 
31203840 | consumed tokens: 63905464320 | elapsed time per iteration (s): 0.08 | learning rate: 5.721E-05 | global batch size: 256 | lm loss: 4.511840E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.269 | TFLOPs: 12.03 | 7: iteration 121900/ 173500 | consumed samples: 31206400 | consumed tokens: 63910707200 | elapsed time per iteration (s): 0.11 | learning rate: 5.719E-05 | global batch size: 256 | lm loss: 4.510030E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2421.737 | TFLOPs: 9.01 | 7: iteration 121910/ 173500 | consumed samples: 31208960 | consumed tokens: 63915950080 | elapsed time per iteration (s): 0.08 | learning rate: 5.718E-05 | global batch size: 256 | lm loss: 4.507616E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.635 | TFLOPs: 12.01 | 7: iteration 121920/ 173500 | consumed samples: 31211520 | consumed tokens: 63921192960 | elapsed time per iteration (s): 0.08 | learning rate: 5.717E-05 | global batch size: 256 | lm loss: 4.513243E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.975 | TFLOPs: 11.96 | 7: iteration 121930/ 173500 | consumed samples: 31214080 | consumed tokens: 63926435840 | elapsed time per iteration (s): 0.08 | learning rate: 5.715E-05 | global batch size: 256 | lm loss: 4.509780E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.406 | TFLOPs: 11.78 | 7: iteration 121940/ 173500 | consumed samples: 31216640 | consumed tokens: 63931678720 | elapsed time per iteration (s): 0.08 | learning rate: 5.714E-05 | global batch size: 256 | lm loss: 4.516560E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3203.154 | TFLOPs: 11.91 | 7: iteration 121950/ 173500 | consumed samples: 31219200 | consumed tokens: 63936921600 | elapsed time per iteration (s): 0.08 | learning rate: 5.713E-05 | global batch size: 256 | lm loss: 4.511595E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.053 | TFLOPs: 11.89 | 7: iteration 121960/ 173500 | consumed samples: 31221760 | consumed tokens: 63942164480 | elapsed time per iteration (s): 0.08 | learning rate: 5.711E-05 | global batch size: 256 | lm loss: 4.508224E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.745 | TFLOPs: 11.95 | 7: iteration 121970/ 173500 | consumed samples: 31224320 | consumed tokens: 63947407360 | elapsed time per iteration (s): 0.08 | learning rate: 5.710E-05 | global batch size: 256 | lm loss: 4.508578E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.072 | TFLOPs: 11.84 | 7: iteration 121980/ 173500 | consumed samples: 31226880 | consumed tokens: 63952650240 | elapsed time per iteration (s): 0.08 | learning rate: 5.709E-05 | global batch size: 256 | lm loss: 4.497755E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.582 | TFLOPs: 11.91 | 7: iteration 121990/ 173500 | consumed samples: 31229440 | consumed tokens: 63957893120 | elapsed time per iteration (s): 0.09 | learning rate: 5.707E-05 | global batch size: 256 | lm loss: 4.506733E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.129 | TFLOPs: 10.10 | 0: [2023-03-17 03:13:18,340] [INFO] [logging.py:68:log_dist] [Rank 0] step=122000, skipped=0, lr=[5.706057124448849e-05, 5.706057124448849e-05, 5.706057124448849e-05], 
mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 122000/ 173500 | consumed samples: 31232000 | consumed tokens: 63963136000 | elapsed time per iteration (s): 0.12 | learning rate: 5.706E-05 | global batch size: 256 | lm loss: 4.515943E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2059.318 | TFLOPs: 7.66 | 0: steps: 122000 loss: 4.5381 iter time (s): 0.093 samples/sec: 2739.727 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 122000 | lm loss value: 4.418801E+00 | lm loss PPL: 8.299670E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 122000 to checkpoints_14m91b100m 0: [2023-03-17 03:13:18,436] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step122000 is begin to save! 0: [2023-03-17 03:13:18,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:13:18,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:13:18,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:13:18,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:13:18,469] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:13:18,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 03:13:18,472] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:13:18,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:13:18,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:13:18,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:13:18,477] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:13:18,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:13:18,479] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step122000/mp_rank_00_model_states.pt 0: [2023-03-17 03:13:18,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:13:18,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:13:18,498] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:13:18,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:13:18,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:13:18,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
4: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:13:18,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 5: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:13:18,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 2: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:13:18,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 4: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:13:18,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
6: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 03:13:18,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 3: [2023-03-17 03:13:18,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 6: [2023-03-17 03:13:18,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 0: [2023-03-17 03:13:18,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:13:18,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 1: [2023-03-17 03:13:18,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:13:18,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:13:18,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 1: [2023-03-17 03:13:18,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:13:18,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:13:18,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 7: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:13:18,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 5: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:13:18,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 0: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:13:18,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 2: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:13:18,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 4: [2023-03-17 03:13:18,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 6: [2023-03-17 03:13:18,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:13:18,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:13:18,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 3: [2023-03-17 03:13:18,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:13:18,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 7: [2023-03-17 03:13:18,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:13:18,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:13:18,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-03-17 03:13:18,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 5: [2023-03-17 03:13:18,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 2: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:13:18,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 4: [2023-03-17 03:13:18,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 0: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
0: [2023-03-17 03:13:18,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 6: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:13:18,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 1: [2023-03-17 03:13:18,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 1: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 3: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 7: [2023-03-17 03:13:18,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:13:18,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 03:13:18,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 5: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 0: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:13:18,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 2: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:13:18,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 4: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:13:18,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 6: [2023-03-17 03:13:18,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:13:18,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 3: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 1: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:13:18,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
1: [2023-03-17 03:13:18,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 5: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:13:18,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 7: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:13:18,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:13:18,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
0: [2023-03-17 03:13:18,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 0: [2023-03-17 03:13:18,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 4: [2023-03-17 03:13:18,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:13:18,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 6: [2023-03-17 03:13:18,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:13:18,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:13:18,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 3: [2023-03-17 03:13:18,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:13:18,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
5: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:13:18,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 2: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:13:18,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 03:13:18,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 2: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 0: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:13:18,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
4: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:13:18,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 6: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:13:18,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 1: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:13:18,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 3: [2023-03-17 03:13:18,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 3: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
5: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:13:18,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 0: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:13:18,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 6: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:13:18,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 7: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 0: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
3: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 2: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 7: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 4: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 7: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 6: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 3: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 1: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:13:18,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 1: [2023-03-17 03:13:18,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 5: [2023-03-17 03:13:18,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:13:18,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step122000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:13:18,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step122000 is ready now! 
0: successfully saved checkpoint at iteration 122000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 83.38 7: iteration 122010/ 173500 | consumed samples: 31234560 | consumed tokens: 63968378880 | elapsed time per iteration (s): 0.14 | learning rate: 5.705E-05 | global batch size: 256 | lm loss: 4.511355E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1825.070 | TFLOPs: 6.79 | 7: iteration 122020/ 173500 | consumed samples: 31237120 | consumed tokens: 63973621760 | elapsed time per iteration (s): 0.09 | learning rate: 5.703E-05 | global batch size: 256 | lm loss: 4.515948E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.247 | TFLOPs: 11.00 | 7: iteration 122030/ 173500 | consumed samples: 31239680 | consumed tokens: 63978864640 | elapsed time per iteration (s): 0.08 | learning rate: 5.702E-05 | global batch size: 256 | lm loss: 4.512956E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.052 | TFLOPs: 11.50 | 7: iteration 122040/ 173500 | consumed samples: 31242240 | consumed tokens: 63984107520 | elapsed time per iteration (s): 0.09 | learning rate: 5.701E-05 | global batch size: 256 | lm loss: 4.502418E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2702.864 | TFLOPs: 10.05 | 7: iteration 122050/ 173500 | consumed samples: 31244800 | consumed tokens: 63989350400 | elapsed time per iteration (s): 0.11 | learning rate: 5.699E-05 | global batch size: 256 | lm loss: 4.509830E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2250.714 | TFLOPs: 8.37 | 7: iteration 122060/ 173500 | consumed samples: 31247360 | consumed tokens: 63994593280 | elapsed time per iteration (s): 
0.10 | learning rate: 5.698E-05 | global batch size: 256 | lm loss: 4.499879E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.312 | TFLOPs: 9.34 | 7: iteration 122070/ 173500 | consumed samples: 31249920 | consumed tokens: 63999836160 | elapsed time per iteration (s): 0.10 | learning rate: 5.697E-05 | global batch size: 256 | lm loss: 4.515326E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2645.118 | TFLOPs: 9.84 | 7: iteration 122080/ 173500 | consumed samples: 31252480 | consumed tokens: 64005079040 | elapsed time per iteration (s): 0.08 | learning rate: 5.695E-05 | global batch size: 256 | lm loss: 4.514128E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.422 | TFLOPs: 11.92 | 7: iteration 122090/ 173500 | consumed samples: 31255040 | consumed tokens: 64010321920 | elapsed time per iteration (s): 0.10 | learning rate: 5.694E-05 | global batch size: 256 | lm loss: 4.527579E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2639.889 | TFLOPs: 9.82 | 7: iteration 122100/ 173500 | consumed samples: 31257600 | consumed tokens: 64015564800 | elapsed time per iteration (s): 0.11 | learning rate: 5.693E-05 | global batch size: 256 | lm loss: 4.512306E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.552 | TFLOPs: 8.95 | 7: iteration 122110/ 173500 | consumed samples: 31260160 | consumed tokens: 64020807680 | elapsed time per iteration (s): 0.10 | learning rate: 5.691E-05 | global batch size: 256 | lm loss: 4.520251E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2443.798 | TFLOPs: 9.09 | 7: iteration 
122120/ 173500 | consumed samples: 31262720 | consumed tokens: 64026050560 | elapsed time per iteration (s): 0.10 | learning rate: 5.690E-05 | global batch size: 256 | lm loss: 4.517797E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2465.610 | TFLOPs: 9.17 | 7: iteration 122130/ 173500 | consumed samples: 31265280 | consumed tokens: 64031293440 | elapsed time per iteration (s): 0.08 | learning rate: 5.689E-05 | global batch size: 256 | lm loss: 4.509283E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.372 | TFLOPs: 11.94 | 7: iteration 122140/ 173500 | consumed samples: 31267840 | consumed tokens: 64036536320 | elapsed time per iteration (s): 0.08 | learning rate: 5.687E-05 | global batch size: 256 | lm loss: 4.517541E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.496 | TFLOPs: 11.35 | 7: iteration 122150/ 173500 | consumed samples: 31270400 | consumed tokens: 64041779200 | elapsed time per iteration (s): 0.08 | learning rate: 5.686E-05 | global batch size: 256 | lm loss: 4.510518E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.816 | TFLOPs: 11.96 | 7: iteration 122160/ 173500 | consumed samples: 31272960 | consumed tokens: 64047022080 | elapsed time per iteration (s): 0.08 | learning rate: 5.685E-05 | global batch size: 256 | lm loss: 4.524314E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.579 | TFLOPs: 11.93 | 7: iteration 122170/ 173500 | consumed samples: 31275520 | consumed tokens: 64052264960 | elapsed time per iteration (s): 0.08 | learning rate: 5.683E-05 | global batch size: 256 | lm loss: 4.498671E+00 | grad norm: 0.337 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.135 | TFLOPs: 11.85 | 7: iteration 122180/ 173500 | consumed samples: 31278080 | consumed tokens: 64057507840 | elapsed time per iteration (s): 0.11 | learning rate: 5.682E-05 | global batch size: 256 | lm loss: 4.513313E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2390.209 | TFLOPs: 8.89 | 7: iteration 122190/ 173500 | consumed samples: 31280640 | consumed tokens: 64062750720 | elapsed time per iteration (s): 0.08 | learning rate: 5.681E-05 | global batch size: 256 | lm loss: 4.506620E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.951 | TFLOPs: 12.06 | 7: iteration 122200/ 173500 | consumed samples: 31283200 | consumed tokens: 64067993600 | elapsed time per iteration (s): 0.10 | learning rate: 5.679E-05 | global batch size: 256 | lm loss: 4.522292E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2601.537 | TFLOPs: 9.68 | 7: iteration 122210/ 173500 | consumed samples: 31285760 | consumed tokens: 64073236480 | elapsed time per iteration (s): 0.08 | learning rate: 5.678E-05 | global batch size: 256 | lm loss: 4.522102E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.319 | TFLOPs: 12.00 | 7: iteration 122220/ 173500 | consumed samples: 31288320 | consumed tokens: 64078479360 | elapsed time per iteration (s): 0.08 | learning rate: 5.677E-05 | global batch size: 256 | lm loss: 4.517578E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.979 | TFLOPs: 11.98 | 7: iteration 122230/ 173500 | consumed samples: 31290880 | consumed tokens: 64083722240 | elapsed time per iteration (s): 0.08 | learning 
rate: 5.675E-05 | global batch size: 256 | lm loss: 4.510498E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.701 | TFLOPs: 11.98 | 7: iteration 122240/ 173500 | consumed samples: 31293440 | consumed tokens: 64088965120 | elapsed time per iteration (s): 0.08 | learning rate: 5.674E-05 | global batch size: 256 | lm loss: 4.515696E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.438 | TFLOPs: 11.25 | 7: iteration 122250/ 173500 | consumed samples: 31296000 | consumed tokens: 64094208000 | elapsed time per iteration (s): 0.08 | learning rate: 5.673E-05 | global batch size: 256 | lm loss: 4.515304E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.813 | TFLOPs: 11.97 | 7: iteration 122260/ 173500 | consumed samples: 31298560 | consumed tokens: 64099450880 | elapsed time per iteration (s): 0.08 | learning rate: 5.672E-05 | global batch size: 256 | lm loss: 4.510356E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.254 | TFLOPs: 11.67 | 7: iteration 122270/ 173500 | consumed samples: 31301120 | consumed tokens: 64104693760 | elapsed time per iteration (s): 0.08 | learning rate: 5.670E-05 | global batch size: 256 | lm loss: 4.508007E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.550 | TFLOPs: 11.50 | 7: iteration 122280/ 173500 | consumed samples: 31303680 | consumed tokens: 64109936640 | elapsed time per iteration (s): 0.08 | learning rate: 5.669E-05 | global batch size: 256 | lm loss: 4.514070E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.948 | TFLOPs: 11.89 | 7: iteration 122290/ 
173500 | consumed samples: 31306240 | consumed tokens: 64115179520 | elapsed time per iteration (s): 0.08 | learning rate: 5.668E-05 | global batch size: 256 | lm loss: 4.498232E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.613 | TFLOPs: 11.98 | 7: iteration 122300/ 173500 | consumed samples: 31308800 | consumed tokens: 64120422400 | elapsed time per iteration (s): 0.10 | learning rate: 5.666E-05 | global batch size: 256 | lm loss: 4.515909E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2517.282 | TFLOPs: 9.36 | 7: iteration 122310/ 173500 | consumed samples: 31311360 | consumed tokens: 64125665280 | elapsed time per iteration (s): 0.10 | learning rate: 5.665E-05 | global batch size: 256 | lm loss: 4.520329E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2454.013 | TFLOPs: 9.13 | 7: iteration 122320/ 173500 | consumed samples: 31313920 | consumed tokens: 64130908160 | elapsed time per iteration (s): 0.10 | learning rate: 5.664E-05 | global batch size: 256 | lm loss: 4.524852E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2578.540 | TFLOPs: 9.59 | 7: iteration 122330/ 173500 | consumed samples: 31316480 | consumed tokens: 64136151040 | elapsed time per iteration (s): 0.10 | learning rate: 5.662E-05 | global batch size: 256 | lm loss: 4.516695E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.748 | TFLOPs: 9.45 | 7: iteration 122340/ 173500 | consumed samples: 31319040 | consumed tokens: 64141393920 | elapsed time per iteration (s): 0.09 | learning rate: 5.661E-05 | global batch size: 256 | lm loss: 4.514928E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2896.030 | TFLOPs: 10.77 | 7: iteration 122350/ 173500 | consumed samples: 31321600 | consumed tokens: 64146636800 | elapsed time per iteration (s): 0.11 | learning rate: 5.660E-05 | global batch size: 256 | lm loss: 4.524299E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.164 | TFLOPs: 8.88 | 7: iteration 122360/ 173500 | consumed samples: 31324160 | consumed tokens: 64151879680 | elapsed time per iteration (s): 0.10 | learning rate: 5.658E-05 | global batch size: 256 | lm loss: 4.506037E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2673.210 | TFLOPs: 9.94 | 7: iteration 122370/ 173500 | consumed samples: 31326720 | consumed tokens: 64157122560 | elapsed time per iteration (s): 0.08 | learning rate: 5.657E-05 | global batch size: 256 | lm loss: 4.502272E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.158 | TFLOPs: 11.96 | 7: iteration 122380/ 173500 | consumed samples: 31329280 | consumed tokens: 64162365440 | elapsed time per iteration (s): 0.08 | learning rate: 5.656E-05 | global batch size: 256 | lm loss: 4.513335E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.155 | TFLOPs: 11.96 | 7: iteration 122390/ 173500 | consumed samples: 31331840 | consumed tokens: 64167608320 | elapsed time per iteration (s): 0.08 | learning rate: 5.654E-05 | global batch size: 256 | lm loss: 4.516173E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.525 | TFLOPs: 11.89 | 7: iteration 122400/ 173500 | consumed samples: 31334400 | consumed tokens: 64172851200 | elapsed time per iteration (s): 0.08 | learning rate: 
5.653E-05 | global batch size: 256 | lm loss: 4.511155E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.795 | TFLOPs: 11.89 | 7: iteration 122410/ 173500 | consumed samples: 31336960 | consumed tokens: 64178094080 | elapsed time per iteration (s): 0.08 | learning rate: 5.652E-05 | global batch size: 256 | lm loss: 4.518022E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.726 | TFLOPs: 11.90 | 7: iteration 122420/ 173500 | consumed samples: 31339520 | consumed tokens: 64183336960 | elapsed time per iteration (s): 0.09 | learning rate: 5.650E-05 | global batch size: 256 | lm loss: 4.502336E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2907.131 | TFLOPs: 10.81 | 7: iteration 122430/ 173500 | consumed samples: 31342080 | consumed tokens: 64188579840 | elapsed time per iteration (s): 0.10 | learning rate: 5.649E-05 | global batch size: 256 | lm loss: 4.518597E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.882 | TFLOPs: 9.16 | 7: iteration 122440/ 173500 | consumed samples: 31344640 | consumed tokens: 64193822720 | elapsed time per iteration (s): 0.10 | learning rate: 5.648E-05 | global batch size: 256 | lm loss: 4.512806E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.707 | TFLOPs: 9.16 | 7: iteration 122450/ 173500 | consumed samples: 31347200 | consumed tokens: 64199065600 | elapsed time per iteration (s): 0.08 | learning rate: 5.646E-05 | global batch size: 256 | lm loss: 4.512502E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.107 | TFLOPs: 11.60 | 7: iteration 122460/ 173500 | 
consumed samples: 31349760 | consumed tokens: 64204308480 | elapsed time per iteration (s): 0.08 | learning rate: 5.645E-05 | global batch size: 256 | lm loss: 4.497467E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.683 | TFLOPs: 12.00 | 7: iteration 122470/ 173500 | consumed samples: 31352320 | consumed tokens: 64209551360 | elapsed time per iteration (s): 0.09 | learning rate: 5.644E-05 | global batch size: 256 | lm loss: 4.519314E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2931.679 | TFLOPs: 10.90 | 7: iteration 122480/ 173500 | consumed samples: 31354880 | consumed tokens: 64214794240 | elapsed time per iteration (s): 0.11 | learning rate: 5.642E-05 | global batch size: 256 | lm loss: 4.520917E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.416 | TFLOPs: 8.66 | 7: iteration 122490/ 173500 | consumed samples: 31357440 | consumed tokens: 64220037120 | elapsed time per iteration (s): 0.08 | learning rate: 5.641E-05 | global batch size: 256 | lm loss: 4.515727E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.000 | TFLOPs: 11.87 | 7: iteration 122500/ 173500 | consumed samples: 31360000 | consumed tokens: 64225280000 | elapsed time per iteration (s): 0.08 | learning rate: 5.640E-05 | global batch size: 256 | lm loss: 4.502611E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.365 | TFLOPs: 11.72 | 7: iteration 122510/ 173500 | consumed samples: 31362560 | consumed tokens: 64230522880 | elapsed time per iteration (s): 0.08 | learning rate: 5.638E-05 | global batch size: 256 | lm loss: 4.506008E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3189.639 | TFLOPs: 11.86 | 7: iteration 122520/ 173500 | consumed samples: 31365120 | consumed tokens: 64235765760 | elapsed time per iteration (s): 0.08 | learning rate: 5.637E-05 | global batch size: 256 | lm loss: 4.502500E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.058 | TFLOPs: 11.82 | 7: iteration 122530/ 173500 | consumed samples: 31367680 | consumed tokens: 64241008640 | elapsed time per iteration (s): 0.09 | learning rate: 5.636E-05 | global batch size: 256 | lm loss: 4.515051E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.139 | TFLOPs: 11.09 | 7: iteration 122540/ 173500 | consumed samples: 31370240 | consumed tokens: 64246251520 | elapsed time per iteration (s): 0.08 | learning rate: 5.634E-05 | global batch size: 256 | lm loss: 4.518771E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.435 | TFLOPs: 11.86 | 7: iteration 122550/ 173500 | consumed samples: 31372800 | consumed tokens: 64251494400 | elapsed time per iteration (s): 0.08 | learning rate: 5.633E-05 | global batch size: 256 | lm loss: 4.512096E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.818 | TFLOPs: 11.79 | 7: iteration 122560/ 173500 | consumed samples: 31375360 | consumed tokens: 64256737280 | elapsed time per iteration (s): 0.09 | learning rate: 5.632E-05 | global batch size: 256 | lm loss: 4.523987E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2697.436 | TFLOPs: 10.03 | 7: iteration 122570/ 173500 | consumed samples: 31377920 | consumed tokens: 64261980160 | elapsed time per iteration (s): 0.08 | learning rate: 5.630E-05 | 
global batch size: 256 | lm loss: 4.516648E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.488 | TFLOPs: 11.84 | 7: iteration 122580/ 173500 | consumed samples: 31380480 | consumed tokens: 64267223040 | elapsed time per iteration (s): 0.08 | learning rate: 5.629E-05 | global batch size: 256 | lm loss: 4.519263E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.964 | TFLOPs: 11.86 | 7: iteration 122590/ 173500 | consumed samples: 31383040 | consumed tokens: 64272465920 | elapsed time per iteration (s): 0.08 | learning rate: 5.628E-05 | global batch size: 256 | lm loss: 4.513206E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.227 | TFLOPs: 11.76 | 7: iteration 122600/ 173500 | consumed samples: 31385600 | consumed tokens: 64277708800 | elapsed time per iteration (s): 0.08 | learning rate: 5.627E-05 | global batch size: 256 | lm loss: 4.506499E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.929 | TFLOPs: 11.80 | 7: iteration 122610/ 173500 | consumed samples: 31388160 | consumed tokens: 64282951680 | elapsed time per iteration (s): 0.08 | learning rate: 5.625E-05 | global batch size: 256 | lm loss: 4.515226E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.613 | TFLOPs: 11.94 | 7: iteration 122620/ 173500 | consumed samples: 31390720 | consumed tokens: 64288194560 | elapsed time per iteration (s): 0.09 | learning rate: 5.624E-05 | global batch size: 256 | lm loss: 4.511634E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.910 | TFLOPs: 10.38 | 7: iteration 122630/ 173500 | consumed 
samples: 31393280 | consumed tokens: 64293437440 | elapsed time per iteration (s): 0.08 | learning rate: 5.623E-05 | global batch size: 256 | lm loss: 4.530975E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.774 | TFLOPs: 11.81 | 7: iteration 122640/ 173500 | consumed samples: 31395840 | consumed tokens: 64298680320 | elapsed time per iteration (s): 0.08 | learning rate: 5.621E-05 | global batch size: 256 | lm loss: 4.492828E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.680 | TFLOPs: 12.03 | 7: iteration 122650/ 173500 | consumed samples: 31398400 | consumed tokens: 64303923200 | elapsed time per iteration (s): 0.08 | learning rate: 5.620E-05 | global batch size: 256 | lm loss: 4.510152E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.557 | TFLOPs: 12.06 | 7: iteration 122660/ 173500 | consumed samples: 31400960 | consumed tokens: 64309166080 | elapsed time per iteration (s): 0.08 | learning rate: 5.619E-05 | global batch size: 256 | lm loss: 4.521872E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.865 | TFLOPs: 12.02 | 7: iteration 122670/ 173500 | consumed samples: 31403520 | consumed tokens: 64314408960 | elapsed time per iteration (s): 0.08 | learning rate: 5.617E-05 | global batch size: 256 | lm loss: 4.518480E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.565 | TFLOPs: 12.08 | 7: iteration 122680/ 173500 | consumed samples: 31406080 | consumed tokens: 64319651840 | elapsed time per iteration (s): 0.10 | learning rate: 5.616E-05 | global batch size: 256 | lm loss: 4.513383E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2578.952 | TFLOPs: 9.59 | 7: iteration 122690/ 173500 | consumed samples: 31408640 | consumed tokens: 64324894720 | elapsed time per iteration (s): 0.09 | learning rate: 5.615E-05 | global batch size: 256 | lm loss: 4.504663E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.435 | TFLOPs: 10.97 | 7: iteration 122700/ 173500 | consumed samples: 31411200 | consumed tokens: 64330137600 | elapsed time per iteration (s): 0.08 | learning rate: 5.613E-05 | global batch size: 256 | lm loss: 4.514088E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.892 | TFLOPs: 12.04 | 7: iteration 122710/ 173500 | consumed samples: 31413760 | consumed tokens: 64335380480 | elapsed time per iteration (s): 0.10 | learning rate: 5.612E-05 | global batch size: 256 | lm loss: 4.513467E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2567.763 | TFLOPs: 9.55 | 7: iteration 122720/ 173500 | consumed samples: 31416320 | consumed tokens: 64340623360 | elapsed time per iteration (s): 0.09 | learning rate: 5.611E-05 | global batch size: 256 | lm loss: 4.502456E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2722.679 | TFLOPs: 10.13 | 7: iteration 122730/ 173500 | consumed samples: 31418880 | consumed tokens: 64345866240 | elapsed time per iteration (s): 0.09 | learning rate: 5.609E-05 | global batch size: 256 | lm loss: 4.495217E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2892.842 | TFLOPs: 10.76 | 7: iteration 122740/ 173500 | consumed samples: 31421440 | consumed tokens: 64351109120 | elapsed time per iteration (s): 0.08 | learning rate: 5.608E-05 | global 
batch size: 256 | lm loss: 4.518091E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.591 | TFLOPs: 12.02 | 7: iteration 122750/ 173500 | consumed samples: 31424000 | consumed tokens: 64356352000 | elapsed time per iteration (s): 0.08 | learning rate: 5.607E-05 | global batch size: 256 | lm loss: 4.519428E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.864 | TFLOPs: 12.05 | 7: iteration 122760/ 173500 | consumed samples: 31426560 | consumed tokens: 64361594880 | elapsed time per iteration (s): 0.08 | learning rate: 5.605E-05 | global batch size: 256 | lm loss: 4.520163E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.168 | TFLOPs: 12.07 | 7: iteration 122770/ 173500 | consumed samples: 31429120 | consumed tokens: 64366837760 | elapsed time per iteration (s): 0.13 | learning rate: 5.604E-05 | global batch size: 256 | lm loss: 4.521581E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2045.245 | TFLOPs: 7.61 | 7: iteration 122780/ 173500 | consumed samples: 31431680 | consumed tokens: 64372080640 | elapsed time per iteration (s): 0.09 | learning rate: 5.603E-05 | global batch size: 256 | lm loss: 4.519305E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.658 | TFLOPs: 11.14 | 7: iteration 122790/ 173500 | consumed samples: 31434240 | consumed tokens: 64377323520 | elapsed time per iteration (s): 0.08 | learning rate: 5.601E-05 | global batch size: 256 | lm loss: 4.512651E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.446 | TFLOPs: 11.85 | 7: iteration 122800/ 173500 | consumed samples: 
31436800 | consumed tokens: 64382566400 | elapsed time per iteration (s): 0.08 | learning rate: 5.600E-05 | global batch size: 256 | lm loss: 4.508911E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.423 | TFLOPs: 11.96 | 7: iteration 122810/ 173500 | consumed samples: 31439360 | consumed tokens: 64387809280 | elapsed time per iteration (s): 0.09 | learning rate: 5.599E-05 | global batch size: 256 | lm loss: 4.509346E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.049 | TFLOPs: 10.69 | 7: iteration 122820/ 173500 | consumed samples: 31441920 | consumed tokens: 64393052160 | elapsed time per iteration (s): 0.08 | learning rate: 5.597E-05 | global batch size: 256 | lm loss: 4.516298E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.649 | TFLOPs: 11.97 | 7: iteration 122830/ 173500 | consumed samples: 31444480 | consumed tokens: 64398295040 | elapsed time per iteration (s): 0.08 | learning rate: 5.596E-05 | global batch size: 256 | lm loss: 4.504572E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.016 | TFLOPs: 12.04 | 7: iteration 122840/ 173500 | consumed samples: 31447040 | consumed tokens: 64403537920 | elapsed time per iteration (s): 0.08 | learning rate: 5.595E-05 | global batch size: 256 | lm loss: 4.522065E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3252.319 | TFLOPs: 12.10 | 7: iteration 122850/ 173500 | consumed samples: 31449600 | consumed tokens: 64408780800 | elapsed time per iteration (s): 0.08 | learning rate: 5.594E-05 | global batch size: 256 | lm loss: 4.512151E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3264.508 | TFLOPs: 12.14 | 7: iteration 122860/ 173500 | consumed samples: 31452160 | consumed tokens: 64414023680 | elapsed time per iteration (s): 0.08 | learning rate: 5.592E-05 | global batch size: 256 | lm loss: 4.496963E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.323 | TFLOPs: 12.03 | 7: iteration 122870/ 173500 | consumed samples: 31454720 | consumed tokens: 64419266560 | elapsed time per iteration (s): 0.08 | learning rate: 5.591E-05 | global batch size: 256 | lm loss: 4.508297E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.820 | TFLOPs: 11.98 | 7: iteration 122880/ 173500 | consumed samples: 31457280 | consumed tokens: 64424509440 | elapsed time per iteration (s): 0.08 | learning rate: 5.590E-05 | global batch size: 256 | lm loss: 4.513400E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.851 | TFLOPs: 11.98 | 7: iteration 122890/ 173500 | consumed samples: 31459840 | consumed tokens: 64429752320 | elapsed time per iteration (s): 0.08 | learning rate: 5.588E-05 | global batch size: 256 | lm loss: 4.509391E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.544 | TFLOPs: 11.99 | 7: iteration 122900/ 173500 | consumed samples: 31462400 | consumed tokens: 64434995200 | elapsed time per iteration (s): 0.08 | learning rate: 5.587E-05 | global batch size: 256 | lm loss: 4.513461E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3080.791 | TFLOPs: 11.46 | 7: iteration 122910/ 173500 | consumed samples: 31464960 | consumed tokens: 64440238080 | elapsed time per iteration (s): 0.10 | learning rate: 5.586E-05 | global batch 
size: 256 | lm loss: 4.520074E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2524.774 | TFLOPs: 9.39 | 7: iteration 122920/ 173500 | consumed samples: 31467520 | consumed tokens: 64445480960 | elapsed time per iteration (s): 0.09 | learning rate: 5.584E-05 | global batch size: 256 | lm loss: 4.509909E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2747.306 | TFLOPs: 10.22 | 7: iteration 122930/ 173500 | consumed samples: 31470080 | consumed tokens: 64450723840 | elapsed time per iteration (s): 0.08 | learning rate: 5.583E-05 | global batch size: 256 | lm loss: 4.508445E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.105 | TFLOPs: 11.30 | 7: iteration 122940/ 173500 | consumed samples: 31472640 | consumed tokens: 64455966720 | elapsed time per iteration (s): 0.08 | learning rate: 5.582E-05 | global batch size: 256 | lm loss: 4.517728E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.244 | TFLOPs: 11.97 | 7: iteration 122950/ 173500 | consumed samples: 31475200 | consumed tokens: 64461209600 | elapsed time per iteration (s): 0.08 | learning rate: 5.580E-05 | global batch size: 256 | lm loss: 4.507594E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.614 | TFLOPs: 11.86 | 7: iteration 122960/ 173500 | consumed samples: 31477760 | consumed tokens: 64466452480 | elapsed time per iteration (s): 0.08 | learning rate: 5.579E-05 | global batch size: 256 | lm loss: 4.511689E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.538 | TFLOPs: 11.93 | 7: iteration 122970/ 173500 | consumed samples: 31480320 
| consumed tokens: 64471695360 | elapsed time per iteration (s): 0.08 | learning rate: 5.578E-05 | global batch size: 256 | lm loss: 4.513271E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.218 | TFLOPs: 11.96 | 7: iteration 122980/ 173500 | consumed samples: 31482880 | consumed tokens: 64476938240 | elapsed time per iteration (s): 0.08 | learning rate: 5.576E-05 | global batch size: 256 | lm loss: 4.512998E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.673 | TFLOPs: 11.97 | 7: iteration 122990/ 173500 | consumed samples: 31485440 | consumed tokens: 64482181120 | elapsed time per iteration (s): 0.10 | learning rate: 5.575E-05 | global batch size: 256 | lm loss: 4.511019E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2492.591 | TFLOPs: 9.27 | 7: iteration 123000/ 173500 | consumed samples: 31488000 | consumed tokens: 64487424000 | elapsed time per iteration (s): 0.09 | learning rate: 5.574E-05 | global batch size: 256 | lm loss: 4.517353E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2864.026 | TFLOPs: 10.65 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 123000 | lm loss value: 4.424799E+00 | lm loss PPL: 8.349602E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 123000 to checkpoints_14m91b100m 0: [2023-03-17 03:14:46,013] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step123000 is begin to save! 
0: [2023-03-17 03:14:46,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:14:46,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:14:46,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:14:46,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:14:46,055] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:14:46,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:14:46,058] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:14:46,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:14:46,061] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:14:46,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:14:46,064] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:14:46,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 03:14:46,065] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step123000/mp_rank_00_model_states.pt 0: [2023-03-17 03:14:46,065] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:14:46,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:14:46,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:14:46,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:14:46,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:14:46,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:14:46,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 0: [2023-03-17 03:14:46,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:14:46,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 1: [2023-03-17 03:14:46,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:14:46,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:14:46,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 3: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:14:46,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 0: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:14:46,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 4: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:14:46,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:14:46,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 03:14:46,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 4: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 4: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 3: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 7: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:14:46,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 6: [2023-03-17 03:14:46,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 
5: [2023-03-17 03:14:46,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 2: [2023-03-17 03:14:46,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:14:46,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:14:46,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 1: [2023-03-17 03:14:46,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:14:46,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:14:46,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 0: [2023-03-17 03:14:46,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:14:46,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:14:46,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 7: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:14:46,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 2: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:14:46,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 6: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:14:46,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 1: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:14:46,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 3: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:14:46,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 5: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:14:46,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 4: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:14:46,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 0: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:14:46,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 1: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:14:46,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 2: [2023-03-17 03:14:46,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 7: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:14:46,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 6: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:14:46,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:14:46,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 5: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:14:46,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 0: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:14:46,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 3: [2023-03-17 03:14:46,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 4: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:14:46,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 2: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:14:46,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 03:14:46,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 7: [2023-03-17 03:14:46,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 1: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:14:46,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 6: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:14:46,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 5: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:14:46,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 3: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:14:46,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 0: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:14:46,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:14:46,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 4: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:14:46,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 2: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:14:46,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 7: [2023-03-17 03:14:46,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 6: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:14:46,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 1: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:14:46,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:14:46,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 3: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:14:46,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 5: [2023-03-17 03:14:46,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 0: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:14:46,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 4: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 4: [2023-03-17 03:14:46,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 7: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:14:46,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:14:46,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 2: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 1: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 6: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 6: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 5: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 3: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 
5: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 2: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 3: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 0: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:14:46,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 
4: [2023-03-17 03:14:46,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:14:46,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 4: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 4: [2023-03-17 03:14:46,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 7: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:14:46,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:14:46,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 7: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now! 
7: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
7: [2023-03-17 03:14:46,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
7: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now!
1: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
1: [2023-03-17 03:14:46,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step123000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
1: [2023-03-17 03:14:46,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step123000 is ready now!
0: successfully saved checkpoint at iteration 123000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 88.76
7: iteration 123010/ 173500 | consumed samples: 31490560 | consumed tokens: 64492666880 | elapsed time per iteration (s): 0.09 | learning rate: 5.573E-05 | global batch size: 256 | lm loss: 4.511205E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.363 | TFLOPs: 10.32 |
7: iteration 123020/ 173500 | consumed samples: 31493120 | consumed tokens: 64497909760 | elapsed time per iteration (s): 0.08 | learning rate: 5.571E-05 | global batch size: 256 | lm loss: 4.511257E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.664 | TFLOPs: 11.89 |
7: iteration 123030/ 173500 | consumed samples: 31495680 | consumed tokens: 64503152640 | elapsed time per iteration (s): 0.08 | learning rate: 5.570E-05 | global batch size: 256 | lm loss: 4.521383E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.817 | TFLOPs: 11.97 |
7: iteration 123040/ 173500 | consumed samples: 31498240 | consumed tokens: 64508395520 | elapsed time per iteration (s): 0.08 | learning rate: 5.569E-05 | global batch size: 256 | lm loss: 4.514122E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.008 | TFLOPs: 11.95 |
7: iteration 123050/ 173500 | consumed samples: 31500800 | consumed tokens: 64513638400 | elapsed time per iteration (s): 0.08 | learning rate: 5.567E-05 | global batch size: 256 | lm loss: 4.509688E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.396 | TFLOPs: 11.76 |
7: iteration 123060/ 173500 | consumed samples: 31503360 | consumed tokens: 64518881280 | elapsed time per iteration (s): 0.12 | learning rate: 5.566E-05 | global batch size: 256 | lm loss: 4.511083E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.464 | TFLOPs: 7.70 |
7: iteration 123070/ 173500 | consumed samples: 31505920 | consumed tokens: 64524124160 | elapsed time per iteration (s): 0.09 | learning rate: 5.565E-05 | global batch size: 256 | lm loss: 4.516899E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.422 | TFLOPs: 10.73 |
7: iteration 123080/ 173500 | consumed samples: 31508480 | consumed tokens: 64529367040 | elapsed time per iteration (s): 0.08 | learning rate: 5.563E-05 | global batch size: 256 | lm loss: 4.523857E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.500 | TFLOPs: 11.86 |
7: iteration 123090/ 173500 | consumed samples: 31511040 | consumed tokens: 64534609920 | elapsed time per iteration (s): 0.10 | learning rate: 5.562E-05 | global batch size: 256 | lm loss: 4.510869E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2587.998 | TFLOPs: 9.63 |
7: iteration 123100/ 173500 | consumed samples: 31513600 | consumed tokens: 64539852800 | elapsed time per iteration (s): 0.09 | learning rate: 5.561E-05 | global batch size: 256 | lm loss: 4.508933E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2949.139 | TFLOPs: 10.97 |
7: iteration 123110/ 173500 | consumed samples: 31516160 | consumed tokens: 64545095680 | elapsed time per iteration (s): 0.08 | learning rate: 5.559E-05 | global batch size: 256 | lm loss: 4.501071E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.952 | TFLOPs: 11.65 |
7: iteration 123120/ 173500 | consumed samples: 31518720 | consumed tokens: 64550338560 | elapsed time per iteration (s): 0.08 | learning rate: 5.558E-05 | global batch size: 256 | lm loss: 4.521001E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.903 | TFLOPs: 11.89 |
7: iteration 123130/ 173500 | consumed samples: 31521280 | consumed tokens: 64555581440 | elapsed time per iteration (s): 0.08 | learning rate: 5.557E-05 | global batch size: 256 | lm loss: 4.519484E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.207 | TFLOPs: 11.86 |
7: iteration 123140/ 173500 | consumed samples: 31523840 | consumed tokens: 64560824320 | elapsed time per iteration (s): 0.08 | learning rate: 5.555E-05 | global batch size: 256 | lm loss: 4.514077E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.907 | TFLOPs: 11.84 |
7: iteration 123150/ 173500 | consumed samples: 31526400 | consumed tokens: 64566067200 | elapsed time per iteration (s): 0.08 | learning rate: 5.554E-05 | global batch size: 256 | lm loss: 4.518479E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.981 | TFLOPs: 11.89 |
7: iteration 123160/ 173500 | consumed samples: 31528960 | consumed tokens: 64571310080 | elapsed time per iteration (s): 0.08 | learning rate: 5.553E-05 | global batch size: 256 | lm loss: 4.513678E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.981 | TFLOPs: 11.87 |
7: iteration 123170/ 173500 | consumed samples: 31531520 | consumed tokens: 64576552960 | elapsed time per iteration (s): 0.13 | learning rate: 5.552E-05 | global batch size: 256 | lm loss: 4.520490E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2020.499 | TFLOPs: 7.52 |
7: iteration 123180/ 173500 | consumed samples: 31534080 | consumed tokens: 64581795840 | elapsed time per iteration (s): 0.09 | learning rate: 5.550E-05 | global batch size: 256 | lm loss: 4.509458E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2701.135 | TFLOPs: 10.05 |
7: iteration 123190/ 173500 | consumed samples: 31536640 | consumed tokens: 64587038720 | elapsed time per iteration (s): 0.08 | learning rate: 5.549E-05 | global batch size: 256 | lm loss: 4.509168E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.213 | TFLOPs: 12.01 |
7: iteration 123200/ 173500 | consumed samples: 31539200 | consumed tokens: 64592281600 | elapsed time per iteration (s): 0.08 | learning rate: 5.548E-05 | global batch size: 256 | lm loss: 4.516272E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.964 | TFLOPs: 11.70 |
7: iteration 123210/ 173500 | consumed samples: 31541760 | consumed tokens: 64597524480 | elapsed time per iteration (s): 0.10 | learning rate: 5.546E-05 | global batch size: 256 | lm loss: 4.504883E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2494.550 | TFLOPs: 9.28 |
7: iteration 123220/ 173500 | consumed samples: 31544320 | consumed tokens: 64602767360 | elapsed time per iteration (s): 0.10 | learning rate: 5.545E-05 | global batch size: 256 | lm loss: 4.523362E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2479.169 | TFLOPs: 9.22 |
7: iteration 123230/ 173500 | consumed samples: 31546880 | consumed tokens: 64608010240 | elapsed time per iteration (s): 0.12 | learning rate: 5.544E-05 | global batch size: 256 | lm loss: 4.510777E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2202.251 | TFLOPs: 8.19 |
7: iteration 123240/ 173500 | consumed samples: 31549440 | consumed tokens: 64613253120 | elapsed time per iteration (s): 0.11 | learning rate: 5.542E-05 | global batch size: 256 | lm loss: 4.514962E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2251.791 | TFLOPs: 8.38 |
7: iteration 123250/ 173500 | consumed samples: 31552000 | consumed tokens: 64618496000 | elapsed time per iteration (s): 0.11 | learning rate: 5.541E-05 | global batch size: 256 | lm loss: 4.515601E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2367.049 | TFLOPs: 8.80 |
7: iteration 123260/ 173500 | consumed samples: 31554560 | consumed tokens: 64623738880 | elapsed
time per iteration (s): 0.11 | learning rate: 5.540E-05 | global batch size: 256 | lm loss: 4.502585E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2352.798 | TFLOPs: 8.75 | 7: iteration 123270/ 173500 | consumed samples: 31557120 | consumed tokens: 64628981760 | elapsed time per iteration (s): 0.10 | learning rate: 5.538E-05 | global batch size: 256 | lm loss: 4.517512E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2498.046 | TFLOPs: 9.29 | 7: iteration 123280/ 173500 | consumed samples: 31559680 | consumed tokens: 64634224640 | elapsed time per iteration (s): 0.08 | learning rate: 5.537E-05 | global batch size: 256 | lm loss: 4.515794E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.248 | TFLOPs: 11.94 | 7: iteration 123290/ 173500 | consumed samples: 31562240 | consumed tokens: 64639467520 | elapsed time per iteration (s): 0.08 | learning rate: 5.536E-05 | global batch size: 256 | lm loss: 4.522303E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.785 | TFLOPs: 11.95 | 7: iteration 123300/ 173500 | consumed samples: 31564800 | consumed tokens: 64644710400 | elapsed time per iteration (s): 0.08 | learning rate: 5.535E-05 | global batch size: 256 | lm loss: 4.527602E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.509 | TFLOPs: 11.89 | 7: iteration 123310/ 173500 | consumed samples: 31567360 | consumed tokens: 64649953280 | elapsed time per iteration (s): 0.08 | learning rate: 5.533E-05 | global batch size: 256 | lm loss: 4.516403E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.365 | 
TFLOPs: 11.80 | 7: iteration 123320/ 173500 | consumed samples: 31569920 | consumed tokens: 64655196160 | elapsed time per iteration (s): 0.08 | learning rate: 5.532E-05 | global batch size: 256 | lm loss: 4.523077E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.920 | TFLOPs: 11.96 | 7: iteration 123330/ 173500 | consumed samples: 31572480 | consumed tokens: 64660439040 | elapsed time per iteration (s): 0.08 | learning rate: 5.531E-05 | global batch size: 256 | lm loss: 4.522186E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.966 | TFLOPs: 11.93 | 7: iteration 123340/ 173500 | consumed samples: 31575040 | consumed tokens: 64665681920 | elapsed time per iteration (s): 0.08 | learning rate: 5.529E-05 | global batch size: 256 | lm loss: 4.509955E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.299 | TFLOPs: 11.96 | 7: iteration 123350/ 173500 | consumed samples: 31577600 | consumed tokens: 64670924800 | elapsed time per iteration (s): 0.08 | learning rate: 5.528E-05 | global batch size: 256 | lm loss: 4.510839E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3251.721 | TFLOPs: 12.09 | 7: iteration 123360/ 173500 | consumed samples: 31580160 | consumed tokens: 64676167680 | elapsed time per iteration (s): 0.09 | learning rate: 5.527E-05 | global batch size: 256 | lm loss: 4.505673E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.426 | TFLOPs: 11.06 | 7: iteration 123370/ 173500 | consumed samples: 31582720 | consumed tokens: 64681410560 | elapsed time per iteration (s): 0.08 | learning rate: 5.525E-05 | global batch size: 256 | lm loss: 4.519848E+00 | grad norm: 0.382 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3260.402 | TFLOPs: 12.13 | 7: iteration 123380/ 173500 | consumed samples: 31585280 | consumed tokens: 64686653440 | elapsed time per iteration (s): 0.08 | learning rate: 5.524E-05 | global batch size: 256 | lm loss: 4.509950E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.278 | TFLOPs: 12.01 | 7: iteration 123390/ 173500 | consumed samples: 31587840 | consumed tokens: 64691896320 | elapsed time per iteration (s): 0.08 | learning rate: 5.523E-05 | global batch size: 256 | lm loss: 4.515215E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.805 | TFLOPs: 12.06 | 7: iteration 123400/ 173500 | consumed samples: 31590400 | consumed tokens: 64697139200 | elapsed time per iteration (s): 0.08 | learning rate: 5.521E-05 | global batch size: 256 | lm loss: 4.501922E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.841 | TFLOPs: 12.07 | 7: iteration 123410/ 173500 | consumed samples: 31592960 | consumed tokens: 64702382080 | elapsed time per iteration (s): 0.08 | learning rate: 5.520E-05 | global batch size: 256 | lm loss: 4.508800E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3254.697 | TFLOPs: 12.11 | 7: iteration 123420/ 173500 | consumed samples: 31595520 | consumed tokens: 64707624960 | elapsed time per iteration (s): 0.08 | learning rate: 5.519E-05 | global batch size: 256 | lm loss: 4.512276E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.807 | TFLOPs: 11.95 | 7: iteration 123430/ 173500 | consumed samples: 31598080 | consumed tokens: 64712867840 | elapsed time per 
iteration (s): 0.08 | learning rate: 5.518E-05 | global batch size: 256 | lm loss: 4.526311E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3257.270 | TFLOPs: 12.12 | 7: iteration 123440/ 173500 | consumed samples: 31600640 | consumed tokens: 64718110720 | elapsed time per iteration (s): 0.08 | learning rate: 5.516E-05 | global batch size: 256 | lm loss: 4.511938E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.037 | TFLOPs: 11.82 | 7: iteration 123450/ 173500 | consumed samples: 31603200 | consumed tokens: 64723353600 | elapsed time per iteration (s): 0.11 | learning rate: 5.515E-05 | global batch size: 256 | lm loss: 4.511617E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2317.326 | TFLOPs: 8.62 | 7: iteration 123460/ 173500 | consumed samples: 31605760 | consumed tokens: 64728596480 | elapsed time per iteration (s): 0.08 | learning rate: 5.514E-05 | global batch size: 256 | lm loss: 4.523991E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.249 | TFLOPs: 11.99 | 7: iteration 123470/ 173500 | consumed samples: 31608320 | consumed tokens: 64733839360 | elapsed time per iteration (s): 0.09 | learning rate: 5.512E-05 | global batch size: 256 | lm loss: 4.515709E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2729.437 | TFLOPs: 10.15 | 7: iteration 123480/ 173500 | consumed samples: 31610880 | consumed tokens: 64739082240 | elapsed time per iteration (s): 0.12 | learning rate: 5.511E-05 | global batch size: 256 | lm loss: 4.514844E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2064.291 | TFLOPs: 
7.68 | 7: iteration 123490/ 173500 | consumed samples: 31613440 | consumed tokens: 64744325120 | elapsed time per iteration (s): 0.13 | learning rate: 5.510E-05 | global batch size: 256 | lm loss: 4.512125E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2023.742 | TFLOPs: 7.53 | 7: iteration 123500/ 173500 | consumed samples: 31616000 | consumed tokens: 64749568000 | elapsed time per iteration (s): 0.10 | learning rate: 5.508E-05 | global batch size: 256 | lm loss: 4.511326E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.857 | TFLOPs: 9.37 | 7: iteration 123510/ 173500 | consumed samples: 31618560 | consumed tokens: 64754810880 | elapsed time per iteration (s): 0.10 | learning rate: 5.507E-05 | global batch size: 256 | lm loss: 4.519489E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2517.008 | TFLOPs: 9.36 | 7: iteration 123520/ 173500 | consumed samples: 31621120 | consumed tokens: 64760053760 | elapsed time per iteration (s): 0.10 | learning rate: 5.506E-05 | global batch size: 256 | lm loss: 4.505307E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2496.936 | TFLOPs: 9.29 | 7: iteration 123530/ 173500 | consumed samples: 31623680 | consumed tokens: 64765296640 | elapsed time per iteration (s): 0.11 | learning rate: 5.504E-05 | global batch size: 256 | lm loss: 4.518736E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2358.607 | TFLOPs: 8.77 | 7: iteration 123540/ 173500 | consumed samples: 31626240 | consumed tokens: 64770539520 | elapsed time per iteration (s): 0.09 | learning rate: 5.503E-05 | global batch size: 256 | lm loss: 4.508553E+00 | grad norm: 0.343 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.119 | TFLOPs: 10.56 | 7: iteration 123550/ 173500 | consumed samples: 31628800 | consumed tokens: 64775782400 | elapsed time per iteration (s): 0.08 | learning rate: 5.502E-05 | global batch size: 256 | lm loss: 4.507698E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.124 | TFLOPs: 11.92 | 7: iteration 123560/ 173500 | consumed samples: 31631360 | consumed tokens: 64781025280 | elapsed time per iteration (s): 0.09 | learning rate: 5.501E-05 | global batch size: 256 | lm loss: 4.518799E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2786.763 | TFLOPs: 10.37 | 7: iteration 123570/ 173500 | consumed samples: 31633920 | consumed tokens: 64786268160 | elapsed time per iteration (s): 0.08 | learning rate: 5.499E-05 | global batch size: 256 | lm loss: 4.502694E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.610 | TFLOPs: 11.25 | 7: iteration 123580/ 173500 | consumed samples: 31636480 | consumed tokens: 64791511040 | elapsed time per iteration (s): 0.08 | learning rate: 5.498E-05 | global batch size: 256 | lm loss: 4.499885E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.191 | TFLOPs: 11.96 | 7: iteration 123590/ 173500 | consumed samples: 31639040 | consumed tokens: 64796753920 | elapsed time per iteration (s): 0.08 | learning rate: 5.497E-05 | global batch size: 256 | lm loss: 4.517968E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.787 | TFLOPs: 11.97 | 7: iteration 123600/ 173500 | consumed samples: 31641600 | consumed tokens: 64801996800 | elapsed time per iteration (s): 
0.08 | learning rate: 5.495E-05 | global batch size: 256 | lm loss: 4.510212E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.888 | TFLOPs: 11.89 | 7: iteration 123610/ 173500 | consumed samples: 31644160 | consumed tokens: 64807239680 | elapsed time per iteration (s): 0.08 | learning rate: 5.494E-05 | global batch size: 256 | lm loss: 4.521194E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.086 | TFLOPs: 11.91 | 7: iteration 123620/ 173500 | consumed samples: 31646720 | consumed tokens: 64812482560 | elapsed time per iteration (s): 0.08 | learning rate: 5.493E-05 | global batch size: 256 | lm loss: 4.511449E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.511 | TFLOPs: 11.92 | 7: iteration 123630/ 173500 | consumed samples: 31649280 | consumed tokens: 64817725440 | elapsed time per iteration (s): 0.08 | learning rate: 5.491E-05 | global batch size: 256 | lm loss: 4.501310E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.913 | TFLOPs: 11.95 | 7: iteration 123640/ 173500 | consumed samples: 31651840 | consumed tokens: 64822968320 | elapsed time per iteration (s): 0.08 | learning rate: 5.490E-05 | global batch size: 256 | lm loss: 4.513083E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.411 | TFLOPs: 11.94 | 7: iteration 123650/ 173500 | consumed samples: 31654400 | consumed tokens: 64828211200 | elapsed time per iteration (s): 0.08 | learning rate: 5.489E-05 | global batch size: 256 | lm loss: 4.512609E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.751 | TFLOPs: 11.96 | 7: 
iteration 123660/ 173500 | consumed samples: 31656960 | consumed tokens: 64833454080 | elapsed time per iteration (s): 0.09 | learning rate: 5.488E-05 | global batch size: 256 | lm loss: 4.508886E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2781.238 | TFLOPs: 10.34 | 7: iteration 123670/ 173500 | consumed samples: 31659520 | consumed tokens: 64838696960 | elapsed time per iteration (s): 0.12 | learning rate: 5.486E-05 | global batch size: 256 | lm loss: 4.518559E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2147.670 | TFLOPs: 7.99 | 7: iteration 123680/ 173500 | consumed samples: 31662080 | consumed tokens: 64843939840 | elapsed time per iteration (s): 0.10 | learning rate: 5.485E-05 | global batch size: 256 | lm loss: 4.517130E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2649.768 | TFLOPs: 9.86 | 7: iteration 123690/ 173500 | consumed samples: 31664640 | consumed tokens: 64849182720 | elapsed time per iteration (s): 0.08 | learning rate: 5.484E-05 | global batch size: 256 | lm loss: 4.528516E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.728 | TFLOPs: 11.98 | 7: iteration 123700/ 173500 | consumed samples: 31667200 | consumed tokens: 64854425600 | elapsed time per iteration (s): 0.08 | learning rate: 5.482E-05 | global batch size: 256 | lm loss: 4.514223E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.609 | TFLOPs: 11.98 | 7: iteration 123710/ 173500 | consumed samples: 31669760 | consumed tokens: 64859668480 | elapsed time per iteration (s): 0.08 | learning rate: 5.481E-05 | global batch size: 256 | lm loss: 4.512296E+00 | grad norm: 0.365 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.246 | TFLOPs: 11.99 | 7: iteration 123720/ 173500 | consumed samples: 31672320 | consumed tokens: 64864911360 | elapsed time per iteration (s): 0.08 | learning rate: 5.480E-05 | global batch size: 256 | lm loss: 4.506326E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.018 | TFLOPs: 12.00 | 7: iteration 123730/ 173500 | consumed samples: 31674880 | consumed tokens: 64870154240 | elapsed time per iteration (s): 0.08 | learning rate: 5.478E-05 | global batch size: 256 | lm loss: 4.522352E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.491 | TFLOPs: 11.98 | 7: iteration 123740/ 173500 | consumed samples: 31677440 | consumed tokens: 64875397120 | elapsed time per iteration (s): 0.08 | learning rate: 5.477E-05 | global batch size: 256 | lm loss: 4.503134E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.925 | TFLOPs: 11.98 | 7: iteration 123750/ 173500 | consumed samples: 31680000 | consumed tokens: 64880640000 | elapsed time per iteration (s): 0.08 | learning rate: 5.476E-05 | global batch size: 256 | lm loss: 4.509786E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.355 | TFLOPs: 12.00 | 7: iteration 123760/ 173500 | consumed samples: 31682560 | consumed tokens: 64885882880 | elapsed time per iteration (s): 0.08 | learning rate: 5.475E-05 | global batch size: 256 | lm loss: 4.520962E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.750 | TFLOPs: 12.02 | 7: iteration 123770/ 173500 | consumed samples: 31685120 | consumed tokens: 64891125760 | elapsed time per iteration (s): 0.08 | 
learning rate: 5.473E-05 | global batch size: 256 | lm loss: 4.501512E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.391 | TFLOPs: 11.99 | 7: iteration 123780/ 173500 | consumed samples: 31687680 | consumed tokens: 64896368640 | elapsed time per iteration (s): 0.08 | learning rate: 5.472E-05 | global batch size: 256 | lm loss: 4.510679E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.824 | TFLOPs: 11.96 | 7: iteration 123790/ 173500 | consumed samples: 31690240 | consumed tokens: 64901611520 | elapsed time per iteration (s): 0.08 | learning rate: 5.471E-05 | global batch size: 256 | lm loss: 4.506247E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.095 | TFLOPs: 11.99 | 7: iteration 123800/ 173500 | consumed samples: 31692800 | consumed tokens: 64906854400 | elapsed time per iteration (s): 0.08 | learning rate: 5.469E-05 | global batch size: 256 | lm loss: 4.509945E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.436 | TFLOPs: 11.88 | 7: iteration 123810/ 173500 | consumed samples: 31695360 | consumed tokens: 64912097280 | elapsed time per iteration (s): 0.08 | learning rate: 5.468E-05 | global batch size: 256 | lm loss: 4.504507E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.267 | TFLOPs: 12.00 | 7: iteration 123820/ 173500 | consumed samples: 31697920 | consumed tokens: 64917340160 | elapsed time per iteration (s): 0.08 | learning rate: 5.467E-05 | global batch size: 256 | lm loss: 4.523907E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.473 | TFLOPs: 11.90 | 7: iteration 
123830/ 173500 | consumed samples: 31700480 | consumed tokens: 64922583040 | elapsed time per iteration (s): 0.08 | learning rate: 5.465E-05 | global batch size: 256 | lm loss: 4.515839E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.535 | TFLOPs: 12.00 | 7: iteration 123840/ 173500 | consumed samples: 31703040 | consumed tokens: 64927825920 | elapsed time per iteration (s): 0.08 | learning rate: 5.464E-05 | global batch size: 256 | lm loss: 4.509557E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.505 | TFLOPs: 12.00 | 7: iteration 123850/ 173500 | consumed samples: 31705600 | consumed tokens: 64933068800 | elapsed time per iteration (s): 0.09 | learning rate: 5.463E-05 | global batch size: 256 | lm loss: 4.514427E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2717.612 | TFLOPs: 10.11 | 7: iteration 123860/ 173500 | consumed samples: 31708160 | consumed tokens: 64938311680 | elapsed time per iteration (s): 0.10 | learning rate: 5.462E-05 | global batch size: 256 | lm loss: 4.511966E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.243 | TFLOPs: 9.41 | 7: iteration 123870/ 173500 | consumed samples: 31710720 | consumed tokens: 64943554560 | elapsed time per iteration (s): 0.09 | learning rate: 5.460E-05 | global batch size: 256 | lm loss: 4.517673E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2959.812 | TFLOPs: 11.01 | 7: iteration 123880/ 173500 | consumed samples: 31713280 | consumed tokens: 64948797440 | elapsed time per iteration (s): 0.08 | learning rate: 5.459E-05 | global batch size: 256 | lm loss: 4.503077E+00 | grad norm: 0.363 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.424 | TFLOPs: 12.03 | 7: iteration 123890/ 173500 | consumed samples: 31715840 | consumed tokens: 64954040320 | elapsed time per iteration (s): 0.09 | learning rate: 5.458E-05 | global batch size: 256 | lm loss: 4.503985E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.390 | TFLOPs: 11.13 | 7: iteration 123900/ 173500 | consumed samples: 31718400 | consumed tokens: 64959283200 | elapsed time per iteration (s): 0.10 | learning rate: 5.456E-05 | global batch size: 256 | lm loss: 4.508113E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.952 | TFLOPs: 9.52 | 7: iteration 123910/ 173500 | consumed samples: 31720960 | consumed tokens: 64964526080 | elapsed time per iteration (s): 0.10 | learning rate: 5.455E-05 | global batch size: 256 | lm loss: 4.517034E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2560.230 | TFLOPs: 9.52 | 7: iteration 123920/ 173500 | consumed samples: 31723520 | consumed tokens: 64969768960 | elapsed time per iteration (s): 0.10 | learning rate: 5.454E-05 | global batch size: 256 | lm loss: 4.508667E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.647 | TFLOPs: 9.41 | 7: iteration 123930/ 173500 | consumed samples: 31726080 | consumed tokens: 64975011840 | elapsed time per iteration (s): 0.08 | learning rate: 5.452E-05 | global batch size: 256 | lm loss: 4.509071E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.100 | TFLOPs: 12.02 | 7: iteration 123940/ 173500 | consumed samples: 31728640 | consumed tokens: 64980254720 | elapsed time per iteration (s): 0.08 | learning 
rate: 5.451E-05 | global batch size: 256 | lm loss: 4.505942E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.271 | TFLOPs: 12.00 | 7: iteration 123950/ 173500 | consumed samples: 31731200 | consumed tokens: 64985497600 | elapsed time per iteration (s): 0.08 | learning rate: 5.450E-05 | global batch size: 256 | lm loss: 4.516740E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.603 | TFLOPs: 12.08 | 7: iteration 123960/ 173500 | consumed samples: 31733760 | consumed tokens: 64990740480 | elapsed time per iteration (s): 0.08 | learning rate: 5.449E-05 | global batch size: 256 | lm loss: 4.519019E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.592 | TFLOPs: 12.07 | 7: iteration 123970/ 173500 | consumed samples: 31736320 | consumed tokens: 64995983360 | elapsed time per iteration (s): 0.08 | learning rate: 5.447E-05 | global batch size: 256 | lm loss: 4.515469E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.742 | TFLOPs: 12.01 | 7: iteration 123980/ 173500 | consumed samples: 31738880 | consumed tokens: 65001226240 | elapsed time per iteration (s): 0.08 | learning rate: 5.446E-05 | global batch size: 256 | lm loss: 4.510847E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.986 | TFLOPs: 12.04 | 7: iteration 123990/ 173500 | consumed samples: 31741440 | consumed tokens: 65006469120 | elapsed time per iteration (s): 0.08 | learning rate: 5.445E-05 | global batch size: 256 | lm loss: 4.522291E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.308 | TFLOPs: 12.05 | 0: [2023-03-17 
03:16:13,596] [INFO] [logging.py:68:log_dist] [Rank 0] step=124000, skipped=0, lr=[5.443416434803536e-05, 5.443416434803536e-05, 5.443416434803536e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 124000/ 173500 | consumed samples: 31744000 | consumed tokens: 65011712000 | elapsed time per iteration (s): 0.08 | learning rate: 5.443E-05 | global batch size: 256 | lm loss: 4.511732E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.341 | TFLOPs: 12.05 | 0: steps: 124000 loss: 4.4944 iter time (s): 0.087 samples/sec: 2947.339 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 124000 | lm loss value: 4.361281E+00 | lm loss PPL: 7.835748E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 124000 to checkpoints_14m91b100m 0: [2023-03-17 03:16:13,654] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step124000 is begin to save! 0: [2023-03-17 03:16:13,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:16:13,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:16:13,685] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:16:13,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:16:13,689] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/layer_04-model_00-model_states.pt... 
0: [2023-03-17 03:16:13,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:16:13,692] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:16:13,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:16:13,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:16:13,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:16:13,698] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:16:13,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:16:13,699] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step124000/mp_rank_00_model_states.pt 0: [2023-03-17 03:16:13,699] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:16:13,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:16:13,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:16:13,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:16:13,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:16:13,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:16:13,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
5: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 7: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:16:13,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:16:13,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:16:13,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2023-03-17 03:16:13,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
0: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 0: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 6: [2023-03-17 03:16:13,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:16:13,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 4: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:16:13,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 3: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:16:13,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
1: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:16:13,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:16:13,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 1: [2023-03-17 03:16:13,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 7: [2023-03-17 03:16:13,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:16:13,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:16:13,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:16:13,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
5: [2023-03-17 03:16:13,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:16:13,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 5: [2023-03-17 03:16:13,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:16:13,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 4: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:16:13,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 0: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 0: [2023-03-17 03:16:13,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 03:16:13,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 0: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 6: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 7: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:16:13,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
0: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:16:13,726] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:16:13,726] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 1: [2023-03-17 03:16:13,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:16:13,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:16:13,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 03:16:13,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 03:16:13,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 6: [2023-03-17 03:16:13,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:16:13,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:16:13,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
4: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:16:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 5: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 7: [2023-03-17 03:16:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 3: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:16:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 3: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:16:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 6: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:16:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 03:16:13,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
0: [2023-03-17 03:16:13,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 5: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 4: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:16:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 3: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:16:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:16:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
1: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:16:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 03:16:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 0: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:16:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 6: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:16:13,729] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:16:13,729] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 6: [2023-03-17 03:16:13,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:16:13,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:16:13,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 3: [2023-03-17 03:16:13,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:16:13,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 4: [2023-03-17 03:16:13,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:16:13,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:16:13,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 5: [2023-03-17 03:16:13,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:16:13,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:16:13,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 7: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:16:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 1: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:16:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 0: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:16:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
2: [2023-03-17 03:16:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 5: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 3: [2023-03-17 03:16:13,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:16:13,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 4: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 7: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 0: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 0: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 1: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 6: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 2: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 
6: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 6: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 4: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 7: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 3: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 3: [2023-03-17 03:16:13,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step124000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:16:13,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step124000 is ready now! 0: successfully saved checkpoint at iteration 124000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.07 7: iteration 124010/ 173500 | consumed samples: 31746560 | consumed tokens: 65016954880 | elapsed time per iteration (s): 0.09 | learning rate: 5.442E-05 | global batch size: 256 | lm loss: 4.506692E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2772.202 | TFLOPs: 10.31 | 7: iteration 124020/ 173500 | consumed samples: 31749120 | consumed tokens: 65022197760 | elapsed time per iteration (s): 0.08 | learning rate: 5.441E-05 | global batch size: 256 | lm loss: 4.508083E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.385 | TFLOPs: 12.03 | 7: iteration 124030/ 173500 | consumed samples: 31751680 | consumed tokens: 65027440640 | elapsed time per iteration (s): 0.08 | learning rate: 5.440E-05 | global batch size: 256 | lm loss: 4.499957E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.686 | TFLOPs: 12.05 | 7: iteration 124040/ 173500 | consumed samples: 31754240 | consumed tokens: 65032683520 | elapsed time per iteration (s): 0.09 | learning rate: 5.438E-05 | global batch size: 256 | lm loss: 
4.508236E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2800.855 | TFLOPs: 10.42 | 7: iteration 124050/ 173500 | consumed samples: 31756800 | consumed tokens: 65037926400 | elapsed time per iteration (s): 0.09 | learning rate: 5.437E-05 | global batch size: 256 | lm loss: 4.505919E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.311 | TFLOPs: 10.16 | 7: iteration 124060/ 173500 | consumed samples: 31759360 | consumed tokens: 65043169280 | elapsed time per iteration (s): 0.08 | learning rate: 5.436E-05 | global batch size: 256 | lm loss: 4.508506E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.455 | TFLOPs: 12.04 | 7: iteration 124070/ 173500 | consumed samples: 31761920 | consumed tokens: 65048412160 | elapsed time per iteration (s): 0.08 | learning rate: 5.434E-05 | global batch size: 256 | lm loss: 4.501533E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.812 | TFLOPs: 11.96 | 7: iteration 124080/ 173500 | consumed samples: 31764480 | consumed tokens: 65053655040 | elapsed time per iteration (s): 0.08 | learning rate: 5.433E-05 | global batch size: 256 | lm loss: 4.527898E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.276 | TFLOPs: 11.96 | 7: iteration 124090/ 173500 | consumed samples: 31767040 | consumed tokens: 65058897920 | elapsed time per iteration (s): 0.08 | learning rate: 5.432E-05 | global batch size: 256 | lm loss: 4.520742E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.357 | TFLOPs: 12.00 | 7: iteration 124100/ 173500 | consumed samples: 31769600 | consumed tokens: 
65064140800 | elapsed time per iteration (s): 0.08 | learning rate: 5.430E-05 | global batch size: 256 | lm loss: 4.513369E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.879 | TFLOPs: 11.43 | 7: iteration 124110/ 173500 | consumed samples: 31772160 | consumed tokens: 65069383680 | elapsed time per iteration (s): 0.08 | learning rate: 5.429E-05 | global batch size: 256 | lm loss: 4.517439E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.998 | TFLOPs: 11.40 | 7: iteration 124120/ 173500 | consumed samples: 31774720 | consumed tokens: 65074626560 | elapsed time per iteration (s): 0.08 | learning rate: 5.428E-05 | global batch size: 256 | lm loss: 4.497530E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.434 | TFLOPs: 11.91 | 7: iteration 124130/ 173500 | consumed samples: 31777280 | consumed tokens: 65079869440 | elapsed time per iteration (s): 0.08 | learning rate: 5.427E-05 | global batch size: 256 | lm loss: 4.507014E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.670 | TFLOPs: 11.91 | 7: iteration 124140/ 173500 | consumed samples: 31779840 | consumed tokens: 65085112320 | elapsed time per iteration (s): 0.08 | learning rate: 5.425E-05 | global batch size: 256 | lm loss: 4.507937E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.284 | TFLOPs: 11.92 | 7: iteration 124150/ 173500 | consumed samples: 31782400 | consumed tokens: 65090355200 | elapsed time per iteration (s): 0.08 | learning rate: 5.424E-05 | global batch size: 256 | lm loss: 4.505121E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3205.563 | TFLOPs: 11.92 | 7: iteration 124160/ 173500 | consumed samples: 31784960 | consumed tokens: 65095598080 | elapsed time per iteration (s): 0.08 | learning rate: 5.423E-05 | global batch size: 256 | lm loss: 4.515712E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.957 | TFLOPs: 11.65 | 7: iteration 124170/ 173500 | consumed samples: 31787520 | consumed tokens: 65100840960 | elapsed time per iteration (s): 0.08 | learning rate: 5.421E-05 | global batch size: 256 | lm loss: 4.502109E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.622 | TFLOPs: 11.61 | 7: iteration 124180/ 173500 | consumed samples: 31790080 | consumed tokens: 65106083840 | elapsed time per iteration (s): 0.08 | learning rate: 5.420E-05 | global batch size: 256 | lm loss: 4.505556E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.775 | TFLOPs: 11.94 | 7: iteration 124190/ 173500 | consumed samples: 31792640 | consumed tokens: 65111326720 | elapsed time per iteration (s): 0.08 | learning rate: 5.419E-05 | global batch size: 256 | lm loss: 4.507327E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.379 | TFLOPs: 11.71 | 7: iteration 124200/ 173500 | consumed samples: 31795200 | consumed tokens: 65116569600 | elapsed time per iteration (s): 0.08 | learning rate: 5.418E-05 | global batch size: 256 | lm loss: 4.529811E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.707 | TFLOPs: 11.88 | 7: iteration 124210/ 173500 | consumed samples: 31797760 | consumed tokens: 65121812480 | elapsed time per iteration (s): 0.09 | learning rate: 5.416E-05 | global batch size: 256 | lm loss: 
4.512974E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.990 | TFLOPs: 11.20 | 7: iteration 124220/ 173500 | consumed samples: 31800320 | consumed tokens: 65127055360 | elapsed time per iteration (s): 0.08 | learning rate: 5.415E-05 | global batch size: 256 | lm loss: 4.515023E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.962 | TFLOPs: 11.61 | 7: iteration 124230/ 173500 | consumed samples: 31802880 | consumed tokens: 65132298240 | elapsed time per iteration (s): 0.08 | learning rate: 5.414E-05 | global batch size: 256 | lm loss: 4.511639E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.799 | TFLOPs: 11.55 | 7: iteration 124240/ 173500 | consumed samples: 31805440 | consumed tokens: 65137541120 | elapsed time per iteration (s): 0.08 | learning rate: 5.412E-05 | global batch size: 256 | lm loss: 4.514921E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.363 | TFLOPs: 11.60 | 7: iteration 124250/ 173500 | consumed samples: 31808000 | consumed tokens: 65142784000 | elapsed time per iteration (s): 0.08 | learning rate: 5.411E-05 | global batch size: 256 | lm loss: 4.515757E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.135 | TFLOPs: 11.90 | 7: iteration 124260/ 173500 | consumed samples: 31810560 | consumed tokens: 65148026880 | elapsed time per iteration (s): 0.08 | learning rate: 5.410E-05 | global batch size: 256 | lm loss: 4.509288E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.906 | TFLOPs: 11.89 | 7: iteration 124270/ 173500 | consumed samples: 31813120 | consumed tokens: 
65153269760 | elapsed time per iteration (s): 0.08 | learning rate: 5.409E-05 | global batch size: 256 | lm loss: 4.497787E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.729 | TFLOPs: 11.99 | 7: iteration 124280/ 173500 | consumed samples: 31815680 | consumed tokens: 65158512640 | elapsed time per iteration (s): 0.08 | learning rate: 5.407E-05 | global batch size: 256 | lm loss: 4.534469E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.908 | TFLOPs: 11.70 | 7: iteration 124290/ 173500 | consumed samples: 31818240 | consumed tokens: 65163755520 | elapsed time per iteration (s): 0.08 | learning rate: 5.406E-05 | global batch size: 256 | lm loss: 4.520436E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.327 | TFLOPs: 11.73 | 7: iteration 124300/ 173500 | consumed samples: 31820800 | consumed tokens: 65168998400 | elapsed time per iteration (s): 0.08 | learning rate: 5.405E-05 | global batch size: 256 | lm loss: 4.513344E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.230 | TFLOPs: 11.78 | 7: iteration 124310/ 173500 | consumed samples: 31823360 | consumed tokens: 65174241280 | elapsed time per iteration (s): 0.09 | learning rate: 5.403E-05 | global batch size: 256 | lm loss: 4.505519E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.887 | TFLOPs: 10.60 | 7: iteration 124320/ 173500 | consumed samples: 31825920 | consumed tokens: 65179484160 | elapsed time per iteration (s): 0.08 | learning rate: 5.402E-05 | global batch size: 256 | lm loss: 4.504399E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3209.667 | TFLOPs: 11.94 | 7: iteration 124330/ 173500 | consumed samples: 31828480 | consumed tokens: 65184727040 | elapsed time per iteration (s): 0.08 | learning rate: 5.401E-05 | global batch size: 256 | lm loss: 4.502235E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.025 | TFLOPs: 11.85 | 7: iteration 124340/ 173500 | consumed samples: 31831040 | consumed tokens: 65189969920 | elapsed time per iteration (s): 0.08 | learning rate: 5.399E-05 | global batch size: 256 | lm loss: 4.514443E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.074 | TFLOPs: 11.88 | 7: iteration 124350/ 173500 | consumed samples: 31833600 | consumed tokens: 65195212800 | elapsed time per iteration (s): 0.08 | learning rate: 5.398E-05 | global batch size: 256 | lm loss: 4.511959E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.178 | TFLOPs: 11.39 | 7: iteration 124360/ 173500 | consumed samples: 31836160 | consumed tokens: 65200455680 | elapsed time per iteration (s): 0.08 | learning rate: 5.397E-05 | global batch size: 256 | lm loss: 4.514584E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.971 | TFLOPs: 11.30 | 7: iteration 124370/ 173500 | consumed samples: 31838720 | consumed tokens: 65205698560 | elapsed time per iteration (s): 0.10 | learning rate: 5.396E-05 | global batch size: 256 | lm loss: 4.514950E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2511.096 | TFLOPs: 9.34 | 7: iteration 124380/ 173500 | consumed samples: 31841280 | consumed tokens: 65210941440 | elapsed time per iteration (s): 0.08 | learning rate: 5.394E-05 | global batch size: 256 | lm loss: 
4.507289E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.391 | TFLOPs: 11.53 | 7: iteration 124390/ 173500 | consumed samples: 31843840 | consumed tokens: 65216184320 | elapsed time per iteration (s): 0.08 | learning rate: 5.393E-05 | global batch size: 256 | lm loss: 4.508146E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.632 | TFLOPs: 11.86 | 7: iteration 124400/ 173500 | consumed samples: 31846400 | consumed tokens: 65221427200 | elapsed time per iteration (s): 0.08 | learning rate: 5.392E-05 | global batch size: 256 | lm loss: 4.510329E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.327 | TFLOPs: 11.81 | 7: iteration 124410/ 173500 | consumed samples: 31848960 | consumed tokens: 65226670080 | elapsed time per iteration (s): 0.08 | learning rate: 5.390E-05 | global batch size: 256 | lm loss: 4.512001E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.470 | TFLOPs: 11.90 | 7: iteration 124420/ 173500 | consumed samples: 31851520 | consumed tokens: 65231912960 | elapsed time per iteration (s): 0.08 | learning rate: 5.389E-05 | global batch size: 256 | lm loss: 4.516844E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.133 | TFLOPs: 11.94 | 7: iteration 124430/ 173500 | consumed samples: 31854080 | consumed tokens: 65237155840 | elapsed time per iteration (s): 0.08 | learning rate: 5.388E-05 | global batch size: 256 | lm loss: 4.512508E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.222 | TFLOPs: 11.94 | 7: iteration 124440/ 173500 | consumed samples: 31856640 | consumed tokens: 
65242398720 | elapsed time per iteration (s): 0.09 | learning rate: 5.387E-05 | global batch size: 256 | lm loss: 4.522766E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2847.890 | TFLOPs: 10.59 | 7: iteration 124450/ 173500 | consumed samples: 31859200 | consumed tokens: 65247641600 | elapsed time per iteration (s): 0.08 | learning rate: 5.385E-05 | global batch size: 256 | lm loss: 4.497574E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.418 | TFLOPs: 11.83 | 7: iteration 124460/ 173500 | consumed samples: 31861760 | consumed tokens: 65252884480 | elapsed time per iteration (s): 0.08 | learning rate: 5.384E-05 | global batch size: 256 | lm loss: 4.504171E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.598 | TFLOPs: 11.88 | 7: iteration 124470/ 173500 | consumed samples: 31864320 | consumed tokens: 65258127360 | elapsed time per iteration (s): 0.08 | learning rate: 5.383E-05 | global batch size: 256 | lm loss: 4.512140E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.104 | TFLOPs: 11.90 | 7: iteration 124480/ 173500 | consumed samples: 31866880 | consumed tokens: 65263370240 | elapsed time per iteration (s): 0.08 | learning rate: 5.381E-05 | global batch size: 256 | lm loss: 4.522993E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.526 | TFLOPs: 11.93 | 7: iteration 124490/ 173500 | consumed samples: 31869440 | consumed tokens: 65268613120 | elapsed time per iteration (s): 0.08 | learning rate: 5.380E-05 | global batch size: 256 | lm loss: 4.505668E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3204.970 | TFLOPs: 11.92 | 7: iteration 124500/ 173500 | consumed samples: 31872000 | consumed tokens: 65273856000 | elapsed time per iteration (s): 0.08 | learning rate: 5.379E-05 | global batch size: 256 | lm loss: 4.506857E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.897 | TFLOPs: 11.57 | 7: iteration 124510/ 173500 | consumed samples: 31874560 | consumed tokens: 65279098880 | elapsed time per iteration (s): 0.09 | learning rate: 5.378E-05 | global batch size: 256 | lm loss: 4.515584E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.204 | TFLOPs: 10.16 | 7: iteration 124520/ 173500 | consumed samples: 31877120 | consumed tokens: 65284341760 | elapsed time per iteration (s): 0.10 | learning rate: 5.376E-05 | global batch size: 256 | lm loss: 4.505785E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.827 | TFLOPs: 9.16 | 7: iteration 124530/ 173500 | consumed samples: 31879680 | consumed tokens: 65289584640 | elapsed time per iteration (s): 0.09 | learning rate: 5.375E-05 | global batch size: 256 | lm loss: 4.515431E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.399 | TFLOPs: 10.92 | 7: iteration 124540/ 173500 | consumed samples: 31882240 | consumed tokens: 65294827520 | elapsed time per iteration (s): 0.08 | learning rate: 5.374E-05 | global batch size: 256 | lm loss: 4.526468E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.313 | TFLOPs: 11.93 | 7: iteration 124550/ 173500 | consumed samples: 31884800 | consumed tokens: 65300070400 | elapsed time per iteration (s): 0.08 | learning rate: 5.372E-05 | global batch size: 256 | lm loss: 
4.519402E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.850 | TFLOPs: 11.93 | 7: iteration 124560/ 173500 | consumed samples: 31887360 | consumed tokens: 65305313280 | elapsed time per iteration (s): 0.08 | learning rate: 5.371E-05 | global batch size: 256 | lm loss: 4.510403E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.954 | TFLOPs: 11.89 | 7: iteration 124570/ 173500 | consumed samples: 31889920 | consumed tokens: 65310556160 | elapsed time per iteration (s): 0.08 | learning rate: 5.370E-05 | global batch size: 256 | lm loss: 4.515268E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.811 | TFLOPs: 11.37 | 7: iteration 124580/ 173500 | consumed samples: 31892480 | consumed tokens: 65315799040 | elapsed time per iteration (s): 0.08 | learning rate: 5.369E-05 | global batch size: 256 | lm loss: 4.504120E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.112 | TFLOPs: 11.87 | 7: iteration 124590/ 173500 | consumed samples: 31895040 | consumed tokens: 65321041920 | elapsed time per iteration (s): 0.08 | learning rate: 5.367E-05 | global batch size: 256 | lm loss: 4.506290E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.183 | TFLOPs: 11.82 | 7: iteration 124600/ 173500 | consumed samples: 31897600 | consumed tokens: 65326284800 | elapsed time per iteration (s): 0.08 | learning rate: 5.366E-05 | global batch size: 256 | lm loss: 4.502648E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.778 | TFLOPs: 11.82 | 7: iteration 124610/ 173500 | consumed samples: 31900160 | consumed tokens: 
65331527680 | elapsed time per iteration (s): 0.08 | learning rate: 5.365E-05 | global batch size: 256 | lm loss: 4.507243E+00 | grad norm: 0.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.881 | TFLOPs: 11.87 | 7: iteration 124620/ 173500 | consumed samples: 31902720 | consumed tokens: 65336770560 | elapsed time per iteration (s): 0.08 | learning rate: 5.363E-05 | global batch size: 256 | lm loss: 4.509599E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.993 | TFLOPs: 11.84 | 7: iteration 124630/ 173500 | consumed samples: 31905280 | consumed tokens: 65342013440 | elapsed time per iteration (s): 0.08 | learning rate: 5.362E-05 | global batch size: 256 | lm loss: 4.502994E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.854 | TFLOPs: 11.59 | 7: iteration 124640/ 173500 | consumed samples: 31907840 | consumed tokens: 65347256320 | elapsed time per iteration (s): 0.08 | learning rate: 5.361E-05 | global batch size: 256 | lm loss: 4.526282E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.794 | TFLOPs: 11.82 | 7: iteration 124650/ 173500 | consumed samples: 31910400 | consumed tokens: 65352499200 | elapsed time per iteration (s): 0.08 | learning rate: 5.360E-05 | global batch size: 256 | lm loss: 4.514096E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.720 | TFLOPs: 11.75 | 7: iteration 124660/ 173500 | consumed samples: 31912960 | consumed tokens: 65357742080 | elapsed time per iteration (s): 0.08 | learning rate: 5.358E-05 | global batch size: 256 | lm loss: 4.513301E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3177.971 | TFLOPs: 11.82 | 7: iteration 124670/ 173500 | consumed samples: 31915520 | consumed tokens: 65362984960 | elapsed time per iteration (s): 0.08 | learning rate: 5.357E-05 | global batch size: 256 | lm loss: 4.519062E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.943 | TFLOPs: 11.87 | 7: iteration 124680/ 173500 | consumed samples: 31918080 | consumed tokens: 65368227840 | elapsed time per iteration (s): 0.08 | learning rate: 5.356E-05 | global batch size: 256 | lm loss: 4.509735E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.396 | TFLOPs: 11.80 | 7: iteration 124690/ 173500 | consumed samples: 31920640 | consumed tokens: 65373470720 | elapsed time per iteration (s): 0.08 | learning rate: 5.355E-05 | global batch size: 256 | lm loss: 4.516686E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.067 | TFLOPs: 11.84 | 7: iteration 124700/ 173500 | consumed samples: 31923200 | consumed tokens: 65378713600 | elapsed time per iteration (s): 0.08 | learning rate: 5.353E-05 | global batch size: 256 | lm loss: 4.495726E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.078 | TFLOPs: 11.40 | 7: iteration 124710/ 173500 | consumed samples: 31925760 | consumed tokens: 65383956480 | elapsed time per iteration (s): 0.08 | learning rate: 5.352E-05 | global batch size: 256 | lm loss: 4.511340E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.743 | TFLOPs: 11.83 | 7: iteration 124720/ 173500 | consumed samples: 31928320 | consumed tokens: 65389199360 | elapsed time per iteration (s): 0.08 | learning rate: 5.351E-05 | global batch size: 256 | lm loss: 
4.506446E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.782 | TFLOPs: 11.86 | 7: iteration 124730/ 173500 | consumed samples: 31930880 | consumed tokens: 65394442240 | elapsed time per iteration (s): 0.08 | learning rate: 5.349E-05 | global batch size: 256 | lm loss: 4.512560E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.364 | TFLOPs: 11.83 | 7: iteration 124740/ 173500 | consumed samples: 31933440 | consumed tokens: 65399685120 | elapsed time per iteration (s): 0.08 | learning rate: 5.348E-05 | global batch size: 256 | lm loss: 4.503844E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.021 | TFLOPs: 11.78 | 7: iteration 124750/ 173500 | consumed samples: 31936000 | consumed tokens: 65404928000 | elapsed time per iteration (s): 0.08 | learning rate: 5.347E-05 | global batch size: 256 | lm loss: 4.514998E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.001 | TFLOPs: 11.78 | 7: iteration 124760/ 173500 | consumed samples: 31938560 | consumed tokens: 65410170880 | elapsed time per iteration (s): 0.09 | learning rate: 5.346E-05 | global batch size: 256 | lm loss: 4.521199E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2993.200 | TFLOPs: 11.13 | 7: iteration 124770/ 173500 | consumed samples: 31941120 | consumed tokens: 65415413760 | elapsed time per iteration (s): 0.08 | learning rate: 5.344E-05 | global batch size: 256 | lm loss: 4.513298E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.562 | TFLOPs: 11.82 | 7: iteration 124780/ 173500 | consumed samples: 31943680 | consumed tokens: 
65420656640 | elapsed time per iteration (s): 0.08 | learning rate: 5.343E-05 | global batch size: 256 | lm loss: 4.519738E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.325 | TFLOPs: 11.55 | 7: iteration 124790/ 173500 | consumed samples: 31946240 | consumed tokens: 65425899520 | elapsed time per iteration (s): 0.08 | learning rate: 5.342E-05 | global batch size: 256 | lm loss: 4.508630E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.914 | TFLOPs: 11.83 | 7: iteration 124800/ 173500 | consumed samples: 31948800 | consumed tokens: 65431142400 | elapsed time per iteration (s): 0.08 | learning rate: 5.340E-05 | global batch size: 256 | lm loss: 4.517311E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.906 | TFLOPs: 11.88 | 7: iteration 124810/ 173500 | consumed samples: 31951360 | consumed tokens: 65436385280 | elapsed time per iteration (s): 0.08 | learning rate: 5.339E-05 | global batch size: 256 | lm loss: 4.519762E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.936 | TFLOPs: 11.94 | 7: iteration 124820/ 173500 | consumed samples: 31953920 | consumed tokens: 65441628160 | elapsed time per iteration (s): 0.08 | learning rate: 5.338E-05 | global batch size: 256 | lm loss: 4.493427E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.747 | TFLOPs: 11.88 | 7: iteration 124830/ 173500 | consumed samples: 31956480 | consumed tokens: 65446871040 | elapsed time per iteration (s): 0.08 | learning rate: 5.337E-05 | global batch size: 256 | lm loss: 4.512411E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3200.236 | TFLOPs: 11.90 | 7: iteration 124840/ 173500 | consumed samples: 31959040 | consumed tokens: 65452113920 | elapsed time per iteration (s): 0.08 | learning rate: 5.335E-05 | global batch size: 256 | lm loss: 4.509378E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.360 | TFLOPs: 11.84 | 7: iteration 124850/ 173500 | consumed samples: 31961600 | consumed tokens: 65457356800 | elapsed time per iteration (s): 0.08 | learning rate: 5.334E-05 | global batch size: 256 | lm loss: 4.506390E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.080 | TFLOPs: 11.88 | 7: iteration 124860/ 173500 | consumed samples: 31964160 | consumed tokens: 65462599680 | elapsed time per iteration (s): 0.08 | learning rate: 5.333E-05 | global batch size: 256 | lm loss: 4.508766E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.095 | TFLOPs: 11.93 | 7: iteration 124870/ 173500 | consumed samples: 31966720 | consumed tokens: 65467842560 | elapsed time per iteration (s): 0.08 | learning rate: 5.331E-05 | global batch size: 256 | lm loss: 4.515452E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.095 | TFLOPs: 11.62 | 7: iteration 124880/ 173500 | consumed samples: 31969280 | consumed tokens: 65473085440 | elapsed time per iteration (s): 0.08 | learning rate: 5.330E-05 | global batch size: 256 | lm loss: 4.507761E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.095 | TFLOPs: 11.90 | 7: iteration 124890/ 173500 | consumed samples: 31971840 | consumed tokens: 65478328320 | elapsed time per iteration (s): 0.08 | learning rate: 5.329E-05 | global batch size: 256 | lm loss: 
4.522588E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.943 | TFLOPs: 11.93 | 7: iteration 124900/ 173500 | consumed samples: 31974400 | consumed tokens: 65483571200 | elapsed time per iteration (s): 0.08 | learning rate: 5.328E-05 | global batch size: 256 | lm loss: 4.519413E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.992 | TFLOPs: 11.92 | 7: iteration 124910/ 173500 | consumed samples: 31976960 | consumed tokens: 65488814080 | elapsed time per iteration (s): 0.08 | learning rate: 5.326E-05 | global batch size: 256 | lm loss: 4.527184E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.583 | TFLOPs: 11.90 | 7: iteration 124920/ 173500 | consumed samples: 31979520 | consumed tokens: 65494056960 | elapsed time per iteration (s): 0.08 | learning rate: 5.325E-05 | global batch size: 256 | lm loss: 4.517489E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.927 | TFLOPs: 11.89 | 7: iteration 124930/ 173500 | consumed samples: 31982080 | consumed tokens: 65499299840 | elapsed time per iteration (s): 0.08 | learning rate: 5.324E-05 | global batch size: 256 | lm loss: 4.496434E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.101 | TFLOPs: 11.87 | 7: iteration 124940/ 173500 | consumed samples: 31984640 | consumed tokens: 65504542720 | elapsed time per iteration (s): 0.08 | learning rate: 5.323E-05 | global batch size: 256 | lm loss: 4.511741E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.896 | TFLOPs: 11.91 | 7: iteration 124950/ 173500 | consumed samples: 31987200 | consumed tokens: 
65509785600 | elapsed time per iteration (s): 0.08 | learning rate: 5.321E-05 | global batch size: 256 | lm loss: 4.502881E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.336 | TFLOPs: 11.61 | 7: iteration 124960/ 173500 | consumed samples: 31989760 | consumed tokens: 65515028480 | elapsed time per iteration (s): 0.08 | learning rate: 5.320E-05 | global batch size: 256 | lm loss: 4.521531E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.824 | TFLOPs: 11.79 | 7: iteration 124970/ 173500 | consumed samples: 31992320 | consumed tokens: 65520271360 | elapsed time per iteration (s): 0.08 | learning rate: 5.319E-05 | global batch size: 256 | lm loss: 4.517476E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.598 | TFLOPs: 11.57 | 7: iteration 124980/ 173500 | consumed samples: 31994880 | consumed tokens: 65525514240 | elapsed time per iteration (s): 0.08 | learning rate: 5.317E-05 | global batch size: 256 | lm loss: 4.504254E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.260 | TFLOPs: 11.58 | 7: iteration 124990/ 173500 | consumed samples: 31997440 | consumed tokens: 65530757120 | elapsed time per iteration (s): 0.08 | learning rate: 5.316E-05 | global batch size: 256 | lm loss: 4.509698E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.828 | TFLOPs: 11.84 | 7: iteration 125000/ 173500 | consumed samples: 32000000 | consumed tokens: 65536000000 | elapsed time per iteration (s): 0.08 | learning rate: 5.315E-05 | global batch size: 256 | lm loss: 4.517481E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3185.760 | TFLOPs: 11.85 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 125000 | lm loss value: 4.414136E+00 | lm loss PPL: 8.261047E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 125000 to checkpoints_14m91b100m 0: [2023-03-17 03:17:35,608] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step125000 is begin to save! 0: [2023-03-17 03:17:35,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:17:35,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:17:35,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:17:35,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:17:35,651] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:17:35,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:17:35,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:17:35,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 03:17:35,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:17:35,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:17:35,662] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:17:35,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:17:35,664] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step125000/mp_rank_00_model_states.pt 0: [2023-03-17 03:17:35,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:17:35,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:17:35,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:17:35,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:17:35,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 5: [2023-03-17 03:17:35,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:17:35,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:17:35,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 1: [2023-03-17 03:17:35,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:17:35,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:17:35,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 4: [2023-03-17 03:17:35,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:17:35,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:17:35,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 0: [2023-03-17 03:17:35,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:17:35,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 0: [2023-03-17 03:17:35,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:17:35,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 03:17:35,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 6: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 5: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:17:35,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 2: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:17:35,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 1: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:17:35,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:17:35,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 4: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:17:35,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 7: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:17:35,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 
7: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 6: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 7: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 0: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:17:35,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 5: [2023-03-17 03:17:35,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 1: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:17:35,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 2: [2023-03-17 03:17:35,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:17:35,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:17:35,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 4: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:17:35,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 6: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:17:35,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 0: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:17:35,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 5: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:17:35,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 1: [2023-03-17 03:17:35,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 1: [2023-03-17 03:17:35,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 2: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:17:35,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 7: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:17:35,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:17:35,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:17:35,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 4: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:17:35,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 0: [2023-03-17 03:17:35,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:17:35,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 5: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:17:35,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 6: [2023-03-17 03:17:35,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 1: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 
1: [2023-03-17 03:17:35,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 2: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:17:35,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 7: [2023-03-17 03:17:35,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:17:35,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 4: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:17:35,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 6: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:17:35,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 0: [2023-03-17 03:17:35,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 1: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:17:35,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 5: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:17:35,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 
2: [2023-03-17 03:17:35,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:17:35,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 7: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 4: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:17:35,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 0: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:17:35,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 6: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:17:35,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 1: [2023-03-17 03:17:35,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 5: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:17:35,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 03:17:35,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 2: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:17:35,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 7: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:17:35,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 
7: [2023-03-17 03:17:35,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 1: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:17:35,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:17:35,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 0: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 2: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 4: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 4: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 7: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 5: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 5: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 3: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:17:35,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:17:35,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now! 4: [2023-03-17 03:17:35,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:17:35,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step125000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt
4: [2023-03-17 03:17:35,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step125000 is ready now!
0: successfully saved checkpoint at iteration 125000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 95.31
7: iteration 125010/ 173500 | consumed samples: 32002560 | consumed tokens: 65541242880 | elapsed time per iteration (s): 0.09 | learning rate: 5.314E-05 | global batch size: 256 | lm loss: 4.519870E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2751.127 | TFLOPs: 10.23 |
7: iteration 125020/ 173500 | consumed samples: 32005120 | consumed tokens: 65546485760 | elapsed time per iteration (s): 0.08 | learning rate: 5.312E-05 | global batch size: 256 | lm loss: 4.500349E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.131 | TFLOPs: 11.73 |
7: iteration 125030/ 173500 | consumed samples: 32007680 | consumed tokens: 65551728640 | elapsed time per iteration (s): 0.08 | learning rate: 5.311E-05 | global batch size: 256 | lm loss: 4.512323E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.169 | TFLOPs: 11.89 |
7: iteration 125040/ 173500 | consumed samples: 32010240 | consumed tokens: 65556971520 | elapsed time per iteration (s): 0.08 | learning rate: 5.310E-05 | global batch size: 256 | lm loss: 4.506761E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.856 | TFLOPs: 11.52 |
7: iteration 125050/ 173500 | consumed samples: 32012800 | consumed tokens: 65562214400 | elapsed time per iteration (s): 0.09 | learning rate: 5.308E-05 | global batch size: 256 | lm loss: 4.498839E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.422 | TFLOPs: 10.54 |
7: iteration 125060/ 173500 | consumed samples: 32015360 | consumed tokens: 65567457280 | elapsed time per iteration (s): 0.08 | learning rate: 5.307E-05 | global batch size: 256 | lm loss: 4.509268E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.000 | TFLOPs: 11.87 |
7: iteration 125070/ 173500 | consumed samples: 32017920 | consumed tokens: 65572700160 | elapsed time per iteration (s): 0.08 | learning rate: 5.306E-05 | global batch size: 256 | lm loss: 4.516980E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.387 | TFLOPs: 11.87 |
7: iteration 125080/ 173500 | consumed samples: 32020480 | consumed tokens: 65577943040 | elapsed time per iteration (s): 0.08 | learning rate: 5.305E-05 | global batch size: 256 | lm loss: 4.515871E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.240 | TFLOPs: 11.90 |
7: iteration 125090/ 173500 | consumed samples: 32023040 | consumed tokens: 65583185920 | elapsed time per iteration (s): 0.08 | learning rate: 5.303E-05 | global batch size: 256 | lm loss: 4.516911E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.428 | TFLOPs: 11.87 |
7: iteration 125100/ 173500 | consumed samples: 32025600 | consumed tokens: 65588428800 | elapsed time per iteration (s): 0.08 | learning rate: 5.302E-05 | global batch size: 256 | lm loss: 4.502049E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.278 | TFLOPs: 11.89 |
7: iteration 125110/ 173500 | consumed samples: 32028160 | consumed tokens: 65593671680 | elapsed time per iteration (s): 0.08 | learning rate: 5.301E-05 | global batch size: 256 | lm loss: 4.512355E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.038 | TFLOPs: 11.83 |
7: iteration 125120/ 173500 | consumed samples: 32030720 | consumed tokens: 65598914560 | elapsed time per iteration (s): 0.08 | learning rate: 5.300E-05 | global batch size: 256 | lm loss: 4.506517E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.170 | TFLOPs: 11.84 |
7: iteration 125130/ 173500 | consumed samples: 32033280 | consumed tokens: 65604157440 | elapsed time per iteration (s): 0.08 | learning rate: 5.298E-05 | global batch size: 256 | lm loss: 4.516920E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.817 | TFLOPs: 11.81 |
7: iteration 125140/ 173500 | consumed samples: 32035840 | consumed tokens: 65609400320 | elapsed time per iteration (s): 0.08 | learning rate: 5.297E-05 | global batch size: 256 | lm loss: 4.516087E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.383 | TFLOPs: 11.77 |
7: iteration 125150/ 173500 | consumed samples: 32038400 | consumed tokens: 65614643200 | elapsed time per iteration (s): 0.08 | learning rate: 5.296E-05 | global batch size: 256 | lm loss: 4.514162E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.508 | TFLOPs: 11.80 |
7: iteration 125160/ 173500 | consumed samples: 32040960 | consumed tokens: 65619886080 | elapsed time per iteration (s): 0.08 | learning rate: 5.294E-05 | global batch size: 256 | lm loss: 4.506870E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.996 | TFLOPs: 11.78 |
7: iteration 125170/ 173500 | consumed samples: 32043520 | consumed tokens: 65625128960 | elapsed time per iteration (s): 0.08 | learning rate: 5.293E-05 | global batch size: 256 | lm loss: 4.513035E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.451 | TFLOPs: 11.80 |
7: iteration 125180/ 173500 | consumed samples: 32046080 | consumed tokens: 65630371840 | elapsed time per iteration (s): 0.08 | learning rate: 5.292E-05 | global batch size: 256 | lm loss: 4.510986E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.609 | TFLOPs: 11.83 |
7: iteration 125190/ 173500 | consumed samples: 32048640 | consumed tokens: 65635614720 | elapsed time per iteration (s): 0.08 | learning rate: 5.291E-05 | global batch size: 256 | lm loss: 4.514524E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.326 | TFLOPs: 11.84 |
7: iteration 125200/ 173500 | consumed samples: 32051200 | consumed tokens: 65640857600 | elapsed time per iteration (s): 0.08 | learning rate: 5.289E-05 | global batch size: 256 | lm loss: 4.514580E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.254 | TFLOPs: 11.86 |
7: iteration 125210/ 173500 | consumed samples: 32053760 | consumed tokens: 65646100480 | elapsed time per iteration (s): 0.08 | learning rate: 5.288E-05 | global batch size: 256 | lm loss: 4.499563E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.635 | TFLOPs: 11.28 |
7: iteration 125220/ 173500 | consumed samples: 32056320 | consumed tokens: 65651343360 | elapsed time per iteration (s): 0.08 | learning rate: 5.287E-05 | global batch size: 256 | lm loss: 4.499411E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.893 | TFLOPs: 11.80 |
7: iteration 125230/ 173500 | consumed samples: 32058880 | consumed tokens: 65656586240 | elapsed time per iteration (s): 0.08 | learning rate: 5.286E-05 | global batch size: 256 | lm loss: 4.514196E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.488 | TFLOPs: 11.87 |
7: iteration 125240/ 173500 | consumed samples: 32061440 | consumed tokens: 65661829120 | elapsed time per iteration (s): 0.08 | learning rate: 5.284E-05 | global batch size: 256 | lm loss: 4.512613E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.875 | TFLOPs: 11.44 |
7: iteration 125250/ 173500 | consumed samples: 32064000 | consumed tokens: 65667072000 | elapsed time per iteration (s): 0.13 | learning rate: 5.283E-05 | global batch size: 256 | lm loss: 4.505112E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1933.677 | TFLOPs: 7.19 |
7: iteration 125260/ 173500 | consumed samples: 32066560 | consumed tokens: 65672314880 | elapsed time per iteration (s): 0.08 | learning rate: 5.282E-05 | global batch size: 256 | lm loss: 4.504710E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.407 | TFLOPs: 11.27 |
7: iteration 125270/ 173500 | consumed samples: 32069120 | consumed tokens: 65677557760 | elapsed time per iteration (s): 0.09 | learning rate: 5.280E-05 | global batch size: 256 | lm loss: 4.520218E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.183 | TFLOPs: 11.10 |
7: iteration 125280/ 173500 | consumed samples: 32071680 | consumed tokens: 65682800640 | elapsed time per iteration (s): 0.10 | learning rate: 5.279E-05 | global batch size: 256 | lm loss: 4.496366E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2484.127 | TFLOPs: 9.24 |
7: iteration 125290/ 173500 | consumed samples: 32074240 | consumed tokens: 65688043520 | elapsed time per iteration (s): 0.09 | learning rate: 5.278E-05 | global batch size: 256 | lm loss: 4.497681E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2953.706 | TFLOPs: 10.99 |
7: iteration 125300/ 173500 | consumed samples: 32076800 | consumed tokens: 65693286400 | elapsed time per iteration (s): 0.08 | learning rate: 5.277E-05 | global batch size: 256 | lm loss: 4.512989E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.205 | TFLOPs: 11.37 |
7: iteration 125310/ 173500 | consumed samples: 32079360 | consumed tokens: 65698529280 | elapsed time per iteration (s): 0.08 | learning rate: 5.275E-05 | global batch size: 256 | lm loss: 4.503980E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.393 | TFLOPs: 11.79 |
7: iteration 125320/ 173500 | consumed samples: 32081920 | consumed tokens: 65703772160 | elapsed time per iteration (s): 0.09 | learning rate: 5.274E-05 | global batch size: 256 | lm loss: 4.517599E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.669 | TFLOPs: 11.13 |
7: iteration 125330/ 173500 | consumed samples: 32084480 | consumed tokens: 65709015040 | elapsed time per iteration (s): 0.09 | learning rate: 5.273E-05 | global batch size: 256 | lm loss: 4.509801E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan
iterations: 0 | samples per second: 2834.853 | TFLOPs: 10.54 | 7: iteration 125340/ 173500 | consumed samples: 32087040 | consumed tokens: 65714257920 | elapsed time per iteration (s): 0.10 | learning rate: 5.272E-05 | global batch size: 256 | lm loss: 4.504107E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.139 | TFLOPs: 9.26 | 7: iteration 125350/ 173500 | consumed samples: 32089600 | consumed tokens: 65719500800 | elapsed time per iteration (s): 0.09 | learning rate: 5.270E-05 | global batch size: 256 | lm loss: 4.514959E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2782.497 | TFLOPs: 10.35 | 7: iteration 125360/ 173500 | consumed samples: 32092160 | consumed tokens: 65724743680 | elapsed time per iteration (s): 0.10 | learning rate: 5.269E-05 | global batch size: 256 | lm loss: 4.501846E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.253 | TFLOPs: 9.19 | 7: iteration 125370/ 173500 | consumed samples: 32094720 | consumed tokens: 65729986560 | elapsed time per iteration (s): 0.10 | learning rate: 5.268E-05 | global batch size: 256 | lm loss: 4.510555E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2470.769 | TFLOPs: 9.19 | 7: iteration 125380/ 173500 | consumed samples: 32097280 | consumed tokens: 65735229440 | elapsed time per iteration (s): 0.10 | learning rate: 5.267E-05 | global batch size: 256 | lm loss: 4.494937E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.542 | TFLOPs: 9.16 | 7: iteration 125390/ 173500 | consumed samples: 32099840 | consumed tokens: 65740472320 | elapsed time per iteration (s): 0.10 | learning rate: 5.265E-05 | global batch size: 256 | 
lm loss: 4.503844E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.872 | TFLOPs: 9.30 | 7: iteration 125400/ 173500 | consumed samples: 32102400 | consumed tokens: 65745715200 | elapsed time per iteration (s): 0.10 | learning rate: 5.264E-05 | global batch size: 256 | lm loss: 4.521329E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2480.997 | TFLOPs: 9.23 | 7: iteration 125410/ 173500 | consumed samples: 32104960 | consumed tokens: 65750958080 | elapsed time per iteration (s): 0.09 | learning rate: 5.263E-05 | global batch size: 256 | lm loss: 4.509161E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2772.335 | TFLOPs: 10.31 | 7: iteration 125420/ 173500 | consumed samples: 32107520 | consumed tokens: 65756200960 | elapsed time per iteration (s): 0.08 | learning rate: 5.261E-05 | global batch size: 256 | lm loss: 4.514370E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.263 | TFLOPs: 11.82 | 7: iteration 125430/ 173500 | consumed samples: 32110080 | consumed tokens: 65761443840 | elapsed time per iteration (s): 0.08 | learning rate: 5.260E-05 | global batch size: 256 | lm loss: 4.500313E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.811 | TFLOPs: 11.77 | 7: iteration 125440/ 173500 | consumed samples: 32112640 | consumed tokens: 65766686720 | elapsed time per iteration (s): 0.09 | learning rate: 5.259E-05 | global batch size: 256 | lm loss: 4.505285E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2737.315 | TFLOPs: 10.18 | 7: iteration 125450/ 173500 | consumed samples: 32115200 | consumed 
tokens: 65771929600 | elapsed time per iteration (s): 0.08 | learning rate: 5.258E-05 | global batch size: 256 | lm loss: 4.515038E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.334 | TFLOPs: 11.70 | 7: iteration 125460/ 173500 | consumed samples: 32117760 | consumed tokens: 65777172480 | elapsed time per iteration (s): 0.08 | learning rate: 5.256E-05 | global batch size: 256 | lm loss: 4.502194E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.286 | TFLOPs: 11.74 | 7: iteration 125470/ 173500 | consumed samples: 32120320 | consumed tokens: 65782415360 | elapsed time per iteration (s): 0.08 | learning rate: 5.255E-05 | global batch size: 256 | lm loss: 4.521308E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.956 | TFLOPs: 11.81 | 7: iteration 125480/ 173500 | consumed samples: 32122880 | consumed tokens: 65787658240 | elapsed time per iteration (s): 0.08 | learning rate: 5.254E-05 | global batch size: 256 | lm loss: 4.511360E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.475 | TFLOPs: 11.80 | 7: iteration 125490/ 173500 | consumed samples: 32125440 | consumed tokens: 65792901120 | elapsed time per iteration (s): 0.08 | learning rate: 5.253E-05 | global batch size: 256 | lm loss: 4.505715E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.799 | TFLOPs: 11.28 | 7: iteration 125500/ 173500 | consumed samples: 32128000 | consumed tokens: 65798144000 | elapsed time per iteration (s): 0.08 | learning rate: 5.251E-05 | global batch size: 256 | lm loss: 4.500774E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3171.499 | TFLOPs: 11.80 | 7: iteration 125510/ 173500 | consumed samples: 32130560 | consumed tokens: 65803386880 | elapsed time per iteration (s): 0.08 | learning rate: 5.250E-05 | global batch size: 256 | lm loss: 4.501154E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.270 | TFLOPs: 11.78 | 7: iteration 125520/ 173500 | consumed samples: 32133120 | consumed tokens: 65808629760 | elapsed time per iteration (s): 0.08 | learning rate: 5.249E-05 | global batch size: 256 | lm loss: 4.506703E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.521 | TFLOPs: 11.82 | 7: iteration 125530/ 173500 | consumed samples: 32135680 | consumed tokens: 65813872640 | elapsed time per iteration (s): 0.08 | learning rate: 5.247E-05 | global batch size: 256 | lm loss: 4.516409E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.332 | TFLOPs: 11.75 | 7: iteration 125540/ 173500 | consumed samples: 32138240 | consumed tokens: 65819115520 | elapsed time per iteration (s): 0.08 | learning rate: 5.246E-05 | global batch size: 256 | lm loss: 4.519100E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.778 | TFLOPs: 11.79 | 7: iteration 125550/ 173500 | consumed samples: 32140800 | consumed tokens: 65824358400 | elapsed time per iteration (s): 0.08 | learning rate: 5.245E-05 | global batch size: 256 | lm loss: 4.507185E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.713 | TFLOPs: 11.82 | 7: iteration 125560/ 173500 | consumed samples: 32143360 | consumed tokens: 65829601280 | elapsed time per iteration (s): 0.08 | learning rate: 5.244E-05 | global batch size: 256 | lm loss: 
4.518167E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.768 | TFLOPs: 11.81 | 7: iteration 125570/ 173500 | consumed samples: 32145920 | consumed tokens: 65834844160 | elapsed time per iteration (s): 0.08 | learning rate: 5.242E-05 | global batch size: 256 | lm loss: 4.497457E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.556 | TFLOPs: 11.70 | 7: iteration 125580/ 173500 | consumed samples: 32148480 | consumed tokens: 65840087040 | elapsed time per iteration (s): 0.10 | learning rate: 5.241E-05 | global batch size: 256 | lm loss: 4.517924E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2671.425 | TFLOPs: 9.94 | 7: iteration 125590/ 173500 | consumed samples: 32151040 | consumed tokens: 65845329920 | elapsed time per iteration (s): 0.09 | learning rate: 5.240E-05 | global batch size: 256 | lm loss: 4.505686E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.171 | TFLOPs: 10.69 | 7: iteration 125600/ 173500 | consumed samples: 32153600 | consumed tokens: 65850572800 | elapsed time per iteration (s): 0.08 | learning rate: 5.239E-05 | global batch size: 256 | lm loss: 4.499709E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.451 | TFLOPs: 11.79 | 7: iteration 125610/ 173500 | consumed samples: 32156160 | consumed tokens: 65855815680 | elapsed time per iteration (s): 0.09 | learning rate: 5.237E-05 | global batch size: 256 | lm loss: 4.515849E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.544 | TFLOPs: 10.73 | 7: iteration 125620/ 173500 | consumed samples: 32158720 | consumed tokens: 
65861058560 | elapsed time per iteration (s): 0.08 | learning rate: 5.236E-05 | global batch size: 256 | lm loss: 4.512174E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.879 | TFLOPs: 11.78 | 7: iteration 125630/ 173500 | consumed samples: 32161280 | consumed tokens: 65866301440 | elapsed time per iteration (s): 0.08 | learning rate: 5.235E-05 | global batch size: 256 | lm loss: 4.520337E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.856 | TFLOPs: 11.70 | 7: iteration 125640/ 173500 | consumed samples: 32163840 | consumed tokens: 65871544320 | elapsed time per iteration (s): 0.08 | learning rate: 5.234E-05 | global batch size: 256 | lm loss: 4.504938E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.965 | TFLOPs: 11.79 | 7: iteration 125650/ 173500 | consumed samples: 32166400 | consumed tokens: 65876787200 | elapsed time per iteration (s): 0.08 | learning rate: 5.232E-05 | global batch size: 256 | lm loss: 4.506887E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.644 | TFLOPs: 11.75 | 7: iteration 125660/ 173500 | consumed samples: 32168960 | consumed tokens: 65882030080 | elapsed time per iteration (s): 0.08 | learning rate: 5.231E-05 | global batch size: 256 | lm loss: 4.522212E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.386 | TFLOPs: 11.79 | 7: iteration 125670/ 173500 | consumed samples: 32171520 | consumed tokens: 65887272960 | elapsed time per iteration (s): 0.08 | learning rate: 5.230E-05 | global batch size: 256 | lm loss: 4.514991E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3163.291 | TFLOPs: 11.77 | 7: iteration 125680/ 173500 | consumed samples: 32174080 | consumed tokens: 65892515840 | elapsed time per iteration (s): 0.08 | learning rate: 5.229E-05 | global batch size: 256 | lm loss: 4.497844E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.459 | TFLOPs: 11.69 | 7: iteration 125690/ 173500 | consumed samples: 32176640 | consumed tokens: 65897758720 | elapsed time per iteration (s): 0.08 | learning rate: 5.227E-05 | global batch size: 256 | lm loss: 4.510246E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.091 | TFLOPs: 11.84 | 7: iteration 125700/ 173500 | consumed samples: 32179200 | consumed tokens: 65903001600 | elapsed time per iteration (s): 0.08 | learning rate: 5.226E-05 | global batch size: 256 | lm loss: 4.511593E+00 | grad norm: 0.459 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.700 | TFLOPs: 11.48 | 7: iteration 125710/ 173500 | consumed samples: 32181760 | consumed tokens: 65908244480 | elapsed time per iteration (s): 0.08 | learning rate: 5.225E-05 | global batch size: 256 | lm loss: 4.509340E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.902 | TFLOPs: 11.78 | 7: iteration 125720/ 173500 | consumed samples: 32184320 | consumed tokens: 65913487360 | elapsed time per iteration (s): 0.08 | learning rate: 5.223E-05 | global batch size: 256 | lm loss: 4.519603E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.060 | TFLOPs: 11.81 | 7: iteration 125730/ 173500 | consumed samples: 32186880 | consumed tokens: 65918730240 | elapsed time per iteration (s): 0.08 | learning rate: 5.222E-05 | global batch size: 256 | lm loss: 
4.508961E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.749 | TFLOPs: 11.57 | 7: iteration 125740/ 173500 | consumed samples: 32189440 | consumed tokens: 65923973120 | elapsed time per iteration (s): 0.08 | learning rate: 5.221E-05 | global batch size: 256 | lm loss: 4.508647E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.286 | TFLOPs: 11.60 | 7: iteration 125750/ 173500 | consumed samples: 32192000 | consumed tokens: 65929216000 | elapsed time per iteration (s): 0.08 | learning rate: 5.220E-05 | global batch size: 256 | lm loss: 4.512924E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.244 | TFLOPs: 11.86 | 7: iteration 125760/ 173500 | consumed samples: 32194560 | consumed tokens: 65934458880 | elapsed time per iteration (s): 0.08 | learning rate: 5.218E-05 | global batch size: 256 | lm loss: 4.499722E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.122 | TFLOPs: 11.34 | 7: iteration 125770/ 173500 | consumed samples: 32197120 | consumed tokens: 65939701760 | elapsed time per iteration (s): 0.08 | learning rate: 5.217E-05 | global batch size: 256 | lm loss: 4.515182E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.514 | TFLOPs: 11.59 | 7: iteration 125780/ 173500 | consumed samples: 32199680 | consumed tokens: 65944944640 | elapsed time per iteration (s): 0.08 | learning rate: 5.216E-05 | global batch size: 256 | lm loss: 4.505281E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.185 | TFLOPs: 11.85 | 7: iteration 125790/ 173500 | consumed samples: 32202240 | consumed tokens: 
65950187520 | elapsed time per iteration (s): 0.08 | learning rate: 5.215E-05 | global batch size: 256 | lm loss: 4.519664E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.507 | TFLOPs: 11.60 | 7: iteration 125800/ 173500 | consumed samples: 32204800 | consumed tokens: 65955430400 | elapsed time per iteration (s): 0.09 | learning rate: 5.213E-05 | global batch size: 256 | lm loss: 4.517590E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2812.053 | TFLOPs: 10.46 | 7: iteration 125810/ 173500 | consumed samples: 32207360 | consumed tokens: 65960673280 | elapsed time per iteration (s): 0.13 | learning rate: 5.212E-05 | global batch size: 256 | lm loss: 4.521695E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2031.720 | TFLOPs: 7.56 | 7: iteration 125820/ 173500 | consumed samples: 32209920 | consumed tokens: 65965916160 | elapsed time per iteration (s): 0.11 | learning rate: 5.211E-05 | global batch size: 256 | lm loss: 4.519929E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2237.920 | TFLOPs: 8.32 | 7: iteration 125830/ 173500 | consumed samples: 32212480 | consumed tokens: 65971159040 | elapsed time per iteration (s): 0.12 | learning rate: 5.210E-05 | global batch size: 256 | lm loss: 4.513822E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2072.063 | TFLOPs: 7.71 | 7: iteration 125840/ 173500 | consumed samples: 32215040 | consumed tokens: 65976401920 | elapsed time per iteration (s): 0.12 | learning rate: 5.208E-05 | global batch size: 256 | lm loss: 4.518925E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2209.968 | TFLOPs: 8.22 | 7: iteration 125850/ 173500 | consumed samples: 32217600 | consumed tokens: 65981644800 | elapsed time per iteration (s): 0.08 | learning rate: 5.207E-05 | global batch size: 256 | lm loss: 4.504842E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.236 | TFLOPs: 11.99 | 7: iteration 125860/ 173500 | consumed samples: 32220160 | consumed tokens: 65986887680 | elapsed time per iteration (s): 0.08 | learning rate: 5.206E-05 | global batch size: 256 | lm loss: 4.531231E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.673 | TFLOPs: 12.07 | 7: iteration 125870/ 173500 | consumed samples: 32222720 | consumed tokens: 65992130560 | elapsed time per iteration (s): 0.08 | learning rate: 5.205E-05 | global batch size: 256 | lm loss: 4.520095E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.321 | TFLOPs: 11.97 | 7: iteration 125880/ 173500 | consumed samples: 32225280 | consumed tokens: 65997373440 | elapsed time per iteration (s): 0.08 | learning rate: 5.203E-05 | global batch size: 256 | lm loss: 4.508359E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.338 | TFLOPs: 12.03 | 7: iteration 125890/ 173500 | consumed samples: 32227840 | consumed tokens: 66002616320 | elapsed time per iteration (s): 0.08 | learning rate: 5.202E-05 | global batch size: 256 | lm loss: 4.522993E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.458 | TFLOPs: 12.01 | 7: iteration 125900/ 173500 | consumed samples: 32230400 | consumed tokens: 66007859200 | elapsed time per iteration (s): 0.08 | learning rate: 5.201E-05 | global batch size: 256 | lm loss: 4.508429E+00 | 
grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.365 | TFLOPs: 12.06 | 7: iteration 125910/ 173500 | consumed samples: 32232960 | consumed tokens: 66013102080 | elapsed time per iteration (s): 0.08 | learning rate: 5.200E-05 | global batch size: 256 | lm loss: 4.510013E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.954 | TFLOPs: 12.06 | 7: iteration 125920/ 173500 | consumed samples: 32235520 | consumed tokens: 66018344960 | elapsed time per iteration (s): 0.08 | learning rate: 5.198E-05 | global batch size: 256 | lm loss: 4.504158E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.152 | TFLOPs: 11.94 | 7: iteration 125930/ 173500 | consumed samples: 32238080 | consumed tokens: 66023587840 | elapsed time per iteration (s): 0.08 | learning rate: 5.197E-05 | global batch size: 256 | lm loss: 4.521086E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.983 | TFLOPs: 12.01 | 7: iteration 125940/ 173500 | consumed samples: 32240640 | consumed tokens: 66028830720 | elapsed time per iteration (s): 0.08 | learning rate: 5.196E-05 | global batch size: 256 | lm loss: 4.513327E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.761 | TFLOPs: 12.04 | 7: iteration 125950/ 173500 | consumed samples: 32243200 | consumed tokens: 66034073600 | elapsed time per iteration (s): 0.08 | learning rate: 5.194E-05 | global batch size: 256 | lm loss: 4.517315E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.427 | TFLOPs: 12.01 | 7: iteration 125960/ 173500 | consumed samples: 32245760 | consumed tokens: 66039316480 | 
elapsed time per iteration (s): 0.08 | learning rate: 5.193E-05 | global batch size: 256 | lm loss: 4.527402E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.169 | TFLOPs: 12.00 | 7: iteration 125970/ 173500 | consumed samples: 32248320 | consumed tokens: 66044559360 | elapsed time per iteration (s): 0.08 | learning rate: 5.192E-05 | global batch size: 256 | lm loss: 4.503511E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.653 | TFLOPs: 11.94 | 7: iteration 125980/ 173500 | consumed samples: 32250880 | consumed tokens: 66049802240 | elapsed time per iteration (s): 0.08 | learning rate: 5.191E-05 | global batch size: 256 | lm loss: 4.506358E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.586 | TFLOPs: 11.98 | 7: iteration 125990/ 173500 | consumed samples: 32253440 | consumed tokens: 66055045120 | elapsed time per iteration (s): 0.08 | learning rate: 5.189E-05 | global batch size: 256 | lm loss: 4.509737E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.424 | TFLOPs: 11.96 | 0: [2023-03-17 03:19:01,262] [INFO] [logging.py:68:log_dist] [Rank 0] step=126000, skipped=0, lr=[5.188210163686188e-05, 5.188210163686188e-05, 5.188210163686188e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 126000/ 173500 | consumed samples: 32256000 | consumed tokens: 66060288000 | elapsed time per iteration (s): 0.08 | learning rate: 5.188E-05 | global batch size: 256 | lm loss: 4.508201E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.492 | TFLOPs: 11.90 | 0: steps: 126000 loss: 4.5221 iter time (s): 0.083 samples/sec: 3092.865 7: 
------------------------------------------------------------------------------------------------- 7: validation loss at iteration 126000 | lm loss value: 4.405303E+00 | lm loss PPL: 8.188395E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 126000 to checkpoints_14m91b100m 0: [2023-03-17 03:19:01,321] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step126000 is begin to save! 0: [2023-03-17 03:19:01,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:19:01,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:19:01,355] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:19:01,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:19:01,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:19:01,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:19:01,362] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:19:01,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:19:01,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 03:19:01,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:19:01,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:19:01,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:19:01,369] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step126000/mp_rank_00_model_states.pt 0: [2023-03-17 03:19:01,369] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:19:01,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:19:01,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:19:01,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:01,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:01,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:01,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 4: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:19:01,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:01,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:01,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 6: [2023-03-17 03:19:01,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 7: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:19:01,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 2: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:01,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:01,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 3: [2023-03-17 03:19:01,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:01,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:01,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 5: [2023-03-17 03:19:01,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 3: [2023-03-17 03:19:01,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:19:01,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:01,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 6: [2023-03-17 03:19:01,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:01,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:01,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:01,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 4: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:19:01,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 2: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:01,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 5: [2023-03-17 03:19:01,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: [2023-03-17 03:19:01,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:01,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:01,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 6: [2023-03-17 03:19:01,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:19:01,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:01,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 4: [2023-03-17 03:19:01,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:01,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:01,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 7: [2023-03-17 03:19:01,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:01,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:01,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 3: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:01,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 
2: [2023-03-17 03:19:01,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 5: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:01,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:01,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 6: [2023-03-17 03:19:01,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 4: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:19:01,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:01,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 7: [2023-03-17 03:19:01,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:01,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:01,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 3: [2023-03-17 03:19:01,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:01,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:01,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 5: [2023-03-17 03:19:01,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:01,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 2: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 6: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 7: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 3: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 4: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 
1: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 1: [2023-03-17 03:19:01,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 1: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 1: [2023-03-17 03:19:01,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 2: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:01,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 5: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:19:01,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:01,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:01,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 6: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 4: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:01,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:01,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 3: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:19:01,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 1: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:01,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:01,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 2: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 5: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:19:01,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 7: [2023-03-17 03:19:01,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 7: [2023-03-17 03:19:01,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 6: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 
7: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 7: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 1: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 3: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 6: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 6: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 2: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 2: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 4: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 4: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 
4: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 2: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 5: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 1: [2023-03-17 03:19:01,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 1: [2023-03-17 03:19:01,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:19:01,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step126000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:19:01,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step126000 is ready now! 0: successfully saved checkpoint at iteration 126000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 86.06 7: iteration 126010/ 173500 | consumed samples: 32258560 | consumed tokens: 66065530880 | elapsed time per iteration (s): 0.09 | learning rate: 5.187E-05 | global batch size: 256 | lm loss: 4.511601E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2738.330 | TFLOPs: 10.19 | 7: iteration 126020/ 173500 | consumed samples: 32261120 | consumed tokens: 66070773760 | elapsed time per iteration (s): 0.08 | learning rate: 5.186E-05 | global batch size: 256 | lm loss: 4.511374E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.933 | TFLOPs: 11.94 | 7: iteration 126030/ 173500 | consumed samples: 32263680 | consumed tokens: 66076016640 | elapsed time per iteration (s): 0.08 | learning rate: 5.184E-05 | global batch size: 256 | lm loss: 4.512096E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.957 | TFLOPs: 12.00 | 7: iteration 126040/ 173500 | consumed samples: 32266240 | consumed tokens: 66081259520 | elapsed time per iteration (s): 0.08 | learning rate: 5.183E-05 | global batch size: 256 | lm loss: 4.514130E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.607 | TFLOPs: 11.90 | 7: iteration 126050/ 173500 | consumed samples: 32268800 | consumed tokens: 66086502400 | elapsed time per iteration (s): 0.08 | learning rate: 5.182E-05 | 
global batch size: 256 | lm loss: 4.504897E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.204 | TFLOPs: 11.94 | 7: iteration 126060/ 173500 | consumed samples: 32271360 | consumed tokens: 66091745280 | elapsed time per iteration (s): 0.08 | learning rate: 5.181E-05 | global batch size: 256 | lm loss: 4.511348E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.795 | TFLOPs: 11.94 | 7: iteration 126070/ 173500 | consumed samples: 32273920 | consumed tokens: 66096988160 | elapsed time per iteration (s): 0.08 | learning rate: 5.179E-05 | global batch size: 256 | lm loss: 4.503131E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.885 | TFLOPs: 11.90 | 7: iteration 126080/ 173500 | consumed samples: 32276480 | consumed tokens: 66102231040 | elapsed time per iteration (s): 0.09 | learning rate: 5.178E-05 | global batch size: 256 | lm loss: 4.517456E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2748.532 | TFLOPs: 10.22 | 7: iteration 126090/ 173500 | consumed samples: 32279040 | consumed tokens: 66107473920 | elapsed time per iteration (s): 0.11 | learning rate: 5.177E-05 | global batch size: 256 | lm loss: 4.501219E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2407.898 | TFLOPs: 8.96 | 7: iteration 126100/ 173500 | consumed samples: 32281600 | consumed tokens: 66112716800 | elapsed time per iteration (s): 0.09 | learning rate: 5.176E-05 | global batch size: 256 | lm loss: 4.519007E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2909.115 | TFLOPs: 10.82 | 7: iteration 126110/ 173500 | consumed 
samples: 32284160 | consumed tokens: 66117959680 | elapsed time per iteration (s): 0.08 | learning rate: 5.174E-05 | global batch size: 256 | lm loss: 4.508015E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.983 | TFLOPs: 11.95 | 7: iteration 126120/ 173500 | consumed samples: 32286720 | consumed tokens: 66123202560 | elapsed time per iteration (s): 0.08 | learning rate: 5.173E-05 | global batch size: 256 | lm loss: 4.505487E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.449 | TFLOPs: 11.93 | 7: iteration 126130/ 173500 | consumed samples: 32289280 | consumed tokens: 66128445440 | elapsed time per iteration (s): 0.08 | learning rate: 5.172E-05 | global batch size: 256 | lm loss: 4.531460E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.908 | TFLOPs: 11.96 | 7: iteration 126140/ 173500 | consumed samples: 32291840 | consumed tokens: 66133688320 | elapsed time per iteration (s): 0.08 | learning rate: 5.171E-05 | global batch size: 256 | lm loss: 4.500373E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.936 | TFLOPs: 11.94 | 7: iteration 126150/ 173500 | consumed samples: 32294400 | consumed tokens: 66138931200 | elapsed time per iteration (s): 0.08 | learning rate: 5.169E-05 | global batch size: 256 | lm loss: 4.513415E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.914 | TFLOPs: 11.47 | 7: iteration 126160/ 173500 | consumed samples: 32296960 | consumed tokens: 66144174080 | elapsed time per iteration (s): 0.08 | learning rate: 5.168E-05 | global batch size: 256 | lm loss: 4.509375E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3214.674 | TFLOPs: 11.96 | 7: iteration 126170/ 173500 | consumed samples: 32299520 | consumed tokens: 66149416960 | elapsed time per iteration (s): 0.08 | learning rate: 5.167E-05 | global batch size: 256 | lm loss: 4.515672E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.472 | TFLOPs: 11.91 | 7: iteration 126180/ 173500 | consumed samples: 32302080 | consumed tokens: 66154659840 | elapsed time per iteration (s): 0.08 | learning rate: 5.166E-05 | global batch size: 256 | lm loss: 4.505743E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.839 | TFLOPs: 11.93 | 7: iteration 126190/ 173500 | consumed samples: 32304640 | consumed tokens: 66159902720 | elapsed time per iteration (s): 0.08 | learning rate: 5.164E-05 | global batch size: 256 | lm loss: 4.511286E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.200 | TFLOPs: 11.92 | 7: iteration 126200/ 173500 | consumed samples: 32307200 | consumed tokens: 66165145600 | elapsed time per iteration (s): 0.08 | learning rate: 5.163E-05 | global batch size: 256 | lm loss: 4.518562E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.438 | TFLOPs: 11.87 | 7: iteration 126210/ 173500 | consumed samples: 32309760 | consumed tokens: 66170388480 | elapsed time per iteration (s): 0.08 | learning rate: 5.162E-05 | global batch size: 256 | lm loss: 4.501915E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.011 | TFLOPs: 11.81 | 7: iteration 126220/ 173500 | consumed samples: 32312320 | consumed tokens: 66175631360 | elapsed time per iteration (s): 0.08 | learning rate: 5.161E-05 | global 
batch size: 256 | lm loss: 4.509581E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.860 | TFLOPs: 11.89 | 7: iteration 126230/ 173500 | consumed samples: 32314880 | consumed tokens: 66180874240 | elapsed time per iteration (s): 0.08 | learning rate: 5.159E-05 | global batch size: 256 | lm loss: 4.514265E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.729 | TFLOPs: 11.87 | 7: iteration 126240/ 173500 | consumed samples: 32317440 | consumed tokens: 66186117120 | elapsed time per iteration (s): 0.08 | learning rate: 5.158E-05 | global batch size: 256 | lm loss: 4.514663E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.056 | TFLOPs: 11.88 | 7: iteration 126250/ 173500 | consumed samples: 32320000 | consumed tokens: 66191360000 | elapsed time per iteration (s): 0.08 | learning rate: 5.157E-05 | global batch size: 256 | lm loss: 4.512586E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.852 | TFLOPs: 11.92 | 7: iteration 126260/ 173500 | consumed samples: 32322560 | consumed tokens: 66196602880 | elapsed time per iteration (s): 0.08 | learning rate: 5.156E-05 | global batch size: 256 | lm loss: 4.503479E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.692 | TFLOPs: 11.92 | 7: iteration 126270/ 173500 | consumed samples: 32325120 | consumed tokens: 66201845760 | elapsed time per iteration (s): 0.08 | learning rate: 5.154E-05 | global batch size: 256 | lm loss: 4.518076E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.818 | TFLOPs: 11.97 | 7: iteration 126280/ 173500 | consumed samples: 
32327680 | consumed tokens: 66207088640 | elapsed time per iteration (s): 0.08 | learning rate: 5.153E-05 | global batch size: 256 | lm loss: 4.502534E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.846 | TFLOPs: 11.94 | 7: iteration 126290/ 173500 | consumed samples: 32330240 | consumed tokens: 66212331520 | elapsed time per iteration (s): 0.08 | learning rate: 5.152E-05 | global batch size: 256 | lm loss: 4.508394E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.265 | TFLOPs: 11.76 | 7: iteration 126300/ 173500 | consumed samples: 32332800 | consumed tokens: 66217574400 | elapsed time per iteration (s): 0.08 | learning rate: 5.151E-05 | global batch size: 256 | lm loss: 4.506672E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.960 | TFLOPs: 11.92 | 7: iteration 126310/ 173500 | consumed samples: 32335360 | consumed tokens: 66222817280 | elapsed time per iteration (s): 0.08 | learning rate: 5.149E-05 | global batch size: 256 | lm loss: 4.523857E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.676 | TFLOPs: 11.84 | 7: iteration 126320/ 173500 | consumed samples: 32337920 | consumed tokens: 66228060160 | elapsed time per iteration (s): 0.08 | learning rate: 5.148E-05 | global batch size: 256 | lm loss: 4.505150E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.962 | TFLOPs: 11.88 | 7: iteration 126330/ 173500 | consumed samples: 32340480 | consumed tokens: 66233303040 | elapsed time per iteration (s): 0.08 | learning rate: 5.147E-05 | global batch size: 256 | lm loss: 4.507188E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3193.171 | TFLOPs: 11.88 | 7: iteration 126340/ 173500 | consumed samples: 32343040 | consumed tokens: 66238545920 | elapsed time per iteration (s): 0.08 | learning rate: 5.146E-05 | global batch size: 256 | lm loss: 4.518790E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.383 | TFLOPs: 11.71 | 7: iteration 126350/ 173500 | consumed samples: 32345600 | consumed tokens: 66243788800 | elapsed time per iteration (s): 0.08 | learning rate: 5.144E-05 | global batch size: 256 | lm loss: 4.518109E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.300 | TFLOPs: 11.80 | 7: iteration 126360/ 173500 | consumed samples: 32348160 | consumed tokens: 66249031680 | elapsed time per iteration (s): 0.08 | learning rate: 5.143E-05 | global batch size: 256 | lm loss: 4.522441E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.718 | TFLOPs: 11.83 | 7: iteration 126370/ 173500 | consumed samples: 32350720 | consumed tokens: 66254274560 | elapsed time per iteration (s): 0.08 | learning rate: 5.142E-05 | global batch size: 256 | lm loss: 4.506598E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.698 | TFLOPs: 11.89 | 7: iteration 126380/ 173500 | consumed samples: 32353280 | consumed tokens: 66259517440 | elapsed time per iteration (s): 0.08 | learning rate: 5.141E-05 | global batch size: 256 | lm loss: 4.517180E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.884 | TFLOPs: 11.79 | 7: iteration 126390/ 173500 | consumed samples: 32355840 | consumed tokens: 66264760320 | elapsed time per iteration (s): 0.08 | learning rate: 5.139E-05 | global batch 
size: 256 | lm loss: 4.520650E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.305 | TFLOPs: 12.02 | 7: iteration 126400/ 173500 | consumed samples: 32358400 | consumed tokens: 66270003200 | elapsed time per iteration (s): 0.08 | learning rate: 5.138E-05 | global batch size: 256 | lm loss: 4.518233E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.138 | TFLOPs: 12.05 | 7: iteration 126410/ 173500 | consumed samples: 32360960 | consumed tokens: 66275246080 | elapsed time per iteration (s): 0.08 | learning rate: 5.137E-05 | global batch size: 256 | lm loss: 4.510414E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.126 | TFLOPs: 11.74 | 7: iteration 126420/ 173500 | consumed samples: 32363520 | consumed tokens: 66280488960 | elapsed time per iteration (s): 0.08 | learning rate: 5.136E-05 | global batch size: 256 | lm loss: 4.503003E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.785 | TFLOPs: 11.78 | 7: iteration 126430/ 173500 | consumed samples: 32366080 | consumed tokens: 66285731840 | elapsed time per iteration (s): 0.08 | learning rate: 5.134E-05 | global batch size: 256 | lm loss: 4.515416E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.359 | TFLOPs: 12.02 | 7: iteration 126440/ 173500 | consumed samples: 32368640 | consumed tokens: 66290974720 | elapsed time per iteration (s): 0.08 | learning rate: 5.133E-05 | global batch size: 256 | lm loss: 4.511631E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.750 | TFLOPs: 11.87 | 7: iteration 126450/ 173500 | consumed samples: 32371200 
| consumed tokens: 66296217600 | elapsed time per iteration (s): 0.08 | learning rate: 5.132E-05 | global batch size: 256 | lm loss: 4.497694E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.494 | TFLOPs: 11.44 | 7: iteration 126460/ 173500 | consumed samples: 32373760 | consumed tokens: 66301460480 | elapsed time per iteration (s): 0.08 | learning rate: 5.131E-05 | global batch size: 256 | lm loss: 4.501182E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.323 | TFLOPs: 11.98 | 7: iteration 126470/ 173500 | consumed samples: 32376320 | consumed tokens: 66306703360 | elapsed time per iteration (s): 0.08 | learning rate: 5.129E-05 | global batch size: 256 | lm loss: 4.506371E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.046 | TFLOPs: 11.73 | 7: iteration 126480/ 173500 | consumed samples: 32378880 | consumed tokens: 66311946240 | elapsed time per iteration (s): 0.08 | learning rate: 5.128E-05 | global batch size: 256 | lm loss: 4.504161E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.578 | TFLOPs: 11.99 | 7: iteration 126490/ 173500 | consumed samples: 32381440 | consumed tokens: 66317189120 | elapsed time per iteration (s): 0.08 | learning rate: 5.127E-05 | global batch size: 256 | lm loss: 4.514711E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.020 | TFLOPs: 12.03 | 7: iteration 126500/ 173500 | consumed samples: 32384000 | consumed tokens: 66322432000 | elapsed time per iteration (s): 0.08 | learning rate: 5.126E-05 | global batch size: 256 | lm loss: 4.524939E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3017.287 | TFLOPs: 11.22 | 7: iteration 126510/ 173500 | consumed samples: 32386560 | consumed tokens: 66327674880 | elapsed time per iteration (s): 0.10 | learning rate: 5.124E-05 | global batch size: 256 | lm loss: 4.503910E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2631.699 | TFLOPs: 9.79 | 7: iteration 126520/ 173500 | consumed samples: 32389120 | consumed tokens: 66332917760 | elapsed time per iteration (s): 0.09 | learning rate: 5.123E-05 | global batch size: 256 | lm loss: 4.507642E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.593 | TFLOPs: 10.45 | 7: iteration 126530/ 173500 | consumed samples: 32391680 | consumed tokens: 66338160640 | elapsed time per iteration (s): 0.08 | learning rate: 5.122E-05 | global batch size: 256 | lm loss: 4.519281E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.348 | TFLOPs: 12.02 | 7: iteration 126540/ 173500 | consumed samples: 32394240 | consumed tokens: 66343403520 | elapsed time per iteration (s): 0.08 | learning rate: 5.121E-05 | global batch size: 256 | lm loss: 4.512893E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.471 | TFLOPs: 12.03 | 7: iteration 126550/ 173500 | consumed samples: 32396800 | consumed tokens: 66348646400 | elapsed time per iteration (s): 0.08 | learning rate: 5.119E-05 | global batch size: 256 | lm loss: 4.511807E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.967 | TFLOPs: 12.05 | 7: iteration 126560/ 173500 | consumed samples: 32399360 | consumed tokens: 66353889280 | elapsed time per iteration (s): 0.08 | learning rate: 5.118E-05 | global batch size: 
256 | lm loss: 4.521579E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.853 | TFLOPs: 11.76 | 7: iteration 126570/ 173500 | consumed samples: 32401920 | consumed tokens: 66359132160 | elapsed time per iteration (s): 0.08 | learning rate: 5.117E-05 | global batch size: 256 | lm loss: 4.515562E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.985 | TFLOPs: 11.78 | 7: iteration 126580/ 173500 | consumed samples: 32404480 | consumed tokens: 66364375040 | elapsed time per iteration (s): 0.08 | learning rate: 5.116E-05 | global batch size: 256 | lm loss: 4.505363E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.533 | TFLOPs: 12.01 | 7: iteration 126590/ 173500 | consumed samples: 32407040 | consumed tokens: 66369617920 | elapsed time per iteration (s): 0.08 | learning rate: 5.114E-05 | global batch size: 256 | lm loss: 4.505510E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.824 | TFLOPs: 11.79 | 7: iteration 126600/ 173500 | consumed samples: 32409600 | consumed tokens: 66374860800 | elapsed time per iteration (s): 0.08 | learning rate: 5.113E-05 | global batch size: 256 | lm loss: 4.496047E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.284 | TFLOPs: 11.94 | 7: iteration 126610/ 173500 | consumed samples: 32412160 | consumed tokens: 66380103680 | elapsed time per iteration (s): 0.08 | learning rate: 5.112E-05 | global batch size: 256 | lm loss: 4.506171E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.906 | TFLOPs: 12.02 | 7: iteration 126620/ 173500 | consumed samples: 32414720 | 
consumed tokens: 66385346560 | elapsed time per iteration (s): 0.08 | learning rate: 5.111E-05 | global batch size: 256 | lm loss: 4.504952E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.568 | TFLOPs: 12.02 | 7: iteration 126630/ 173500 | consumed samples: 32417280 | consumed tokens: 66390589440 | elapsed time per iteration (s): 0.08 | learning rate: 5.109E-05 | global batch size: 256 | lm loss: 4.512398E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.755 | TFLOPs: 12.04 | 7: iteration 126640/ 173500 | consumed samples: 32419840 | consumed tokens: 66395832320 | elapsed time per iteration (s): 0.08 | learning rate: 5.108E-05 | global batch size: 256 | lm loss: 4.521820E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.714 | TFLOPs: 11.90 | 7: iteration 126650/ 173500 | consumed samples: 32422400 | consumed tokens: 66401075200 | elapsed time per iteration (s): 0.08 | learning rate: 5.107E-05 | global batch size: 256 | lm loss: 4.515118E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.975 | TFLOPs: 11.74 | 7: iteration 126660/ 173500 | consumed samples: 32424960 | consumed tokens: 66406318080 | elapsed time per iteration (s): 0.08 | learning rate: 5.106E-05 | global batch size: 256 | lm loss: 4.504141E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.440 | TFLOPs: 12.01 | 7: iteration 126670/ 173500 | consumed samples: 32427520 | consumed tokens: 66411560960 | elapsed time per iteration (s): 0.08 | learning rate: 5.104E-05 | global batch size: 256 | lm loss: 4.510663E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3159.622 | TFLOPs: 11.75 | 7: iteration 126680/ 173500 | consumed samples: 32430080 | consumed tokens: 66416803840 | elapsed time per iteration (s): 0.08 | learning rate: 5.103E-05 | global batch size: 256 | lm loss: 4.495127E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.903 | TFLOPs: 12.00 | 7: iteration 126690/ 173500 | consumed samples: 32432640 | consumed tokens: 66422046720 | elapsed time per iteration (s): 0.08 | learning rate: 5.102E-05 | global batch size: 256 | lm loss: 4.505273E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.468 | TFLOPs: 11.93 | 7: iteration 126700/ 173500 | consumed samples: 32435200 | consumed tokens: 66427289600 | elapsed time per iteration (s): 0.08 | learning rate: 5.101E-05 | global batch size: 256 | lm loss: 4.512016E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.874 | TFLOPs: 11.89 | 7: iteration 126710/ 173500 | consumed samples: 32437760 | consumed tokens: 66432532480 | elapsed time per iteration (s): 0.09 | learning rate: 5.099E-05 | global batch size: 256 | lm loss: 4.515027E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2927.433 | TFLOPs: 10.89 | 7: iteration 126720/ 173500 | consumed samples: 32440320 | consumed tokens: 66437775360 | elapsed time per iteration (s): 0.09 | learning rate: 5.098E-05 | global batch size: 256 | lm loss: 4.509839E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2750.912 | TFLOPs: 10.23 | 7: iteration 126730/ 173500 | consumed samples: 32442880 | consumed tokens: 66443018240 | elapsed time per iteration (s): 0.08 | learning rate: 5.097E-05 | global batch size: 
256 | lm loss: 4.504621E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.044 | TFLOPs: 11.85 | 7: iteration 126740/ 173500 | consumed samples: 32445440 | consumed tokens: 66448261120 | elapsed time per iteration (s): 0.08 | learning rate: 5.096E-05 | global batch size: 256 | lm loss: 4.530424E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.977 | TFLOPs: 11.42 | 7: iteration 126750/ 173500 | consumed samples: 32448000 | consumed tokens: 66453504000 | elapsed time per iteration (s): 0.08 | learning rate: 5.094E-05 | global batch size: 256 | lm loss: 4.502484E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.609 | TFLOPs: 11.71 | 7: iteration 126760/ 173500 | consumed samples: 32450560 | consumed tokens: 66458746880 | elapsed time per iteration (s): 0.08 | learning rate: 5.093E-05 | global batch size: 256 | lm loss: 4.506342E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.245 | TFLOPs: 11.55 | 7: iteration 126770/ 173500 | consumed samples: 32453120 | consumed tokens: 66463989760 | elapsed time per iteration (s): 0.08 | learning rate: 5.092E-05 | global batch size: 256 | lm loss: 4.509011E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.993 | TFLOPs: 11.77 | 7: iteration 126780/ 173500 | consumed samples: 32455680 | consumed tokens: 66469232640 | elapsed time per iteration (s): 0.08 | learning rate: 5.091E-05 | global batch size: 256 | lm loss: 4.515269E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.955 | TFLOPs: 11.62 | 7: iteration 126790/ 173500 | consumed samples: 32458240 | 
consumed tokens: 66474475520 | elapsed time per iteration (s): 0.08 | learning rate: 5.090E-05 | global batch size: 256 | lm loss: 4.503088E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.225 | TFLOPs: 11.88 | 7: iteration 126800/ 173500 | consumed samples: 32460800 | consumed tokens: 66479718400 | elapsed time per iteration (s): 0.09 | learning rate: 5.088E-05 | global batch size: 256 | lm loss: 4.510926E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2928.829 | TFLOPs: 10.89 | 7: iteration 126810/ 173500 | consumed samples: 32463360 | consumed tokens: 66484961280 | elapsed time per iteration (s): 0.08 | learning rate: 5.087E-05 | global batch size: 256 | lm loss: 4.522750E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.805 | TFLOPs: 11.88 | 7: iteration 126820/ 173500 | consumed samples: 32465920 | consumed tokens: 66490204160 | elapsed time per iteration (s): 0.08 | learning rate: 5.086E-05 | global batch size: 256 | lm loss: 4.505036E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.980 | TFLOPs: 11.88 | 7: iteration 126830/ 173500 | consumed samples: 32468480 | consumed tokens: 66495447040 | elapsed time per iteration (s): 0.08 | learning rate: 5.085E-05 | global batch size: 256 | lm loss: 4.529279E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.597 | TFLOPs: 11.60 | 7: iteration 126840/ 173500 | consumed samples: 32471040 | consumed tokens: 66500689920 | elapsed time per iteration (s): 0.10 | learning rate: 5.083E-05 | global batch size: 256 | lm loss: 4.496455E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2681.608 | TFLOPs: 9.97 | 7: iteration 126850/ 173500 | consumed samples: 32473600 | consumed tokens: 66505932800 | elapsed time per iteration (s): 0.10 | learning rate: 5.082E-05 | global batch size: 256 | lm loss: 4.516260E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2615.717 | TFLOPs: 9.73 | 7: iteration 126860/ 173500 | consumed samples: 32476160 | consumed tokens: 66511175680 | elapsed time per iteration (s): 0.10 | learning rate: 5.081E-05 | global batch size: 256 | lm loss: 4.509239E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.322 | TFLOPs: 9.74 | 7: iteration 126870/ 173500 | consumed samples: 32478720 | consumed tokens: 66516418560 | elapsed time per iteration (s): 0.09 | learning rate: 5.080E-05 | global batch size: 256 | lm loss: 4.506155E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2741.750 | TFLOPs: 10.20 | 7: iteration 126880/ 173500 | consumed samples: 32481280 | consumed tokens: 66521661440 | elapsed time per iteration (s): 0.08 | learning rate: 5.078E-05 | global batch size: 256 | lm loss: 4.512705E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.077 | TFLOPs: 11.91 | 7: iteration 126890/ 173500 | consumed samples: 32483840 | consumed tokens: 66526904320 | elapsed time per iteration (s): 0.08 | learning rate: 5.077E-05 | global batch size: 256 | lm loss: 4.517629E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.368 | TFLOPs: 11.37 | 7: iteration 126900/ 173500 | consumed samples: 32486400 | consumed tokens: 66532147200 | elapsed time per iteration (s): 0.08 | learning rate: 5.076E-05 | global batch size: 256 
| lm loss: 4.504538E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.074 | TFLOPs: 11.87 | 7: iteration 126910/ 173500 | consumed samples: 32488960 | consumed tokens: 66537390080 | elapsed time per iteration (s): 0.08 | learning rate: 5.075E-05 | global batch size: 256 | lm loss: 4.520958E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.902 | TFLOPs: 11.79 | 7: iteration 126920/ 173500 | consumed samples: 32491520 | consumed tokens: 66542632960 | elapsed time per iteration (s): 0.08 | learning rate: 5.073E-05 | global batch size: 256 | lm loss: 4.512737E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.258 | TFLOPs: 11.90 | 7: iteration 126930/ 173500 | consumed samples: 32494080 | consumed tokens: 66547875840 | elapsed time per iteration (s): 0.09 | learning rate: 5.072E-05 | global batch size: 256 | lm loss: 4.492311E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.143 | TFLOPs: 10.99 | 7: iteration 126940/ 173500 | consumed samples: 32496640 | consumed tokens: 66553118720 | elapsed time per iteration (s): 0.08 | learning rate: 5.071E-05 | global batch size: 256 | lm loss: 4.511356E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.328 | TFLOPs: 11.90 | 7: iteration 126950/ 173500 | consumed samples: 32499200 | consumed tokens: 66558361600 | elapsed time per iteration (s): 0.08 | learning rate: 5.070E-05 | global batch size: 256 | lm loss: 4.510718E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.235 | TFLOPs: 11.84 | 7: iteration 126960/ 173500 | consumed samples: 32501760 | 
consumed tokens: 66563604480 | elapsed time per iteration (s): 0.08 | learning rate: 5.068E-05 | global batch size: 256 | lm loss: 4.499705E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.144 | TFLOPs: 11.69 | 7: iteration 126970/ 173500 | consumed samples: 32504320 | consumed tokens: 66568847360 | elapsed time per iteration (s): 0.08 | learning rate: 5.067E-05 | global batch size: 256 | lm loss: 4.516531E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.676 | TFLOPs: 11.90 | 7: iteration 126980/ 173500 | consumed samples: 32506880 | consumed tokens: 66574090240 | elapsed time per iteration (s): 0.08 | learning rate: 5.066E-05 | global batch size: 256 | lm loss: 4.510822E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.939 | TFLOPs: 11.84 | 7: iteration 126990/ 173500 | consumed samples: 32509440 | consumed tokens: 66579333120 | elapsed time per iteration (s): 0.08 | learning rate: 5.065E-05 | global batch size: 256 | lm loss: 4.504195E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.850 | TFLOPs: 11.87 | 7: iteration 127000/ 173500 | consumed samples: 32512000 | consumed tokens: 66584576000 | elapsed time per iteration (s): 0.08 | learning rate: 5.064E-05 | global batch size: 256 | lm loss: 4.502028E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.103 | TFLOPs: 11.81 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 127000 | lm loss value: 4.405925E+00 | lm loss PPL: 8.193492E+01 | 7: 
------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 127000 to checkpoints_14m91b100m 0: [2023-03-17 03:20:23,487] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step127000 is begin to save! 0: [2023-03-17 03:20:23,491] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:20:23,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:20:23,517] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:20:23,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:20:23,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:20:23,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:20:23,523] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:20:23,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:20:23,526] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:20:23,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 03:20:23,529] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:20:23,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:20:23,530] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step127000/mp_rank_00_model_states.pt 0: [2023-03-17 03:20:23,530] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:20:23,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:20:23,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:20:23,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:20:23,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:20:23,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:20:23,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:20:23,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 1: [2023-03-17 03:20:23,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 6: [2023-03-17 03:20:23,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 7: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 2: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 0: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 6: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 0: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 6: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 1: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:20:23,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:20:23,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:20:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 2: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:20:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 
7: [2023-03-17 03:20:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:20:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 03:20:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:20:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 1: [2023-03-17 03:20:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:20:23,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:20:23,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 6: [2023-03-17 03:20:23,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 2: [2023-03-17 03:20:23,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:20:23,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:20:23,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:20:23,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 7: [2023-03-17 03:20:23,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 0: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 1: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 1: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 1: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 2: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 7: [2023-03-17 03:20:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:20:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 0: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:20:23,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 6: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 1: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:20:23,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 2: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:20:23,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:20:23,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 7: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:20:23,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 0: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:20:23,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 6: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:20:23,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:20:23,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 1: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:20:23,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 2: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:20:23,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:20:23,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 0: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 7: [2023-03-17 03:20:23,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 6: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:20:23,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:20:23,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 1: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:20:23,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:20:23,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:20:23,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 0: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 0: [2023-03-17 03:20:23,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 2: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 7: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 5: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 5: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 3: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 03:20:23,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 4: [2023-03-17 03:20:23,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 1: [2023-03-17 03:20:23,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:20:23,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step127000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:20:23,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step127000 is ready now! 
0: successfully saved checkpoint at iteration 127000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 79.62
7: iteration 127010/ 173500 | consumed samples: 32514560 | consumed tokens: 66589818880 | elapsed time per iteration (s): 0.09 | learning rate: 5.062E-05 | global batch size: 256 | lm loss: 4.512888E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2800.579 | TFLOPs: 10.42 |
7: iteration 127020/ 173500 | consumed samples: 32517120 | consumed tokens: 66595061760 | elapsed time per iteration (s): 0.08 | learning rate: 5.061E-05 | global batch size: 256 | lm loss: 4.511346E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.933 | TFLOPs: 11.91 |
7: iteration 127030/ 173500 | consumed samples: 32519680 | consumed tokens: 66600304640 | elapsed time per iteration (s): 0.08 | learning rate: 5.060E-05 | global batch size: 256 | lm loss: 4.499176E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.834 | TFLOPs: 11.79 |
7: iteration 127040/ 173500 | consumed samples: 32522240 | consumed tokens: 66605547520 | elapsed time per iteration (s): 0.08 | learning rate: 5.059E-05 | global batch size: 256 | lm loss: 4.522105E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.713 | TFLOPs: 11.79 |
7: iteration 127050/ 173500 | consumed samples: 32524800 | consumed tokens: 66610790400 | elapsed time per iteration (s): 0.08 | learning rate: 5.057E-05 | global batch size: 256 | lm loss: 4.504051E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.198 | TFLOPs: 11.87 |
7: iteration 127060/ 173500 | consumed samples: 32527360 | consumed tokens: 66616033280 | elapsed time per iteration (s): 0.08 | learning rate: 5.056E-05 | global batch size: 256 | lm loss: 4.505769E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.356 | TFLOPs: 11.89 |
7: iteration 127070/ 173500 | consumed samples: 32529920 | consumed tokens: 66621276160 | elapsed time per iteration (s): 0.09 | learning rate: 5.055E-05 | global batch size: 256 | lm loss: 4.518069E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.037 | TFLOPs: 10.84 |
7: iteration 127080/ 173500 | consumed samples: 32532480 | consumed tokens: 66626519040 | elapsed time per iteration (s): 0.11 | learning rate: 5.054E-05 | global batch size: 256 | lm loss: 4.504264E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2381.779 | TFLOPs: 8.86 |
7: iteration 127090/ 173500 | consumed samples: 32535040 | consumed tokens: 66631761920 | elapsed time per iteration (s): 0.11 | learning rate: 5.052E-05 | global batch size: 256 | lm loss: 4.520001E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2305.593 | TFLOPs: 8.58 |
7: iteration 127100/ 173500 | consumed samples: 32537600 | consumed tokens: 66637004800 | elapsed time per iteration (s): 0.13 | learning rate: 5.051E-05 | global batch size: 256 | lm loss: 4.497375E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2017.552 | TFLOPs: 7.50 |
7: iteration 127110/ 173500 | consumed samples: 32540160 | consumed tokens: 66642247680 | elapsed time per iteration (s): 0.10 | learning rate: 5.050E-05 | global batch size: 256 | lm loss: 4.502708E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2659.531 | TFLOPs: 9.89 |
7: iteration 127120/ 173500 | consumed samples: 32542720 | consumed tokens: 66647490560 | elapsed time per iteration (s): 0.14 | learning rate: 5.049E-05 | global batch size: 256 | lm loss: 4.502053E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1876.941 | TFLOPs: 6.98 |
7: iteration 127130/ 173500 | consumed samples: 32545280 | consumed tokens: 66652733440 | elapsed time per iteration (s): 0.09 | learning rate: 5.047E-05 | global batch size: 256 | lm loss: 4.519958E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2896.147 | TFLOPs: 10.77 |
7: iteration 127140/ 173500 | consumed samples: 32547840 | consumed tokens: 66657976320 | elapsed time per iteration (s): 0.08 | learning rate: 5.046E-05 | global batch size: 256 | lm loss: 4.506473E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.332 | TFLOPs: 12.02 |
7: iteration 127150/ 173500 | consumed samples: 32550400 | consumed tokens: 66663219200 | elapsed time per iteration (s): 0.08 | learning rate: 5.045E-05 | global batch size: 256 | lm loss: 4.521043E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.971 | TFLOPs: 11.88 |
7: iteration 127160/ 173500 | consumed samples: 32552960 | consumed tokens: 66668462080 | elapsed time per iteration (s): 0.12 | learning rate: 5.044E-05 | global batch size: 256 | lm loss: 4.517727E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2180.188 | TFLOPs: 8.11 |
7: iteration 127170/ 173500 | consumed samples: 32555520 | consumed tokens: 66673704960 | elapsed time per iteration (s): 0.09 | learning rate: 5.042E-05 | global batch size: 256 | lm loss: 4.511380E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2960.611 | TFLOPs: 11.01 |
7: iteration 127180/ 173500 | consumed samples: 32558080 | consumed tokens: 66678947840 | elapsed time per iteration (s): 0.09 | learning rate: 5.041E-05 | global batch size: 256 | lm loss: 4.515597E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2760.587 | TFLOPs: 10.27 |
7: iteration 127190/ 173500 | consumed samples: 32560640 | consumed tokens: 66684190720 | elapsed time per iteration (s): 0.08 | learning rate: 5.040E-05 | global batch size: 256 | lm loss: 4.504645E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3092.689 | TFLOPs: 11.50 |
7: iteration 127200/ 173500 | consumed samples: 32563200 | consumed tokens: 66689433600 | elapsed time per iteration (s): 0.08 | learning rate: 5.039E-05 | global batch size: 256 | lm loss: 4.500134E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.733 | TFLOPs: 11.22 |
7: iteration 127210/ 173500 | consumed samples: 32565760 | consumed tokens: 66694676480 | elapsed time per iteration (s): 0.09 | learning rate: 5.038E-05 | global batch size: 256 | lm loss: 4.507584E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.681 | TFLOPs: 10.04 |
7: iteration 127220/ 173500 | consumed samples: 32568320 | consumed tokens: 66699919360 | elapsed time per iteration (s): 0.09 | learning rate: 5.036E-05 | global batch size: 256 | lm loss: 4.504486E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.697 | TFLOPs: 11.06 |
7: iteration 127230/ 173500 | consumed samples: 32570880 | consumed tokens: 66705162240 | elapsed time per iteration (s): 0.08 | learning rate: 5.035E-05 | global batch size: 256 | lm loss: 4.515669E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.670 | TFLOPs: 11.74 |
7: iteration 127240/ 173500 | consumed samples: 32573440 | consumed tokens: 66710405120 | elapsed time per iteration (s): 0.08 | learning rate: 5.034E-05 | global batch size: 256 | lm loss: 4.514976E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3027.676 | TFLOPs: 11.26 |
7: iteration 127250/ 173500 | consumed samples: 32576000 | consumed tokens: 66715648000 | elapsed time per iteration (s): 0.09 | learning rate: 5.033E-05 | global batch size: 256 | lm loss: 4.518310E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.394 | TFLOPs: 10.84 |
7: iteration 127260/ 173500 | consumed samples: 32578560 | consumed tokens: 66720890880 | elapsed time per iteration (s): 0.08 | learning rate: 5.031E-05 | global batch size: 256 | lm loss: 4.509954E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.379 | TFLOPs: 11.99 |
7: iteration 127270/ 173500 | consumed samples: 32581120 | consumed tokens: 66726133760 | elapsed time per iteration (s): 0.08 | learning rate: 5.030E-05 | global batch size: 256 | lm loss: 4.507937E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3023.389 | TFLOPs: 11.25 |
7: iteration 127280/ 173500 | consumed samples: 32583680 | consumed tokens: 66731376640 | elapsed time per iteration (s): 0.09 | learning rate: 5.029E-05 | global batch size: 256 | lm loss: 4.505016E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2844.590 | TFLOPs: 10.58 |
7: iteration 127290/ 173500 | consumed samples: 32586240 | consumed tokens: 66736619520 | elapsed time per iteration (s): 0.08 | learning rate: 5.028E-05 | global batch size: 256 | lm loss: 4.516960E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.190 | TFLOPs: 11.93 |
7: iteration 127300/ 173500 | consumed samples: 32588800 | consumed tokens: 66741862400 | elapsed time per iteration (s): 0.13 | learning rate: 5.026E-05 | global batch size: 256 | lm loss: 4.516580E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1944.899 | TFLOPs: 7.23 |
7: iteration 127310/ 173500 | consumed samples: 32591360 | consumed tokens: 66747105280 | elapsed time per iteration (s): 0.11 | learning rate: 5.025E-05 | global batch size: 256 | lm loss: 4.500063E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2247.405 | TFLOPs: 8.36 |
7: iteration 127320/ 173500 | consumed samples: 32593920 | consumed tokens: 66752348160 | elapsed time per iteration (s): 0.08 | learning rate: 5.024E-05 | global batch size: 256 | lm loss: 4.517875E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.000 | TFLOPs: 12.02 |
7: iteration 127330/ 173500 | consumed samples: 32596480 | consumed tokens: 66757591040 | elapsed time per iteration (s): 0.08 | learning rate: 5.023E-05 | global batch size: 256 | lm loss: 4.514928E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.442 | TFLOPs: 12.01 |
7: iteration 127340/ 173500 | consumed samples: 32599040 | consumed tokens: 66762833920 | elapsed time per iteration (s): 0.08 | learning rate: 5.022E-05 | global batch size: 256 | lm loss: 4.508993E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.877 | TFLOPs: 12.02 |
7: iteration 127350/ 173500 | consumed samples: 32601600 | consumed tokens: 66768076800 | elapsed time per iteration (s): 0.09 | learning rate: 5.020E-05 | global batch size: 256 | lm loss: 4.507698E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2854.855 | TFLOPs: 10.62 |
7: iteration 127360/ 173500 | consumed samples: 32604160 | consumed tokens: 66773319680 | elapsed time per iteration (s): 0.08 | learning rate: 5.019E-05 | global batch size: 256 | lm loss: 4.512190E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.952 | TFLOPs: 11.74 |
7: iteration 127370/ 173500 | consumed samples: 32606720 | consumed tokens: 66778562560 | elapsed time per iteration (s): 0.09 | learning rate: 5.018E-05 | global batch size: 256 | lm loss: 4.499924E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2781.595 | TFLOPs: 10.35 |
7: iteration 127380/ 173500 | consumed samples: 32609280 | consumed tokens: 66783805440 | elapsed time per iteration (s): 0.08 | learning rate: 5.017E-05 | global batch size: 256 | lm loss: 4.510183E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.647 | TFLOPs: 12.02 |
7: iteration 127390/ 173500 | consumed samples: 32611840 | consumed tokens: 66789048320 | elapsed time per iteration (s): 0.08 | learning rate: 5.015E-05 | global batch size: 256 | lm loss: 4.506866E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.080 | TFLOPs: 11.72 |
7: iteration 127400/ 173500 | consumed samples: 32614400 | consumed tokens: 66794291200 | elapsed time per iteration (s): 0.10 | learning rate: 5.014E-05 | global batch size: 256 | lm loss: 4.507293E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2678.607 | TFLOPs: 9.96 |
7: iteration 127410/ 173500 | consumed samples: 32616960 | consumed tokens: 66799534080 | elapsed time per iteration (s): 0.08 | learning rate: 5.013E-05 | global batch size: 256 | lm loss: 4.500690E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.870 | TFLOPs: 11.97 |
7: iteration 127420/ 173500 | consumed samples: 32619520 | consumed tokens: 66804776960 | elapsed time per iteration (s): 0.08 | learning rate: 5.012E-05 | global batch size: 256 | lm loss: 4.514250E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.915 | TFLOPs: 12.00 |
7: iteration 127430/ 173500 | consumed samples: 32622080 | consumed tokens: 66810019840 | elapsed time per iteration (s): 0.08 | learning rate: 5.010E-05 | global batch size: 256 | lm loss: 4.519383E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.126 | TFLOPs: 11.61 |
7: iteration 127440/ 173500 | consumed samples: 32624640 | consumed tokens: 66815262720 | elapsed time per iteration (s): 0.11 | learning rate: 5.009E-05 | global batch size: 256 | lm loss: 4.513367E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2297.557 | TFLOPs: 8.55 |
7: iteration 127450/ 173500 | consumed samples: 32627200 | consumed tokens: 66820505600 | elapsed time per iteration (s): 0.10 | learning rate: 5.008E-05 | global batch size: 256 | lm loss: 4.511817E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2454.681 | TFLOPs: 9.13 |
7: iteration 127460/ 173500 | 
consumed samples: 32629760 | consumed tokens: 66825748480 | elapsed time per iteration (s): 0.09 | learning rate: 5.007E-05 | global batch size: 256 | lm loss: 4.508923E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.869 | TFLOPs: 10.61 | 7: iteration 127470/ 173500 | consumed samples: 32632320 | consumed tokens: 66830991360 | elapsed time per iteration (s): 0.08 | learning rate: 5.006E-05 | global batch size: 256 | lm loss: 4.508533E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.190 | TFLOPs: 12.03 | 7: iteration 127480/ 173500 | consumed samples: 32634880 | consumed tokens: 66836234240 | elapsed time per iteration (s): 0.09 | learning rate: 5.004E-05 | global batch size: 256 | lm loss: 4.509353E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.058 | TFLOPs: 10.82 | 7: iteration 127490/ 173500 | consumed samples: 32637440 | consumed tokens: 66841477120 | elapsed time per iteration (s): 0.09 | learning rate: 5.003E-05 | global batch size: 256 | lm loss: 4.512833E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.128 | TFLOPs: 11.07 | 7: iteration 127500/ 173500 | consumed samples: 32640000 | consumed tokens: 66846720000 | elapsed time per iteration (s): 0.08 | learning rate: 5.002E-05 | global batch size: 256 | lm loss: 4.513646E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.033 | TFLOPs: 11.87 | 7: iteration 127510/ 173500 | consumed samples: 32642560 | consumed tokens: 66851962880 | elapsed time per iteration (s): 0.09 | learning rate: 5.001E-05 | global batch size: 256 | lm loss: 4.509279E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2932.839 | TFLOPs: 10.91 | 7: iteration 127520/ 173500 | consumed samples: 32645120 | consumed tokens: 66857205760 | elapsed time per iteration (s): 0.08 | learning rate: 4.999E-05 | global batch size: 256 | lm loss: 4.528515E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3089.063 | TFLOPs: 11.49 | 7: iteration 127530/ 173500 | consumed samples: 32647680 | consumed tokens: 66862448640 | elapsed time per iteration (s): 0.11 | learning rate: 4.998E-05 | global batch size: 256 | lm loss: 4.522938E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2287.478 | TFLOPs: 8.51 | 7: iteration 127540/ 173500 | consumed samples: 32650240 | consumed tokens: 66867691520 | elapsed time per iteration (s): 0.11 | learning rate: 4.997E-05 | global batch size: 256 | lm loss: 4.515686E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2298.321 | TFLOPs: 8.55 | 7: iteration 127550/ 173500 | consumed samples: 32652800 | consumed tokens: 66872934400 | elapsed time per iteration (s): 0.11 | learning rate: 4.996E-05 | global batch size: 256 | lm loss: 4.505146E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.051 | TFLOPs: 8.88 | 7: iteration 127560/ 173500 | consumed samples: 32655360 | consumed tokens: 66878177280 | elapsed time per iteration (s): 0.10 | learning rate: 4.995E-05 | global batch size: 256 | lm loss: 4.514450E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.599 | TFLOPs: 9.09 | 7: iteration 127570/ 173500 | consumed samples: 32657920 | consumed tokens: 66883420160 | elapsed time per iteration (s): 0.11 | learning rate: 
4.993E-05 | global batch size: 256 | lm loss: 4.521341E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.163 | TFLOPs: 8.85 | 7: iteration 127580/ 173500 | consumed samples: 32660480 | consumed tokens: 66888663040 | elapsed time per iteration (s): 0.14 | learning rate: 4.992E-05 | global batch size: 256 | lm loss: 4.503974E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1867.448 | TFLOPs: 6.95 | 7: iteration 127590/ 173500 | consumed samples: 32663040 | consumed tokens: 66893905920 | elapsed time per iteration (s): 0.10 | learning rate: 4.991E-05 | global batch size: 256 | lm loss: 4.511538E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.656 | TFLOPs: 9.41 | 7: iteration 127600/ 173500 | consumed samples: 32665600 | consumed tokens: 66899148800 | elapsed time per iteration (s): 0.08 | learning rate: 4.990E-05 | global batch size: 256 | lm loss: 4.518357E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.107 | TFLOPs: 11.68 | 7: iteration 127610/ 173500 | consumed samples: 32668160 | consumed tokens: 66904391680 | elapsed time per iteration (s): 0.08 | learning rate: 4.988E-05 | global batch size: 256 | lm loss: 4.519951E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.996 | TFLOPs: 11.98 | 7: iteration 127620/ 173500 | consumed samples: 32670720 | consumed tokens: 66909634560 | elapsed time per iteration (s): 0.08 | learning rate: 4.987E-05 | global batch size: 256 | lm loss: 4.509456E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.630 | TFLOPs: 11.66 | 7: iteration 127630/ 173500 | 
consumed samples: 32673280 | consumed tokens: 66914877440 | elapsed time per iteration (s): 0.08 | learning rate: 4.986E-05 | global batch size: 256 | lm loss: 4.508580E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.295 | TFLOPs: 11.96 | 7: iteration 127640/ 173500 | consumed samples: 32675840 | consumed tokens: 66920120320 | elapsed time per iteration (s): 0.09 | learning rate: 4.985E-05 | global batch size: 256 | lm loss: 4.514342E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2893.701 | TFLOPs: 10.76 | 7: iteration 127650/ 173500 | consumed samples: 32678400 | consumed tokens: 66925363200 | elapsed time per iteration (s): 0.08 | learning rate: 4.984E-05 | global batch size: 256 | lm loss: 4.506311E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.350 | TFLOPs: 11.99 | 7: iteration 127660/ 173500 | consumed samples: 32680960 | consumed tokens: 66930606080 | elapsed time per iteration (s): 0.09 | learning rate: 4.982E-05 | global batch size: 256 | lm loss: 4.509879E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.262 | TFLOPs: 11.14 | 7: iteration 127670/ 173500 | consumed samples: 32683520 | consumed tokens: 66935848960 | elapsed time per iteration (s): 0.08 | learning rate: 4.981E-05 | global batch size: 256 | lm loss: 4.520535E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.135 | TFLOPs: 11.98 | 7: iteration 127680/ 173500 | consumed samples: 32686080 | consumed tokens: 66941091840 | elapsed time per iteration (s): 0.09 | learning rate: 4.980E-05 | global batch size: 256 | lm loss: 4.505291E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2960.808 | TFLOPs: 11.01 | 7: iteration 127690/ 173500 | consumed samples: 32688640 | consumed tokens: 66946334720 | elapsed time per iteration (s): 0.09 | learning rate: 4.979E-05 | global batch size: 256 | lm loss: 4.513239E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2769.708 | TFLOPs: 10.30 | 7: iteration 127700/ 173500 | consumed samples: 32691200 | consumed tokens: 66951577600 | elapsed time per iteration (s): 0.08 | learning rate: 4.977E-05 | global batch size: 256 | lm loss: 4.505642E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.918 | TFLOPs: 12.01 | 7: iteration 127710/ 173500 | consumed samples: 32693760 | consumed tokens: 66956820480 | elapsed time per iteration (s): 0.08 | learning rate: 4.976E-05 | global batch size: 256 | lm loss: 4.510046E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.006 | TFLOPs: 12.03 | 7: iteration 127720/ 173500 | consumed samples: 32696320 | consumed tokens: 66962063360 | elapsed time per iteration (s): 0.09 | learning rate: 4.975E-05 | global batch size: 256 | lm loss: 4.513239E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2847.240 | TFLOPs: 10.59 | 7: iteration 127730/ 173500 | consumed samples: 32698880 | consumed tokens: 66967306240 | elapsed time per iteration (s): 0.09 | learning rate: 4.974E-05 | global batch size: 256 | lm loss: 4.500866E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.257 | TFLOPs: 10.64 | 7: iteration 127740/ 173500 | consumed samples: 32701440 | consumed tokens: 66972549120 | elapsed time per iteration (s): 0.09 | learning rate: 
4.972E-05 | global batch size: 256 | lm loss: 4.510642E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2930.690 | TFLOPs: 10.90 | 7: iteration 127750/ 173500 | consumed samples: 32704000 | consumed tokens: 66977792000 | elapsed time per iteration (s): 0.09 | learning rate: 4.971E-05 | global batch size: 256 | lm loss: 4.520613E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2721.926 | TFLOPs: 10.12 | 7: iteration 127760/ 173500 | consumed samples: 32706560 | consumed tokens: 66983034880 | elapsed time per iteration (s): 0.11 | learning rate: 4.970E-05 | global batch size: 256 | lm loss: 4.509316E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2412.778 | TFLOPs: 8.97 | 7: iteration 127770/ 173500 | consumed samples: 32709120 | consumed tokens: 66988277760 | elapsed time per iteration (s): 0.08 | learning rate: 4.969E-05 | global batch size: 256 | lm loss: 4.516447E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.083 | TFLOPs: 11.87 | 7: iteration 127780/ 173500 | consumed samples: 32711680 | consumed tokens: 66993520640 | elapsed time per iteration (s): 0.08 | learning rate: 4.968E-05 | global batch size: 256 | lm loss: 4.521091E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.366 | TFLOPs: 11.86 | 7: iteration 127790/ 173500 | consumed samples: 32714240 | consumed tokens: 66998763520 | elapsed time per iteration (s): 0.11 | learning rate: 4.966E-05 | global batch size: 256 | lm loss: 4.506821E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2261.976 | TFLOPs: 8.41 | 7: iteration 127800/ 173500 | 
consumed samples: 32716800 | consumed tokens: 67004006400 | elapsed time per iteration (s): 0.08 | learning rate: 4.965E-05 | global batch size: 256 | lm loss: 4.508258E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.399 | TFLOPs: 11.39 | 7: iteration 127810/ 173500 | consumed samples: 32719360 | consumed tokens: 67009249280 | elapsed time per iteration (s): 0.10 | learning rate: 4.964E-05 | global batch size: 256 | lm loss: 4.508149E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2629.971 | TFLOPs: 9.78 | 7: iteration 127820/ 173500 | consumed samples: 32721920 | consumed tokens: 67014492160 | elapsed time per iteration (s): 0.12 | learning rate: 4.963E-05 | global batch size: 256 | lm loss: 4.522599E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2171.695 | TFLOPs: 8.08 | 7: iteration 127830/ 173500 | consumed samples: 32724480 | consumed tokens: 67019735040 | elapsed time per iteration (s): 0.09 | learning rate: 4.962E-05 | global batch size: 256 | lm loss: 4.508028E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3009.509 | TFLOPs: 11.19 | 7: iteration 127840/ 173500 | consumed samples: 32727040 | consumed tokens: 67024977920 | elapsed time per iteration (s): 0.11 | learning rate: 4.960E-05 | global batch size: 256 | lm loss: 4.513690E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2425.915 | TFLOPs: 9.02 | 7: iteration 127850/ 173500 | consumed samples: 32729600 | consumed tokens: 67030220800 | elapsed time per iteration (s): 0.08 | learning rate: 4.959E-05 | global batch size: 256 | lm loss: 4.523649E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 3174.738 | TFLOPs: 11.81 | 7: iteration 127860/ 173500 | consumed samples: 32732160 | consumed tokens: 67035463680 | elapsed time per iteration (s): 0.08 | learning rate: 4.958E-05 | global batch size: 256 | lm loss: 4.511576E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.936 | TFLOPs: 11.89 | 7: iteration 127870/ 173500 | consumed samples: 32734720 | consumed tokens: 67040706560 | elapsed time per iteration (s): 0.10 | learning rate: 4.957E-05 | global batch size: 256 | lm loss: 4.505116E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.854 | TFLOPs: 10.00 | 7: iteration 127880/ 173500 | consumed samples: 32737280 | consumed tokens: 67045949440 | elapsed time per iteration (s): 0.09 | learning rate: 4.955E-05 | global batch size: 256 | lm loss: 4.504170E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2778.411 | TFLOPs: 10.33 | 7: iteration 127890/ 173500 | consumed samples: 32739840 | consumed tokens: 67051192320 | elapsed time per iteration (s): 0.11 | learning rate: 4.954E-05 | global batch size: 256 | lm loss: 4.499075E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2297.742 | TFLOPs: 8.55 | 7: iteration 127900/ 173500 | consumed samples: 32742400 | consumed tokens: 67056435200 | elapsed time per iteration (s): 0.13 | learning rate: 4.953E-05 | global batch size: 256 | lm loss: 4.509273E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1991.786 | TFLOPs: 7.41 | 7: iteration 127910/ 173500 | consumed samples: 32744960 | consumed tokens: 67061678080 | elapsed time per iteration (s): 0.11 | learning rate: 4.952E-05 | global 
batch size: 256 | lm loss: 4.514728E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2383.937 | TFLOPs: 8.87 | 7: iteration 127920/ 173500 | consumed samples: 32747520 | consumed tokens: 67066920960 | elapsed time per iteration (s): 0.12 | learning rate: 4.951E-05 | global batch size: 256 | lm loss: 4.496578E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.870 | TFLOPs: 8.18 | 7: iteration 127930/ 173500 | consumed samples: 32750080 | consumed tokens: 67072163840 | elapsed time per iteration (s): 0.11 | learning rate: 4.949E-05 | global batch size: 256 | lm loss: 4.518945E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2305.219 | TFLOPs: 8.57 | 7: iteration 127940/ 173500 | consumed samples: 32752640 | consumed tokens: 67077406720 | elapsed time per iteration (s): 0.12 | learning rate: 4.948E-05 | global batch size: 256 | lm loss: 4.504527E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2210.585 | TFLOPs: 8.22 | 7: iteration 127950/ 173500 | consumed samples: 32755200 | consumed tokens: 67082649600 | elapsed time per iteration (s): 0.11 | learning rate: 4.947E-05 | global batch size: 256 | lm loss: 4.513824E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.625 | TFLOPs: 8.91 | 7: iteration 127960/ 173500 | consumed samples: 32757760 | consumed tokens: 67087892480 | elapsed time per iteration (s): 0.09 | learning rate: 4.946E-05 | global batch size: 256 | lm loss: 4.504196E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.835 | TFLOPs: 11.02 | 7: iteration 127970/ 173500 | consumed samples: 
32760320 | consumed tokens: 67093135360 | elapsed time per iteration (s): 0.08 | learning rate: 4.944E-05 | global batch size: 256 | lm loss: 4.508700E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.783 | TFLOPs: 11.89 | 7: iteration 127980/ 173500 | consumed samples: 32762880 | consumed tokens: 67098378240 | elapsed time per iteration (s): 0.08 | learning rate: 4.943E-05 | global batch size: 256 | lm loss: 4.502959E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.637 | TFLOPs: 11.90 | 7: iteration 127990/ 173500 | consumed samples: 32765440 | consumed tokens: 67103621120 | elapsed time per iteration (s): 0.08 | learning rate: 4.942E-05 | global batch size: 256 | lm loss: 4.499084E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.299 | TFLOPs: 11.73 | 0: [2023-03-17 03:21:56,169] [INFO] [logging.py:68:log_dist] [Rank 0] step=128000, skipped=0, lr=[4.94077976375529e-05, 4.94077976375529e-05, 4.94077976375529e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 128000/ 173500 | consumed samples: 32768000 | consumed tokens: 67108864000 | elapsed time per iteration (s): 0.10 | learning rate: 4.941E-05 | global batch size: 256 | lm loss: 4.516957E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2439.352 | TFLOPs: 9.07 | 0: steps: 128000 loss: 4.5070 iter time (s): 0.087 samples/sec: 2954.315 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 128000 | lm loss value: 4.392117E+00 | lm loss PPL: 8.081128E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 128000 to 
checkpoints_14m91b100m 0: [2023-03-17 03:21:56,239] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step128000 is begin to save! 0: [2023-03-17 03:21:56,242] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:21:56,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:21:56,268] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:21:56,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:21:56,271] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:21:56,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:21:56,274] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:21:56,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:21:56,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:21:56,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:21:56,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:21:56,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:21:56,281] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step128000/mp_rank_00_model_states.pt 0: [2023-03-17 03:21:56,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:21:56,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:21:56,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:21:56,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:21:56,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:21:56,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:21:56,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:21:56,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:21:56,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:21:56,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: [2023-03-17 03:21:56,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:21:56,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 1: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:21:56,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 6: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:21:56,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 5: [2023-03-17 03:21:56,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 7: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:21:56,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 7: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:21:56,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:21:56,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 2: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:21:56,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 03:21:56,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 1: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:21:56,306] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 5: [2023-03-17 03:21:56,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:21:56,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:21:56,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 03:21:56,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 6: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:21:56,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 2: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:21:56,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 1: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:21:56,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: [2023-03-17 03:21:56,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:21:56,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 5: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:21:56,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 7: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:21:56,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 6: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:21:56,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:21:56,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:21:56,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,308] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 1: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:21:56,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:21:56,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 1: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 2: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:21:56,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 5: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:21:56,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 7: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:21:56,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 6: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:21:56,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:21:56,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 1: [2023-03-17 03:21:56,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:21:56,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:21:56,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: [2023-03-17 03:21:56,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:21:56,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 2: [2023-03-17 03:21:56,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:21:56,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:21:56,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 5: [2023-03-17 03:21:56,310] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,310] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 7: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:21:56,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:21:56,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 03:21:56,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 2: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:21:56,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 1: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:21:56,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: [2023-03-17 03:21:56,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 6: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 5: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 
7: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 4: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 03:21:56,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 0: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 
2: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 1: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 2: [2023-03-17 03:21:56,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 6: [2023-03-17 03:21:56,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 5: [2023-03-17 03:21:56,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:21:56,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 2: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:21:56,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 7: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:21:56,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 7: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:21:56,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 03:21:56,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 5: [2023-03-17 03:21:56,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 3: [2023-03-17 03:21:56,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:21:56,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step128000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:21:56,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step128000 is ready now! 0: successfully saved checkpoint at iteration 128000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.13 7: iteration 128010/ 173500 | consumed samples: 32770560 | consumed tokens: 67114106880 | elapsed time per iteration (s): 0.10 | learning rate: 4.940E-05 | global batch size: 256 | lm loss: 4.521456E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2546.775 | TFLOPs: 9.47 | 7: iteration 128020/ 173500 | consumed samples: 32773120 | consumed tokens: 67119349760 | elapsed time per iteration (s): 0.08 | learning rate: 4.938E-05 | global batch size: 256 | lm loss: 4.512292E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.557 | TFLOPs: 11.67 | 7: iteration 128030/ 173500 | consumed samples: 32775680 | consumed tokens: 67124592640 | elapsed time per iteration (s): 0.09 | learning rate: 4.937E-05 | global batch size: 256 | lm loss: 4.522562E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2718.442 | TFLOPs: 10.11 | 7: iteration 128040/ 173500 | consumed samples: 32778240 | consumed tokens: 67129835520 | elapsed time per iteration (s): 0.09 | learning rate: 4.936E-05 | global batch size: 256 | lm loss: 4.514691E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2960.416 | TFLOPs: 11.01 | 7: iteration 128050/ 173500 | consumed samples: 32780800 | consumed tokens: 67135078400 | elapsed time per iteration (s): 0.12 | learning rate: 4.935E-05 | 
global batch size: 256 | lm loss: 4.510144E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.890 | TFLOPs: 7.71 | 7: iteration 128060/ 173500 | consumed samples: 32783360 | consumed tokens: 67140321280 | elapsed time per iteration (s): 0.11 | learning rate: 4.933E-05 | global batch size: 256 | lm loss: 4.515587E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2242.786 | TFLOPs: 8.34 | 7: iteration 128070/ 173500 | consumed samples: 32785920 | consumed tokens: 67145564160 | elapsed time per iteration (s): 0.08 | learning rate: 4.932E-05 | global batch size: 256 | lm loss: 4.498960E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.508 | TFLOPs: 11.80 | 7: iteration 128080/ 173500 | consumed samples: 32788480 | consumed tokens: 67150807040 | elapsed time per iteration (s): 0.08 | learning rate: 4.931E-05 | global batch size: 256 | lm loss: 4.506277E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.462 | TFLOPs: 11.94 | 7: iteration 128090/ 173500 | consumed samples: 32791040 | consumed tokens: 67156049920 | elapsed time per iteration (s): 0.08 | learning rate: 4.930E-05 | global batch size: 256 | lm loss: 4.510308E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.708 | TFLOPs: 12.02 | 7: iteration 128100/ 173500 | consumed samples: 32793600 | consumed tokens: 67161292800 | elapsed time per iteration (s): 0.09 | learning rate: 4.929E-05 | global batch size: 256 | lm loss: 4.503991E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.278 | TFLOPs: 10.84 | 7: iteration 128110/ 173500 | consumed 
samples: 32796160 | consumed tokens: 67166535680 | elapsed time per iteration (s): 0.11 | learning rate: 4.927E-05 | global batch size: 256 | lm loss: 4.518352E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.771 | TFLOPs: 8.62 | 7: iteration 128120/ 173500 | consumed samples: 32798720 | consumed tokens: 67171778560 | elapsed time per iteration (s): 0.10 | learning rate: 4.926E-05 | global batch size: 256 | lm loss: 4.512617E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2586.829 | TFLOPs: 9.62 | 7: iteration 128130/ 173500 | consumed samples: 32801280 | consumed tokens: 67177021440 | elapsed time per iteration (s): 0.09 | learning rate: 4.925E-05 | global batch size: 256 | lm loss: 4.510975E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2709.605 | TFLOPs: 10.08 | 7: iteration 128140/ 173500 | consumed samples: 32803840 | consumed tokens: 67182264320 | elapsed time per iteration (s): 0.11 | learning rate: 4.924E-05 | global batch size: 256 | lm loss: 4.512837E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.140 | TFLOPs: 8.95 | 7: iteration 128150/ 173500 | consumed samples: 32806400 | consumed tokens: 67187507200 | elapsed time per iteration (s): 0.08 | learning rate: 4.923E-05 | global batch size: 256 | lm loss: 4.509745E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.078 | TFLOPs: 11.33 | 7: iteration 128160/ 173500 | consumed samples: 32808960 | consumed tokens: 67192750080 | elapsed time per iteration (s): 0.10 | learning rate: 4.921E-05 | global batch size: 256 | lm loss: 4.522281E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 2465.623 | TFLOPs: 9.17 | 7: iteration 128170/ 173500 | consumed samples: 32811520 | consumed tokens: 67197992960 | elapsed time per iteration (s): 0.08 | learning rate: 4.920E-05 | global batch size: 256 | lm loss: 4.507606E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.292 | TFLOPs: 11.81 | 7: iteration 128180/ 173500 | consumed samples: 32814080 | consumed tokens: 67203235840 | elapsed time per iteration (s): 0.10 | learning rate: 4.919E-05 | global batch size: 256 | lm loss: 4.504715E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.547 | TFLOPs: 9.30 | 7: iteration 128190/ 173500 | consumed samples: 32816640 | consumed tokens: 67208478720 | elapsed time per iteration (s): 0.09 | learning rate: 4.918E-05 | global batch size: 256 | lm loss: 4.498977E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2827.039 | TFLOPs: 10.52 | 7: iteration 128200/ 173500 | consumed samples: 32819200 | consumed tokens: 67213721600 | elapsed time per iteration (s): 0.11 | learning rate: 4.916E-05 | global batch size: 256 | lm loss: 4.520062E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2314.377 | TFLOPs: 8.61 | 7: iteration 128210/ 173500 | consumed samples: 32821760 | consumed tokens: 67218964480 | elapsed time per iteration (s): 0.09 | learning rate: 4.915E-05 | global batch size: 256 | lm loss: 4.510365E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.212 | TFLOPs: 10.49 | 7: iteration 128220/ 173500 | consumed samples: 32824320 | consumed tokens: 67224207360 | elapsed time per iteration (s): 0.09 | learning rate: 4.914E-05 | global batch 
size: 256 | lm loss: 4.509266E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2839.471 | TFLOPs: 10.56 | 7: iteration 128230/ 173500 | consumed samples: 32826880 | consumed tokens: 67229450240 | elapsed time per iteration (s): 0.09 | learning rate: 4.913E-05 | global batch size: 256 | lm loss: 4.504337E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.076 | TFLOPs: 10.84 | 7: iteration 128240/ 173500 | consumed samples: 32829440 | consumed tokens: 67234693120 | elapsed time per iteration (s): 0.10 | learning rate: 4.912E-05 | global batch size: 256 | lm loss: 4.505044E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2589.296 | TFLOPs: 9.63 | 7: iteration 128250/ 173500 | consumed samples: 32832000 | consumed tokens: 67239936000 | elapsed time per iteration (s): 0.09 | learning rate: 4.910E-05 | global batch size: 256 | lm loss: 4.509103E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2713.156 | TFLOPs: 10.09 | 7: iteration 128260/ 173500 | consumed samples: 32834560 | consumed tokens: 67245178880 | elapsed time per iteration (s): 0.08 | learning rate: 4.909E-05 | global batch size: 256 | lm loss: 4.511510E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.865 | TFLOPs: 11.88 | 7: iteration 128270/ 173500 | consumed samples: 32837120 | consumed tokens: 67250421760 | elapsed time per iteration (s): 0.08 | learning rate: 4.908E-05 | global batch size: 256 | lm loss: 4.515469E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.496 | TFLOPs: 11.40 | 7: iteration 128280/ 173500 | consumed samples: 32839680 
| consumed tokens: 67255664640 | elapsed time per iteration (s): 0.09 | learning rate: 4.907E-05 | global batch size: 256 | lm loss: 4.512131E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2865.187 | TFLOPs: 10.66 | 7: iteration 128290/ 173500 | consumed samples: 32842240 | consumed tokens: 67260907520 | elapsed time per iteration (s): 0.09 | learning rate: 4.906E-05 | global batch size: 256 | lm loss: 4.498683E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2804.329 | TFLOPs: 10.43 | 7: iteration 128300/ 173500 | consumed samples: 32844800 | consumed tokens: 67266150400 | elapsed time per iteration (s): 0.10 | learning rate: 4.904E-05 | global batch size: 256 | lm loss: 4.502847E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.423 | TFLOPs: 9.16 | 7: iteration 128310/ 173500 | consumed samples: 32847360 | consumed tokens: 67271393280 | elapsed time per iteration (s): 0.11 | learning rate: 4.903E-05 | global batch size: 256 | lm loss: 4.522799E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.283 | TFLOPs: 8.91 | 7: iteration 128320/ 173500 | consumed samples: 32849920 | consumed tokens: 67276636160 | elapsed time per iteration (s): 0.12 | learning rate: 4.902E-05 | global batch size: 256 | lm loss: 4.516886E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2175.409 | TFLOPs: 8.09 | 7: iteration 128330/ 173500 | consumed samples: 32852480 | consumed tokens: 67281879040 | elapsed time per iteration (s): 0.10 | learning rate: 4.901E-05 | global batch size: 256 | lm loss: 4.510516E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2612.204 | TFLOPs: 9.72 | 7: iteration 128340/ 173500 | consumed samples: 32855040 | consumed tokens: 67287121920 | elapsed time per iteration (s): 0.08 | learning rate: 4.900E-05 | global batch size: 256 | lm loss: 4.512771E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.078 | TFLOPs: 11.62 | 7: iteration 128350/ 173500 | consumed samples: 32857600 | consumed tokens: 67292364800 | elapsed time per iteration (s): 0.08 | learning rate: 4.898E-05 | global batch size: 256 | lm loss: 4.522317E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.758 | TFLOPs: 11.71 | 7: iteration 128360/ 173500 | consumed samples: 32860160 | consumed tokens: 67297607680 | elapsed time per iteration (s): 0.10 | learning rate: 4.897E-05 | global batch size: 256 | lm loss: 4.504058E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2694.061 | TFLOPs: 10.02 | 7: iteration 128370/ 173500 | consumed samples: 32862720 | consumed tokens: 67302850560 | elapsed time per iteration (s): 0.08 | learning rate: 4.896E-05 | global batch size: 256 | lm loss: 4.502343E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.399 | TFLOPs: 11.74 | 7: iteration 128380/ 173500 | consumed samples: 32865280 | consumed tokens: 67308093440 | elapsed time per iteration (s): 0.08 | learning rate: 4.895E-05 | global batch size: 256 | lm loss: 4.514008E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.072 | TFLOPs: 11.87 | 7: iteration 128390/ 173500 | consumed samples: 32867840 | consumed tokens: 67313336320 | elapsed time per iteration (s): 0.08 | learning rate: 4.893E-05 | global batch size: 
256 | lm loss: 4.514874E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.230 | TFLOPs: 11.86 | 7: iteration 128400/ 173500 | consumed samples: 32870400 | consumed tokens: 67318579200 | elapsed time per iteration (s): 0.12 | learning rate: 4.892E-05 | global batch size: 256 | lm loss: 4.528047E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.694 | TFLOPs: 8.18 | 7: iteration 128410/ 173500 | consumed samples: 32872960 | consumed tokens: 67323822080 | elapsed time per iteration (s): 0.10 | learning rate: 4.891E-05 | global batch size: 256 | lm loss: 4.506871E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.106 | TFLOPs: 9.30 | 7: iteration 128420/ 173500 | consumed samples: 32875520 | consumed tokens: 67329064960 | elapsed time per iteration (s): 0.08 | learning rate: 4.890E-05 | global batch size: 256 | lm loss: 4.510578E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.004 | TFLOPs: 11.84 | 7: iteration 128430/ 173500 | consumed samples: 32878080 | consumed tokens: 67334307840 | elapsed time per iteration (s): 0.09 | learning rate: 4.889E-05 | global batch size: 256 | lm loss: 4.504595E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.080 | TFLOPs: 11.16 | 7: iteration 128440/ 173500 | consumed samples: 32880640 | consumed tokens: 67339550720 | elapsed time per iteration (s): 0.08 | learning rate: 4.887E-05 | global batch size: 256 | lm loss: 4.499687E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.086 | TFLOPs: 11.88 | 7: iteration 128450/ 173500 | consumed samples: 32883200 | 
consumed tokens: 67344793600 | elapsed time per iteration (s): 0.09 | learning rate: 4.886E-05 | global batch size: 256 | lm loss: 4.495927E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.929 | TFLOPs: 11.16 | 7: iteration 128460/ 173500 | consumed samples: 32885760 | consumed tokens: 67350036480 | elapsed time per iteration (s): 0.11 | learning rate: 4.885E-05 | global batch size: 256 | lm loss: 4.501791E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2260.666 | TFLOPs: 8.41 | 7: iteration 128470/ 173500 | consumed samples: 32888320 | consumed tokens: 67355279360 | elapsed time per iteration (s): 0.09 | learning rate: 4.884E-05 | global batch size: 256 | lm loss: 4.523529E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2871.248 | TFLOPs: 10.68 | 7: iteration 128480/ 173500 | consumed samples: 32890880 | consumed tokens: 67360522240 | elapsed time per iteration (s): 0.09 | learning rate: 4.883E-05 | global batch size: 256 | lm loss: 4.506200E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2817.877 | TFLOPs: 10.48 | 7: iteration 128490/ 173500 | consumed samples: 32893440 | consumed tokens: 67365765120 | elapsed time per iteration (s): 0.08 | learning rate: 4.881E-05 | global batch size: 256 | lm loss: 4.514902E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.894 | TFLOPs: 11.89 | 7: iteration 128500/ 173500 | consumed samples: 32896000 | consumed tokens: 67371008000 | elapsed time per iteration (s): 0.24 | learning rate: 4.880E-05 | global batch size: 256 | lm loss: 4.497590E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 1053.224 | TFLOPs: 3.92 | 7: iteration 128510/ 173500 | consumed samples: 32898560 | consumed tokens: 67376250880 | elapsed time per iteration (s): 0.09 | learning rate: 4.879E-05 | global batch size: 256 | lm loss: 4.526084E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.172 | TFLOPs: 10.41 | 7: iteration 128520/ 173500 | consumed samples: 32901120 | consumed tokens: 67381493760 | elapsed time per iteration (s): 0.08 | learning rate: 4.878E-05 | global batch size: 256 | lm loss: 4.506438E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.892 | TFLOPs: 11.88 | 7: iteration 128530/ 173500 | consumed samples: 32903680 | consumed tokens: 67386736640 | elapsed time per iteration (s): 0.08 | learning rate: 4.877E-05 | global batch size: 256 | lm loss: 4.500715E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.239 | TFLOPs: 11.89 | 7: iteration 128540/ 173500 | consumed samples: 32906240 | consumed tokens: 67391979520 | elapsed time per iteration (s): 0.08 | learning rate: 4.875E-05 | global batch size: 256 | lm loss: 4.514965E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.589 | TFLOPs: 11.76 | 7: iteration 128550/ 173500 | consumed samples: 32908800 | consumed tokens: 67397222400 | elapsed time per iteration (s): 0.10 | learning rate: 4.874E-05 | global batch size: 256 | lm loss: 4.504489E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2478.243 | TFLOPs: 9.22 | 7: iteration 128560/ 173500 | consumed samples: 32911360 | consumed tokens: 67402465280 | elapsed time per iteration (s): 0.11 | learning rate: 4.873E-05 | global batch size: 256 
| lm loss: 4.515718E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2246.484 | TFLOPs: 8.36 | 7: iteration 128570/ 173500 | consumed samples: 32913920 | consumed tokens: 67407708160 | elapsed time per iteration (s): 0.10 | learning rate: 4.872E-05 | global batch size: 256 | lm loss: 4.517354E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.838 | TFLOPs: 9.41 | 7: iteration 128580/ 173500 | consumed samples: 32916480 | consumed tokens: 67412951040 | elapsed time per iteration (s): 0.09 | learning rate: 4.871E-05 | global batch size: 256 | lm loss: 4.511002E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2751.602 | TFLOPs: 10.23 | 7: iteration 128590/ 173500 | consumed samples: 32919040 | consumed tokens: 67418193920 | elapsed time per iteration (s): 0.08 | learning rate: 4.869E-05 | global batch size: 256 | lm loss: 4.502945E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.881 | TFLOPs: 11.89 | 7: iteration 128600/ 173500 | consumed samples: 32921600 | consumed tokens: 67423436800 | elapsed time per iteration (s): 0.08 | learning rate: 4.868E-05 | global batch size: 256 | lm loss: 4.519071E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.350 | TFLOPs: 11.37 | 7: iteration 128610/ 173500 | consumed samples: 32924160 | consumed tokens: 67428679680 | elapsed time per iteration (s): 0.08 | learning rate: 4.867E-05 | global batch size: 256 | lm loss: 4.512520E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.621 | TFLOPs: 11.28 | 7: iteration 128620/ 173500 | consumed samples: 32926720 | consumed 
tokens: 67433922560 | elapsed time per iteration (s): 0.09 | learning rate: 4.866E-05 | global batch size: 256 | lm loss: 4.507852E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2980.476 | TFLOPs: 11.09 | 7: iteration 128630/ 173500 | consumed samples: 32929280 | consumed tokens: 67439165440 | elapsed time per iteration (s): 0.09 | learning rate: 4.865E-05 | global batch size: 256 | lm loss: 4.515035E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.480 | TFLOPs: 11.06 | 7: iteration 128640/ 173500 | consumed samples: 32931840 | consumed tokens: 67444408320 | elapsed time per iteration (s): 0.09 | learning rate: 4.863E-05 | global batch size: 256 | lm loss: 4.503188E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2843.112 | TFLOPs: 10.58 | 7: iteration 128650/ 173500 | consumed samples: 32934400 | consumed tokens: 67449651200 | elapsed time per iteration (s): 0.10 | learning rate: 4.862E-05 | global batch size: 256 | lm loss: 4.506306E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2505.092 | TFLOPs: 9.32 | 7: iteration 128660/ 173500 | consumed samples: 32936960 | consumed tokens: 67454894080 | elapsed time per iteration (s): 0.10 | learning rate: 4.861E-05 | global batch size: 256 | lm loss: 4.507708E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.097 | TFLOPs: 9.12 | 7: iteration 128670/ 173500 | consumed samples: 32939520 | consumed tokens: 67460136960 | elapsed time per iteration (s): 0.10 | learning rate: 4.860E-05 | global batch size: 256 | lm loss: 4.491596E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2586.675 | TFLOPs: 9.62 | 7: iteration 128680/ 173500 | consumed samples: 32942080 | consumed tokens: 67465379840 | elapsed time per iteration (s): 0.11 | learning rate: 4.858E-05 | global batch size: 256 | lm loss: 4.511747E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.272 | TFLOPs: 8.50 | 7: iteration 128690/ 173500 | consumed samples: 32944640 | consumed tokens: 67470622720 | elapsed time per iteration (s): 0.11 | learning rate: 4.857E-05 | global batch size: 256 | lm loss: 4.494456E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2416.270 | TFLOPs: 8.99 | 7: iteration 128700/ 173500 | consumed samples: 32947200 | consumed tokens: 67475865600 | elapsed time per iteration (s): 0.12 | learning rate: 4.856E-05 | global batch size: 256 | lm loss: 4.508052E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2151.222 | TFLOPs: 8.00 | 7: iteration 128710/ 173500 | consumed samples: 32949760 | consumed tokens: 67481108480 | elapsed time per iteration (s): 0.12 | learning rate: 4.855E-05 | global batch size: 256 | lm loss: 4.497657E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2181.295 | TFLOPs: 8.11 | 7: iteration 128720/ 173500 | consumed samples: 32952320 | consumed tokens: 67486351360 | elapsed time per iteration (s): 0.11 | learning rate: 4.854E-05 | global batch size: 256 | lm loss: 4.518717E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2352.219 | TFLOPs: 8.75 | 7: iteration 128730/ 173500 | consumed samples: 32954880 | consumed tokens: 67491594240 | elapsed time per iteration (s): 0.11 | learning rate: 4.852E-05 | global batch size: 256 | lm loss: 
4.511615E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2377.908 | TFLOPs: 8.84 | 7: iteration 128740/ 173500 | consumed samples: 32957440 | consumed tokens: 67496837120 | elapsed time per iteration (s): 0.11 | learning rate: 4.851E-05 | global batch size: 256 | lm loss: 4.510508E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2366.181 | TFLOPs: 8.80 | 7: iteration 128750/ 173500 | consumed samples: 32960000 | consumed tokens: 67502080000 | elapsed time per iteration (s): 0.11 | learning rate: 4.850E-05 | global batch size: 256 | lm loss: 4.521656E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2384.188 | TFLOPs: 8.87 | 7: iteration 128760/ 173500 | consumed samples: 32962560 | consumed tokens: 67507322880 | elapsed time per iteration (s): 0.11 | learning rate: 4.849E-05 | global batch size: 256 | lm loss: 4.514016E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2380.052 | TFLOPs: 8.85 | 7: iteration 128770/ 173500 | consumed samples: 32965120 | consumed tokens: 67512565760 | elapsed time per iteration (s): 0.11 | learning rate: 4.848E-05 | global batch size: 256 | lm loss: 4.524496E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.496 | TFLOPs: 8.86 | 7: iteration 128780/ 173500 | consumed samples: 32967680 | consumed tokens: 67517808640 | elapsed time per iteration (s): 0.11 | learning rate: 4.846E-05 | global batch size: 256 | lm loss: 4.512025E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2337.597 | TFLOPs: 8.69 | 7: iteration 128790/ 173500 | consumed samples: 32970240 | consumed tokens: 
67523051520 | elapsed time per iteration (s): 0.12 | learning rate: 4.845E-05 | global batch size: 256 | lm loss: 4.510209E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2208.499 | TFLOPs: 8.21 | 7: iteration 128800/ 173500 | consumed samples: 32972800 | consumed tokens: 67528294400 | elapsed time per iteration (s): 0.11 | learning rate: 4.844E-05 | global batch size: 256 | lm loss: 4.515699E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2252.982 | TFLOPs: 8.38 | 7: iteration 128810/ 173500 | consumed samples: 32975360 | consumed tokens: 67533537280 | elapsed time per iteration (s): 0.11 | learning rate: 4.843E-05 | global batch size: 256 | lm loss: 4.505297E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2389.957 | TFLOPs: 8.89 | 7: iteration 128820/ 173500 | consumed samples: 32977920 | consumed tokens: 67538780160 | elapsed time per iteration (s): 0.11 | learning rate: 4.842E-05 | global batch size: 256 | lm loss: 4.514615E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.541 | TFLOPs: 8.82 | 7: iteration 128830/ 173500 | consumed samples: 32980480 | consumed tokens: 67544023040 | elapsed time per iteration (s): 0.11 | learning rate: 4.840E-05 | global batch size: 256 | lm loss: 4.519049E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2277.254 | TFLOPs: 8.47 | 7: iteration 128840/ 173500 | consumed samples: 32983040 | consumed tokens: 67549265920 | elapsed time per iteration (s): 0.12 | learning rate: 4.839E-05 | global batch size: 256 | lm loss: 4.502546E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2141.931 | TFLOPs: 7.97 | 7: iteration 128850/ 173500 | consumed samples: 32985600 | consumed tokens: 67554508800 | elapsed time per iteration (s): 0.11 | learning rate: 4.838E-05 | global batch size: 256 | lm loss: 4.514236E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2404.710 | TFLOPs: 8.94 | 7: iteration 128860/ 173500 | consumed samples: 32988160 | consumed tokens: 67559751680 | elapsed time per iteration (s): 0.12 | learning rate: 4.837E-05 | global batch size: 256 | lm loss: 4.508649E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2182.432 | TFLOPs: 8.12 | 7: iteration 128870/ 173500 | consumed samples: 32990720 | consumed tokens: 67564994560 | elapsed time per iteration (s): 0.11 | learning rate: 4.836E-05 | global batch size: 256 | lm loss: 4.508168E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2337.519 | TFLOPs: 8.69 | 7: iteration 128880/ 173500 | consumed samples: 32993280 | consumed tokens: 67570237440 | elapsed time per iteration (s): 0.12 | learning rate: 4.834E-05 | global batch size: 256 | lm loss: 4.500671E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2095.181 | TFLOPs: 7.79 | 7: iteration 128890/ 173500 | consumed samples: 32995840 | consumed tokens: 67575480320 | elapsed time per iteration (s): 0.11 | learning rate: 4.833E-05 | global batch size: 256 | lm loss: 4.512966E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2226.251 | TFLOPs: 8.28 | 7: iteration 128900/ 173500 | consumed samples: 32998400 | consumed tokens: 67580723200 | elapsed time per iteration (s): 0.09 | learning rate: 4.832E-05 | global batch size: 256 | lm loss: 4.497825E+00 | grad 
norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2771.323 | TFLOPs: 10.31 | 7: iteration 128910/ 173500 | consumed samples: 33000960 | consumed tokens: 67585966080 | elapsed time per iteration (s): 0.08 | learning rate: 4.831E-05 | global batch size: 256 | lm loss: 4.516660E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.860 | TFLOPs: 11.56 | 7: iteration 128920/ 173500 | consumed samples: 33003520 | consumed tokens: 67591208960 | elapsed time per iteration (s): 0.08 | learning rate: 4.830E-05 | global batch size: 256 | lm loss: 4.508698E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.393 | TFLOPs: 12.03 | 7: iteration 128930/ 173500 | consumed samples: 33006080 | consumed tokens: 67596451840 | elapsed time per iteration (s): 0.09 | learning rate: 4.828E-05 | global batch size: 256 | lm loss: 4.512854E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.390 | TFLOPs: 11.11 | 7: iteration 128940/ 173500 | consumed samples: 33008640 | consumed tokens: 67601694720 | elapsed time per iteration (s): 0.09 | learning rate: 4.827E-05 | global batch size: 256 | lm loss: 4.512571E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.316 | TFLOPs: 10.41 | 7: iteration 128950/ 173500 | consumed samples: 33011200 | consumed tokens: 67606937600 | elapsed time per iteration (s): 0.09 | learning rate: 4.826E-05 | global batch size: 256 | lm loss: 4.512591E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2831.865 | TFLOPs: 10.53 | 7: iteration 128960/ 173500 | consumed samples: 33013760 | consumed tokens: 67612180480 | elapsed 
time per iteration (s): 0.08 | learning rate: 4.825E-05 | global batch size: 256 | lm loss: 4.519570E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.643 | TFLOPs: 11.59 | 7: iteration 128970/ 173500 | consumed samples: 33016320 | consumed tokens: 67617423360 | elapsed time per iteration (s): 0.09 | learning rate: 4.824E-05 | global batch size: 256 | lm loss: 4.507673E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2855.947 | TFLOPs: 10.62 | 7: iteration 128980/ 173500 | consumed samples: 33018880 | consumed tokens: 67622666240 | elapsed time per iteration (s): 0.11 | learning rate: 4.822E-05 | global batch size: 256 | lm loss: 4.509099E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2248.847 | TFLOPs: 8.36 | 7: iteration 128990/ 173500 | consumed samples: 33021440 | consumed tokens: 67627909120 | elapsed time per iteration (s): 0.08 | learning rate: 4.821E-05 | global batch size: 256 | lm loss: 4.510815E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.054 | TFLOPs: 11.94 | 7: iteration 129000/ 173500 | consumed samples: 33024000 | consumed tokens: 67633152000 | elapsed time per iteration (s): 0.09 | learning rate: 4.820E-05 | global batch size: 256 | lm loss: 4.510135E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2782.808 | TFLOPs: 10.35 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 129000 | lm loss value: 4.403083E+00 | lm loss PPL: 8.170240E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 
129000 to checkpoints_14m91b100m 0: [2023-03-17 03:23:34,284] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step129000 is begin to save! 0: [2023-03-17 03:23:34,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:23:34,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:23:34,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:23:34,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:23:34,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:23:34,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:23:34,325] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:23:34,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:23:34,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:23:34,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:23:34,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:23:34,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:23:34,332] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step129000/mp_rank_00_model_states.pt 0: [2023-03-17 03:23:34,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:23:34,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:23:34,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:23:34,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:23:34,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 6: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:23:34,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 
4: [2023-03-17 03:23:34,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 4: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:23:34,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 0: [2023-03-17 03:23:34,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 1: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:23:34,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:23:34,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 0: [2023-03-17 03:23:34,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 03:23:34,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 0: [2023-03-17 03:23:34,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 7: [2023-03-17 03:23:34,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:23:34,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:23:34,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 6: [2023-03-17 03:23:34,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:23:34,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:23:34,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 2: [2023-03-17 03:23:34,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:23:34,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:23:34,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 2: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:23:34,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:23:34,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 5: [2023-03-17 03:23:34,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 5: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 0: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:23:34,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 
4: [2023-03-17 03:23:34,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 1: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:23:34,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:23:34,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 7: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:23:34,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 5: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:23:34,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 6: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:23:34,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 3: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:23:34,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 3: [2023-03-17 03:23:34,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 1: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:23:34,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 0: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:23:34,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 03:23:34,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 7: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 4: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:23:34,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 4: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 3: [2023-03-17 03:23:34,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 5: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:23:34,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 6: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:23:34,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 2: [2023-03-17 03:23:34,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 1: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:23:34,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 0: [2023-03-17 03:23:34,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:23:34,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:23:34,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 3: [2023-03-17 03:23:34,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:23:34,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 7: [2023-03-17 03:23:34,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:23:34,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:23:34,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 6: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:23:34,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 
4: [2023-03-17 03:23:34,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 5: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:23:34,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 2: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:23:34,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 1: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:23:34,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 0: [2023-03-17 03:23:34,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:23:34,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:23:34,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 3: [2023-03-17 03:23:34,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:23:34,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 4: [2023-03-17 03:23:34,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:23:34,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:23:34,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 7: [2023-03-17 03:23:34,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:23:34,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:23:34,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 5: [2023-03-17 03:23:34,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:23:34,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 6: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 6: [2023-03-17 03:23:34,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 0: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:23:34,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 2: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:23:34,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 1: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:23:34,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:23:34,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 7: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:23:34,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 3: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:23:34,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 5: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:23:34,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 4: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:23:34,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 6: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 6: [2023-03-17 03:23:34,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:23:34,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 2: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 6: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 3: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 2: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 5: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 1: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 
1: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 1: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 1: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 3: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now! 2: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt
4: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt
7: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt.
2: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now!
4: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now!
7: [2023-03-17 03:23:34,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step129000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt
7: [2023-03-17 03:23:34,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step129000 is ready now!
0: successfully saved checkpoint at iteration 129000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 109.86
7: iteration 129010/ 173500 | consumed samples: 33026560 | consumed tokens: 67638394880 | elapsed time per iteration (s): 0.11 | learning rate: 4.819E-05 | global batch size: 256 | lm loss: 4.487270E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2278.231 | TFLOPs: 8.47 |
7: iteration 129020/ 173500 | consumed samples: 33029120 | consumed tokens: 67643637760 | elapsed time per iteration (s): 0.11 | learning rate: 4.818E-05 | global batch size: 256 | lm loss: 4.518789E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.130 | TFLOPs: 9.02 |
7: iteration 129030/ 173500 | consumed samples: 33031680 | consumed tokens: 67648880640 | elapsed time per iteration (s): 0.12 | learning rate: 4.816E-05 | global batch size: 256 | lm loss: 4.492505E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2084.959 | TFLOPs: 7.76 |
7: iteration 129040/ 173500 | consumed samples: 33034240 | consumed tokens: 67654123520 | elapsed time per iteration (s): 0.10 | learning rate: 4.815E-05 | global batch size: 256 | lm loss: 4.523648E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2670.550 | TFLOPs: 9.93 |
7: iteration 129050/ 173500 | consumed samples: 33036800 | consumed tokens: 67659366400 | elapsed time per iteration (s): 0.09 | learning rate: 4.814E-05 | global batch size: 256 | lm loss: 4.503634E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.641 | TFLOPs: 11.02 |
7: iteration 129060/ 173500 | consumed samples: 33039360 | consumed tokens: 67664609280 | elapsed time per iteration (s): 0.08 | learning rate: 4.813E-05 | global batch size: 256 | lm loss: 4.502604E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.743 | TFLOPs: 12.02 |
7: iteration 129070/ 173500 | consumed samples: 33041920 | consumed tokens: 67669852160 | elapsed time per iteration (s): 0.08 | learning rate: 4.812E-05 | global batch size: 256 | lm loss: 4.502333E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.605 | TFLOPs: 11.27 |
7: iteration 129080/ 173500 | consumed samples: 33044480 | consumed tokens: 67675095040 | elapsed time per iteration (s): 0.09 | learning rate: 4.811E-05 | global batch size: 256 | lm loss: 4.515306E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.378 | TFLOPs: 10.45 |
7: iteration 129090/ 173500 | consumed samples: 33047040 | consumed tokens: 67680337920 | elapsed time per iteration (s): 0.09 | learning rate: 4.809E-05 | global batch size: 256 | lm loss: 4.509595E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2892.080 | TFLOPs: 10.76 |
7: iteration 129100/ 173500 | consumed samples: 33049600 | consumed tokens: 67685580800 | elapsed time per iteration (s): 0.08 | learning rate: 4.808E-05 | global batch size: 256 | lm loss: 4.521169E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.423 | TFLOPs: 11.65 |
7: iteration 129110/ 173500 | consumed samples: 33052160 | consumed tokens: 67690823680 | elapsed time per iteration (s): 0.09 | learning rate: 4.807E-05 | global batch size: 256 | lm loss: 4.512041E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2813.731 | TFLOPs: 10.47 |
7:
iteration 129120/ 173500 | consumed samples: 33054720 | consumed tokens: 67696066560 | elapsed time per iteration (s): 0.09 | learning rate: 4.806E-05 | global batch size: 256 | lm loss: 4.503078E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2800.033 | TFLOPs: 10.41 | 7: iteration 129130/ 173500 | consumed samples: 33057280 | consumed tokens: 67701309440 | elapsed time per iteration (s): 0.11 | learning rate: 4.805E-05 | global batch size: 256 | lm loss: 4.509824E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2342.222 | TFLOPs: 8.71 | 7: iteration 129140/ 173500 | consumed samples: 33059840 | consumed tokens: 67706552320 | elapsed time per iteration (s): 0.08 | learning rate: 4.803E-05 | global batch size: 256 | lm loss: 4.506533E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.194 | TFLOPs: 11.80 | 7: iteration 129150/ 173500 | consumed samples: 33062400 | consumed tokens: 67711795200 | elapsed time per iteration (s): 0.09 | learning rate: 4.802E-05 | global batch size: 256 | lm loss: 4.509339E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2964.906 | TFLOPs: 11.03 | 7: iteration 129160/ 173500 | consumed samples: 33064960 | consumed tokens: 67717038080 | elapsed time per iteration (s): 0.10 | learning rate: 4.801E-05 | global batch size: 256 | lm loss: 4.505817E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2504.716 | TFLOPs: 9.32 | 7: iteration 129170/ 173500 | consumed samples: 33067520 | consumed tokens: 67722280960 | elapsed time per iteration (s): 0.08 | learning rate: 4.800E-05 | global batch size: 256 | lm loss: 4.499160E+00 | grad norm: 0.391 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.055 | TFLOPs: 11.66 | 7: iteration 129180/ 173500 | consumed samples: 33070080 | consumed tokens: 67727523840 | elapsed time per iteration (s): 0.08 | learning rate: 4.799E-05 | global batch size: 256 | lm loss: 4.522549E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.568 | TFLOPs: 11.89 | 7: iteration 129190/ 173500 | consumed samples: 33072640 | consumed tokens: 67732766720 | elapsed time per iteration (s): 0.08 | learning rate: 4.797E-05 | global batch size: 256 | lm loss: 4.507732E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.220 | TFLOPs: 11.89 | 7: iteration 129200/ 173500 | consumed samples: 33075200 | consumed tokens: 67738009600 | elapsed time per iteration (s): 0.08 | learning rate: 4.796E-05 | global batch size: 256 | lm loss: 4.507798E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.974 | TFLOPs: 11.78 | 7: iteration 129210/ 173500 | consumed samples: 33077760 | consumed tokens: 67743252480 | elapsed time per iteration (s): 0.08 | learning rate: 4.795E-05 | global batch size: 256 | lm loss: 4.518719E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.121 | TFLOPs: 11.85 | 7: iteration 129220/ 173500 | consumed samples: 33080320 | consumed tokens: 67748495360 | elapsed time per iteration (s): 0.08 | learning rate: 4.794E-05 | global batch size: 256 | lm loss: 4.518446E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.729 | TFLOPs: 11.88 | 7: iteration 129230/ 173500 | consumed samples: 33082880 | consumed tokens: 67753738240 | elapsed time per iteration (s): 0.08 | 
learning rate: 4.793E-05 | global batch size: 256 | lm loss: 4.514426E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.800 | TFLOPs: 11.90 | 7: iteration 129240/ 173500 | consumed samples: 33085440 | consumed tokens: 67758981120 | elapsed time per iteration (s): 0.08 | learning rate: 4.791E-05 | global batch size: 256 | lm loss: 4.513548E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.617 | TFLOPs: 11.92 | 7: iteration 129250/ 173500 | consumed samples: 33088000 | consumed tokens: 67764224000 | elapsed time per iteration (s): 0.08 | learning rate: 4.790E-05 | global batch size: 256 | lm loss: 4.500706E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.027 | TFLOPs: 11.50 | 7: iteration 129260/ 173500 | consumed samples: 33090560 | consumed tokens: 67769466880 | elapsed time per iteration (s): 0.08 | learning rate: 4.789E-05 | global batch size: 256 | lm loss: 4.514775E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.662 | TFLOPs: 11.49 | 7: iteration 129270/ 173500 | consumed samples: 33093120 | consumed tokens: 67774709760 | elapsed time per iteration (s): 0.08 | learning rate: 4.788E-05 | global batch size: 256 | lm loss: 4.505692E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.912 | TFLOPs: 12.00 | 7: iteration 129280/ 173500 | consumed samples: 33095680 | consumed tokens: 67779952640 | elapsed time per iteration (s): 0.09 | learning rate: 4.787E-05 | global batch size: 256 | lm loss: 4.510916E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2954.628 | TFLOPs: 10.99 | 7: iteration 
129290/ 173500 | consumed samples: 33098240 | consumed tokens: 67785195520 | elapsed time per iteration (s): 0.08 | learning rate: 4.785E-05 | global batch size: 256 | lm loss: 4.500397E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.669 | TFLOPs: 11.45 | 7: iteration 129300/ 173500 | consumed samples: 33100800 | consumed tokens: 67790438400 | elapsed time per iteration (s): 0.08 | learning rate: 4.784E-05 | global batch size: 256 | lm loss: 4.496312E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.681 | TFLOPs: 11.56 | 7: iteration 129310/ 173500 | consumed samples: 33103360 | consumed tokens: 67795681280 | elapsed time per iteration (s): 0.08 | learning rate: 4.783E-05 | global batch size: 256 | lm loss: 4.505088E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.968 | TFLOPs: 11.95 | 7: iteration 129320/ 173500 | consumed samples: 33105920 | consumed tokens: 67800924160 | elapsed time per iteration (s): 0.09 | learning rate: 4.782E-05 | global batch size: 256 | lm loss: 4.510173E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2755.720 | TFLOPs: 10.25 | 7: iteration 129330/ 173500 | consumed samples: 33108480 | consumed tokens: 67806167040 | elapsed time per iteration (s): 0.10 | learning rate: 4.781E-05 | global batch size: 256 | lm loss: 4.512479E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2469.616 | TFLOPs: 9.19 | 7: iteration 129340/ 173500 | consumed samples: 33111040 | consumed tokens: 67811409920 | elapsed time per iteration (s): 0.10 | learning rate: 4.780E-05 | global batch size: 256 | lm loss: 4.505796E+00 | grad norm: 0.369 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2462.727 | TFLOPs: 9.16 | 7: iteration 129350/ 173500 | consumed samples: 33113600 | consumed tokens: 67816652800 | elapsed time per iteration (s): 0.09 | learning rate: 4.778E-05 | global batch size: 256 | lm loss: 4.517740E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.764 | TFLOPs: 10.10 | 7: iteration 129360/ 173500 | consumed samples: 33116160 | consumed tokens: 67821895680 | elapsed time per iteration (s): 0.11 | learning rate: 4.777E-05 | global batch size: 256 | lm loss: 4.496245E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.329 | TFLOPs: 8.86 | 7: iteration 129370/ 173500 | consumed samples: 33118720 | consumed tokens: 67827138560 | elapsed time per iteration (s): 0.08 | learning rate: 4.776E-05 | global batch size: 256 | lm loss: 4.520505E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.436 | TFLOPs: 11.77 | 7: iteration 129380/ 173500 | consumed samples: 33121280 | consumed tokens: 67832381440 | elapsed time per iteration (s): 0.08 | learning rate: 4.775E-05 | global batch size: 256 | lm loss: 4.511863E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.454 | TFLOPs: 11.80 | 7: iteration 129390/ 173500 | consumed samples: 33123840 | consumed tokens: 67837624320 | elapsed time per iteration (s): 0.08 | learning rate: 4.774E-05 | global batch size: 256 | lm loss: 4.501671E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.592 | TFLOPs: 11.83 | 7: iteration 129400/ 173500 | consumed samples: 33126400 | consumed tokens: 67842867200 | elapsed time per iteration (s): 0.08 | learning 
rate: 4.772E-05 | global batch size: 256 | lm loss: 4.514919E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.464 | TFLOPs: 11.38 | 7: iteration 129410/ 173500 | consumed samples: 33128960 | consumed tokens: 67848110080 | elapsed time per iteration (s): 0.13 | learning rate: 4.771E-05 | global batch size: 256 | lm loss: 4.500358E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2039.689 | TFLOPs: 7.59 | 7: iteration 129420/ 173500 | consumed samples: 33131520 | consumed tokens: 67853352960 | elapsed time per iteration (s): 0.12 | learning rate: 4.770E-05 | global batch size: 256 | lm loss: 4.515109E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2117.453 | TFLOPs: 7.88 | 7: iteration 129430/ 173500 | consumed samples: 33134080 | consumed tokens: 67858595840 | elapsed time per iteration (s): 0.12 | learning rate: 4.769E-05 | global batch size: 256 | lm loss: 4.507808E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2128.014 | TFLOPs: 7.92 | 7: iteration 129440/ 173500 | consumed samples: 33136640 | consumed tokens: 67863838720 | elapsed time per iteration (s): 0.10 | learning rate: 4.768E-05 | global batch size: 256 | lm loss: 4.521365E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2491.633 | TFLOPs: 9.27 | 7: iteration 129450/ 173500 | consumed samples: 33139200 | consumed tokens: 67869081600 | elapsed time per iteration (s): 0.08 | learning rate: 4.766E-05 | global batch size: 256 | lm loss: 4.514495E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.116 | TFLOPs: 11.98 | 7: iteration 129460/ 173500 | 
consumed samples: 33141760 | consumed tokens: 67874324480 | elapsed time per iteration (s): 0.08 | learning rate: 4.765E-05 | global batch size: 256 | lm loss: 4.509185E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.266 | TFLOPs: 11.62 | 7: iteration 129470/ 173500 | consumed samples: 33144320 | consumed tokens: 67879567360 | elapsed time per iteration (s): 0.08 | learning rate: 4.764E-05 | global batch size: 256 | lm loss: 4.525259E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.705 | TFLOPs: 11.98 | 7: iteration 129480/ 173500 | consumed samples: 33146880 | consumed tokens: 67884810240 | elapsed time per iteration (s): 0.09 | learning rate: 4.763E-05 | global batch size: 256 | lm loss: 4.511736E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2845.708 | TFLOPs: 10.58 | 7: iteration 129490/ 173500 | consumed samples: 33149440 | consumed tokens: 67890053120 | elapsed time per iteration (s): 0.08 | learning rate: 4.762E-05 | global batch size: 256 | lm loss: 4.511904E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.861 | TFLOPs: 11.23 | 7: iteration 129500/ 173500 | consumed samples: 33152000 | consumed tokens: 67895296000 | elapsed time per iteration (s): 0.09 | learning rate: 4.761E-05 | global batch size: 256 | lm loss: 4.516835E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.079 | TFLOPs: 10.53 | 7: iteration 129510/ 173500 | consumed samples: 33154560 | consumed tokens: 67900538880 | elapsed time per iteration (s): 0.09 | learning rate: 4.759E-05 | global batch size: 256 | lm loss: 4.499771E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2876.128 | TFLOPs: 10.70 | 7: iteration 129520/ 173500 | consumed samples: 33157120 | consumed tokens: 67905781760 | elapsed time per iteration (s): 0.09 | learning rate: 4.758E-05 | global batch size: 256 | lm loss: 4.497363E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.579 | TFLOPs: 11.19 | 7: iteration 129530/ 173500 | consumed samples: 33159680 | consumed tokens: 67911024640 | elapsed time per iteration (s): 0.08 | learning rate: 4.757E-05 | global batch size: 256 | lm loss: 4.512931E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.384 | TFLOPs: 11.99 | 7: iteration 129540/ 173500 | consumed samples: 33162240 | consumed tokens: 67916267520 | elapsed time per iteration (s): 0.09 | learning rate: 4.756E-05 | global batch size: 256 | lm loss: 4.524979E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2792.938 | TFLOPs: 10.39 | 7: iteration 129550/ 173500 | consumed samples: 33164800 | consumed tokens: 67921510400 | elapsed time per iteration (s): 0.08 | learning rate: 4.755E-05 | global batch size: 256 | lm loss: 4.510438E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.752 | TFLOPs: 11.59 | 7: iteration 129560/ 173500 | consumed samples: 33167360 | consumed tokens: 67926753280 | elapsed time per iteration (s): 0.08 | learning rate: 4.753E-05 | global batch size: 256 | lm loss: 4.507553E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3021.876 | TFLOPs: 11.24 | 7: iteration 129570/ 173500 | consumed samples: 33169920 | consumed tokens: 67931996160 | elapsed time per iteration (s): 0.12 | learning rate: 
4.752E-05 | global batch size: 256 | lm loss: 4.509719E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2105.965 | TFLOPs: 7.83 | 7: iteration 129580/ 173500 | consumed samples: 33172480 | consumed tokens: 67937239040 | elapsed time per iteration (s): 0.08 | learning rate: 4.751E-05 | global batch size: 256 | lm loss: 4.513499E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.836 | TFLOPs: 11.82 | 7: iteration 129590/ 173500 | consumed samples: 33175040 | consumed tokens: 67942481920 | elapsed time per iteration (s): 0.08 | learning rate: 4.750E-05 | global batch size: 256 | lm loss: 4.508647E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.265 | TFLOPs: 11.92 | 7: iteration 129600/ 173500 | consumed samples: 33177600 | consumed tokens: 67947724800 | elapsed time per iteration (s): 0.08 | learning rate: 4.749E-05 | global batch size: 256 | lm loss: 4.512774E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.148 | TFLOPs: 11.46 | 7: iteration 129610/ 173500 | consumed samples: 33180160 | consumed tokens: 67952967680 | elapsed time per iteration (s): 0.11 | learning rate: 4.747E-05 | global batch size: 256 | lm loss: 4.516558E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2359.607 | TFLOPs: 8.78 | 7: iteration 129620/ 173500 | consumed samples: 33182720 | consumed tokens: 67958210560 | elapsed time per iteration (s): 0.09 | learning rate: 4.746E-05 | global batch size: 256 | lm loss: 4.509807E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.241 | TFLOPs: 10.68 | 7: iteration 129630/ 173500 | 
consumed samples: 33185280 | consumed tokens: 67963453440 | elapsed time per iteration (s): 0.08 | learning rate: 4.745E-05 | global batch size: 256 | lm loss: 4.509789E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.597 | TFLOPs: 11.46 | 7: iteration 129640/ 173500 | consumed samples: 33187840 | consumed tokens: 67968696320 | elapsed time per iteration (s): 0.08 | learning rate: 4.744E-05 | global batch size: 256 | lm loss: 4.508232E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.516 | TFLOPs: 11.95 | 7: iteration 129650/ 173500 | consumed samples: 33190400 | consumed tokens: 67973939200 | elapsed time per iteration (s): 0.09 | learning rate: 4.743E-05 | global batch size: 256 | lm loss: 4.513189E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2921.026 | TFLOPs: 10.86 | 7: iteration 129660/ 173500 | consumed samples: 33192960 | consumed tokens: 67979182080 | elapsed time per iteration (s): 0.08 | learning rate: 4.742E-05 | global batch size: 256 | lm loss: 4.512466E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.911 | TFLOPs: 11.97 | 7: iteration 129670/ 173500 | consumed samples: 33195520 | consumed tokens: 67984424960 | elapsed time per iteration (s): 0.08 | learning rate: 4.740E-05 | global batch size: 256 | lm loss: 4.510934E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.776 | TFLOPs: 11.96 | 7: iteration 129680/ 173500 | consumed samples: 33198080 | consumed tokens: 67989667840 | elapsed time per iteration (s): 0.08 | learning rate: 4.739E-05 | global batch size: 256 | lm loss: 4.509014E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3223.665 | TFLOPs: 11.99 | 7: iteration 129690/ 173500 | consumed samples: 33200640 | consumed tokens: 67994910720 | elapsed time per iteration (s): 0.08 | learning rate: 4.738E-05 | global batch size: 256 | lm loss: 4.510838E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.169 | TFLOPs: 11.71 | 7: iteration 129700/ 173500 | consumed samples: 33203200 | consumed tokens: 68000153600 | elapsed time per iteration (s): 0.08 | learning rate: 4.737E-05 | global batch size: 256 | lm loss: 4.504404E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.785 | TFLOPs: 11.97 | 7: iteration 129710/ 173500 | consumed samples: 33205760 | consumed tokens: 68005396480 | elapsed time per iteration (s): 0.08 | learning rate: 4.736E-05 | global batch size: 256 | lm loss: 4.496555E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.450 | TFLOPs: 11.97 | 7: iteration 129720/ 173500 | consumed samples: 33208320 | consumed tokens: 68010639360 | elapsed time per iteration (s): 0.08 | learning rate: 4.734E-05 | global batch size: 256 | lm loss: 4.489411E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.588 | TFLOPs: 11.94 | 7: iteration 129730/ 173500 | consumed samples: 33210880 | consumed tokens: 68015882240 | elapsed time per iteration (s): 0.09 | learning rate: 4.733E-05 | global batch size: 256 | lm loss: 4.528421E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2843.904 | TFLOPs: 10.58 | 7: iteration 129740/ 173500 | consumed samples: 33213440 | consumed tokens: 68021125120 | elapsed time per iteration (s): 0.08 | learning rate: 
4.732E-05 | global batch size: 256 | lm loss: 4.507700E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.595 | TFLOPs: 11.83 | 7: iteration 129750/ 173500 | consumed samples: 33216000 | consumed tokens: 68026368000 | elapsed time per iteration (s): 0.10 | learning rate: 4.731E-05 | global batch size: 256 | lm loss: 4.495410E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2573.861 | TFLOPs: 9.57 | 7: iteration 129760/ 173500 | consumed samples: 33218560 | consumed tokens: 68031610880 | elapsed time per iteration (s): 0.09 | learning rate: 4.730E-05 | global batch size: 256 | lm loss: 4.505128E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.853 | TFLOPs: 10.13 | 7: iteration 129770/ 173500 | consumed samples: 33221120 | consumed tokens: 68036853760 | elapsed time per iteration (s): 0.08 | learning rate: 4.729E-05 | global batch size: 256 | lm loss: 4.508746E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.778 | TFLOPs: 11.59 | 7: iteration 129780/ 173500 | consumed samples: 33223680 | consumed tokens: 68042096640 | elapsed time per iteration (s): 0.11 | learning rate: 4.727E-05 | global batch size: 256 | lm loss: 4.505190E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2359.590 | TFLOPs: 8.78 | 7: iteration 129790/ 173500 | consumed samples: 33226240 | consumed tokens: 68047339520 | elapsed time per iteration (s): 0.13 | learning rate: 4.726E-05 | global batch size: 256 | lm loss: 4.495542E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1955.517 | TFLOPs: 7.27 | 7: iteration 129800/ 173500 | 
consumed samples: 33228800 | consumed tokens: 68052582400 | elapsed time per iteration (s): 0.10 | learning rate: 4.725E-05 | global batch size: 256 | lm loss: 4.520719E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2494.439 | TFLOPs: 9.28 | 7: iteration 129810/ 173500 | consumed samples: 33231360 | consumed tokens: 68057825280 | elapsed time per iteration (s): 0.10 | learning rate: 4.724E-05 | global batch size: 256 | lm loss: 4.507444E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.791 | TFLOPs: 9.48 | 7: iteration 129820/ 173500 | consumed samples: 33233920 | consumed tokens: 68063068160 | elapsed time per iteration (s): 0.10 | learning rate: 4.723E-05 | global batch size: 256 | lm loss: 4.509410E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.778 | TFLOPs: 9.41 | 7: iteration 129830/ 173500 | consumed samples: 33236480 | consumed tokens: 68068311040 | elapsed time per iteration (s): 0.10 | learning rate: 4.721E-05 | global batch size: 256 | lm loss: 4.504131E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.862 | TFLOPs: 9.52 | 7: iteration 129840/ 173500 | consumed samples: 33239040 | consumed tokens: 68073553920 | elapsed time per iteration (s): 0.10 | learning rate: 4.720E-05 | global batch size: 256 | lm loss: 4.498269E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.906 | TFLOPs: 9.45 | 7: iteration 129850/ 173500 | consumed samples: 33241600 | consumed tokens: 68078796800 | elapsed time per iteration (s): 0.10 | learning rate: 4.719E-05 | global batch size: 256 | lm loss: 4.513294E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2559.317 | TFLOPs: 9.52 | 7: iteration 129860/ 173500 | consumed samples: 33244160 | consumed tokens: 68084039680 | elapsed time per iteration (s): 0.08 | learning rate: 4.718E-05 | global batch size: 256 | lm loss: 4.514130E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.592 | TFLOPs: 11.70 | 7: iteration 129870/ 173500 | consumed samples: 33246720 | consumed tokens: 68089282560 | elapsed time per iteration (s): 0.08 | learning rate: 4.717E-05 | global batch size: 256 | lm loss: 4.515390E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.519 | TFLOPs: 11.73 | 7: iteration 129880/ 173500 | consumed samples: 33249280 | consumed tokens: 68094525440 | elapsed time per iteration (s): 0.08 | learning rate: 4.716E-05 | global batch size: 256 | lm loss: 4.502972E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.377 | TFLOPs: 11.96 | 7: iteration 129890/ 173500 | consumed samples: 33251840 | consumed tokens: 68099768320 | elapsed time per iteration (s): 0.08 | learning rate: 4.714E-05 | global batch size: 256 | lm loss: 4.508797E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.683 | TFLOPs: 12.00 | 7: iteration 129900/ 173500 | consumed samples: 33254400 | consumed tokens: 68105011200 | elapsed time per iteration (s): 0.09 | learning rate: 4.713E-05 | global batch size: 256 | lm loss: 4.512222E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.402 | TFLOPs: 10.84 | 7: iteration 129910/ 173500 | consumed samples: 33256960 | consumed tokens: 68110254080 | elapsed time per iteration (s): 0.08 | learning rate: 4.712E-05 | global 
batch size: 256 | lm loss: 4.510773E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.041 | TFLOPs: 11.39 | 7: iteration 129920/ 173500 | consumed samples: 33259520 | consumed tokens: 68115496960 | elapsed time per iteration (s): 0.09 | learning rate: 4.711E-05 | global batch size: 256 | lm loss: 4.511462E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2802.444 | TFLOPs: 10.42 | 7: iteration 129930/ 173500 | consumed samples: 33262080 | consumed tokens: 68120739840 | elapsed time per iteration (s): 0.08 | learning rate: 4.710E-05 | global batch size: 256 | lm loss: 4.501914E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.970 | TFLOPs: 11.74 | 7: iteration 129940/ 173500 | consumed samples: 33264640 | consumed tokens: 68125982720 | elapsed time per iteration (s): 0.09 | learning rate: 4.709E-05 | global batch size: 256 | lm loss: 4.509417E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2745.810 | TFLOPs: 10.21 | 7: iteration 129950/ 173500 | consumed samples: 33267200 | consumed tokens: 68131225600 | elapsed time per iteration (s): 0.08 | learning rate: 4.707E-05 | global batch size: 256 | lm loss: 4.504925E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.330 | TFLOPs: 11.57 | 7: iteration 129960/ 173500 | consumed samples: 33269760 | consumed tokens: 68136468480 | elapsed time per iteration (s): 0.09 | learning rate: 4.706E-05 | global batch size: 256 | lm loss: 4.510822E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.574 | TFLOPs: 11.02 | 7: iteration 129970/ 173500 | consumed samples: 
33272320 | consumed tokens: 68141711360 | elapsed time per iteration (s): 0.09 | learning rate: 4.705E-05 | global batch size: 256 | lm loss: 4.513147E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2912.157 | TFLOPs: 10.83 | 7: iteration 129980/ 173500 | consumed samples: 33274880 | consumed tokens: 68146954240 | elapsed time per iteration (s): 0.08 | learning rate: 4.704E-05 | global batch size: 256 | lm loss: 4.515940E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.092 | TFLOPs: 11.77 | 7: iteration 129990/ 173500 | consumed samples: 33277440 | consumed tokens: 68152197120 | elapsed time per iteration (s): 0.12 | learning rate: 4.703E-05 | global batch size: 256 | lm loss: 4.508122E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2050.967 | TFLOPs: 7.63 | 0: [2023-03-17 03:25:04,720] [INFO] [logging.py:68:log_dist] [Rank 0] step=130000, skipped=0, lr=[4.7014562839599005e-05, 4.7014562839599005e-05, 4.7014562839599005e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 130000/ 173500 | consumed samples: 33280000 | consumed tokens: 68157440000 | elapsed time per iteration (s): 0.12 | learning rate: 4.701E-05 | global batch size: 256 | lm loss: 4.518382E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2148.812 | TFLOPs: 7.99 | 0: steps: 130000 loss: 4.5049 iter time (s): 0.094 samples/sec: 2737.452 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 130000 | lm loss value: 4.364403E+00 | lm loss PPL: 7.860244E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 130000 to 
checkpoints_14m91b100m 0: [2023-03-17 03:25:04,777] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step130000 is begin to save! 0: [2023-03-17 03:25:04,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:25:04,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:25:04,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:25:04,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:25:04,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:25:04,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:25:04,813] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:25:04,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:25:04,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:25:04,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:25:04,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:25:04,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:25:04,819] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step130000/mp_rank_00_model_states.pt 0: [2023-03-17 03:25:04,819] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:25:04,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:25:04,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:25:04,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:25:04,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:25:04,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:25:04,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:25:04,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 03:25:04,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 03:25:04,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 1: [2023-03-17 03:25:04,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
2: [2023-03-17 03:25:04,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 7: [2023-03-17 03:25:04,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:25:04,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 0: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 0: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 4: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 1: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 2: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 7: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
7: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 3: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 3: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 0: [2023-03-17 03:25:04,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:25:04,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 4: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:25:04,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 1: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:25:04,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 2: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 5: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
7: [2023-03-17 03:25:04,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 0: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:25:04,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 3: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:25:04,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 4: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:25:04,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 7: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 7: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 2: [2023-03-17 03:25:04,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 0: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:25:04,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 1: [2023-03-17 03:25:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:25:04,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 2: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:25:04,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 3: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 5: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
4: [2023-03-17 03:25:04,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 7: [2023-03-17 03:25:04,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 1: [2023-03-17 03:25:04,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:25:04,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 0: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:25:04,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 3: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:25:04,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 6: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:25:04,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:25:04,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 6: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 4: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:25:04,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 6: [2023-03-17 03:25:04,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 2: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 6: [2023-03-17 03:25:04,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:25:04,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 7: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:25:04,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 5: [2023-03-17 03:25:04,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 1: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:25:04,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 0: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:25:04,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:25:04,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 4: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:25:04,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
6: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:25:04,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 3: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 1: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:25:04,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 7: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
5: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:25:04,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:25:04,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 0: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 6: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
1: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 5: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 1: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 5: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 6: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 5: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 7: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 2: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
2: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 6: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:25:04,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:25:04,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 4: [2023-03-17 03:25:04,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:25:04,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step130000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:25:04,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step130000 is ready now! 
0: successfully saved checkpoint at iteration 130000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 78.45
7: iteration 130010/ 173500 | consumed samples: 33282560 | consumed tokens: 68162682880 | elapsed time per iteration (s): 0.09 | learning rate: 4.700E-05 | global batch size: 256 | lm loss: 4.507701E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.317 | TFLOPs: 10.10 |
7: iteration 130020/ 173500 | consumed samples: 33285120 | consumed tokens: 68167925760 | elapsed time per iteration (s): 0.08 | learning rate: 4.699E-05 | global batch size: 256 | lm loss: 4.512260E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.948 | TFLOPs: 12.04 |
7: iteration 130030/ 173500 | consumed samples: 33287680 | consumed tokens: 68173168640 | elapsed time per iteration (s): 0.08 | learning rate: 4.698E-05 | global batch size: 256 | lm loss: 4.515107E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.005 | TFLOPs: 11.78 |
7: iteration 130040/ 173500 | consumed samples: 33290240 | consumed tokens: 68178411520 | elapsed time per iteration (s): 0.08 | learning rate: 4.697E-05 | global batch size: 256 | lm loss: 4.516470E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.907 | TFLOPs: 11.67 |
7: iteration 130050/ 173500 | consumed samples: 33292800 | consumed tokens: 68183654400 | elapsed time per iteration (s): 0.08 | learning rate: 4.696E-05 | global batch size: 256 | lm loss: 4.523483E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.568 | TFLOPs: 11.79 |
7: iteration 130060/ 173500 | consumed samples: 33295360 | consumed tokens: 68188897280 | elapsed time per iteration (s): 0.09 | learning rate: 4.694E-05 | global batch size: 256 | lm loss: 4.511246E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2859.088 | TFLOPs: 10.63 |
7: iteration 130070/ 173500 | consumed samples: 33297920 | consumed tokens: 68194140160 | elapsed time per iteration (s): 0.08 | learning rate: 4.693E-05 | global batch size: 256 | lm loss: 4.527248E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3248.232 | TFLOPs: 12.08 |
7: iteration 130080/ 173500 | consumed samples: 33300480 | consumed tokens: 68199383040 | elapsed time per iteration (s): 0.08 | learning rate: 4.692E-05 | global batch size: 256 | lm loss: 4.504143E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.298 | TFLOPs: 12.04 |
7: iteration 130090/ 173500 | consumed samples: 33303040 | consumed tokens: 68204625920 | elapsed time per iteration (s): 0.12 | learning rate: 4.691E-05 | global batch size: 256 | lm loss: 4.504446E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2218.632 | TFLOPs: 8.25 |
7: iteration 130100/ 173500 | consumed samples: 33305600 | consumed tokens: 68209868800 | elapsed time per iteration (s): 0.10 | learning rate: 4.690E-05 | global batch size: 256 | lm loss: 4.507304E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.820 | TFLOPs: 9.41 |
7: iteration 130110/ 173500 | consumed samples: 33308160 | consumed tokens: 68215111680 | elapsed time per iteration (s): 0.11 | learning rate: 4.689E-05 | global batch size: 256 | lm loss: 4.495576E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.754 | TFLOPs: 8.59 |
7: iteration 
130120/ 173500 | consumed samples: 33310720 | consumed tokens: 68220354560 | elapsed time per iteration (s): 0.10 | learning rate: 4.687E-05 | global batch size: 256 | lm loss: 4.516105E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2539.726 | TFLOPs: 9.45 | 7: iteration 130130/ 173500 | consumed samples: 33313280 | consumed tokens: 68225597440 | elapsed time per iteration (s): 0.10 | learning rate: 4.686E-05 | global batch size: 256 | lm loss: 4.507819E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2550.078 | TFLOPs: 9.49 | 7: iteration 130140/ 173500 | consumed samples: 33315840 | consumed tokens: 68230840320 | elapsed time per iteration (s): 0.10 | learning rate: 4.685E-05 | global batch size: 256 | lm loss: 4.501680E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2592.576 | TFLOPs: 9.64 | 7: iteration 130150/ 173500 | consumed samples: 33318400 | consumed tokens: 68236083200 | elapsed time per iteration (s): 0.08 | learning rate: 4.684E-05 | global batch size: 256 | lm loss: 4.510177E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.302 | TFLOPs: 12.04 | 7: iteration 130160/ 173500 | consumed samples: 33320960 | consumed tokens: 68241326080 | elapsed time per iteration (s): 0.08 | learning rate: 4.683E-05 | global batch size: 256 | lm loss: 4.522626E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.065 | TFLOPs: 11.55 | 7: iteration 130170/ 173500 | consumed samples: 33323520 | consumed tokens: 68246568960 | elapsed time per iteration (s): 0.08 | learning rate: 4.681E-05 | global batch size: 256 | lm loss: 4.507384E+00 | grad norm: 0.342 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.739 | TFLOPs: 11.73 | 7: iteration 130180/ 173500 | consumed samples: 33326080 | consumed tokens: 68251811840 | elapsed time per iteration (s): 0.08 | learning rate: 4.680E-05 | global batch size: 256 | lm loss: 4.507924E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.165 | TFLOPs: 11.45 | 7: iteration 130190/ 173500 | consumed samples: 33328640 | consumed tokens: 68257054720 | elapsed time per iteration (s): 0.08 | learning rate: 4.679E-05 | global batch size: 256 | lm loss: 4.490989E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.337 | TFLOPs: 12.00 | 7: iteration 130200/ 173500 | consumed samples: 33331200 | consumed tokens: 68262297600 | elapsed time per iteration (s): 0.08 | learning rate: 4.678E-05 | global batch size: 256 | lm loss: 4.523493E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.302 | TFLOPs: 12.06 | 7: iteration 130210/ 173500 | consumed samples: 33333760 | consumed tokens: 68267540480 | elapsed time per iteration (s): 0.08 | learning rate: 4.677E-05 | global batch size: 256 | lm loss: 4.506858E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.433 | TFLOPs: 12.01 | 7: iteration 130220/ 173500 | consumed samples: 33336320 | consumed tokens: 68272783360 | elapsed time per iteration (s): 0.11 | learning rate: 4.676E-05 | global batch size: 256 | lm loss: 4.503548E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.763 | TFLOPs: 8.95 | 7: iteration 130230/ 173500 | consumed samples: 33338880 | consumed tokens: 68278026240 | elapsed time per iteration (s): 0.10 | learning 
rate: 4.674E-05 | global batch size: 256 | lm loss: 4.515155E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2562.404 | TFLOPs: 9.53 | 7: iteration 130240/ 173500 | consumed samples: 33341440 | consumed tokens: 68283269120 | elapsed time per iteration (s): 0.08 | learning rate: 4.673E-05 | global batch size: 256 | lm loss: 4.498672E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3046.098 | TFLOPs: 11.33 | 7: iteration 130250/ 173500 | consumed samples: 33344000 | consumed tokens: 68288512000 | elapsed time per iteration (s): 0.08 | learning rate: 4.672E-05 | global batch size: 256 | lm loss: 4.518134E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.891 | TFLOPs: 11.97 | 7: iteration 130260/ 173500 | consumed samples: 33346560 | consumed tokens: 68293754880 | elapsed time per iteration (s): 0.08 | learning rate: 4.671E-05 | global batch size: 256 | lm loss: 4.515388E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.590 | TFLOPs: 11.85 | 7: iteration 130270/ 173500 | consumed samples: 33349120 | consumed tokens: 68298997760 | elapsed time per iteration (s): 0.09 | learning rate: 4.670E-05 | global batch size: 256 | lm loss: 4.517358E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.826 | TFLOPs: 10.88 | 7: iteration 130280/ 173500 | consumed samples: 33351680 | consumed tokens: 68304240640 | elapsed time per iteration (s): 0.08 | learning rate: 4.669E-05 | global batch size: 256 | lm loss: 4.507476E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3251.065 | TFLOPs: 12.09 | 7: iteration 130290/ 
173500 | consumed samples: 33354240 | consumed tokens: 68309483520 | elapsed time per iteration (s): 0.08 | learning rate: 4.667E-05 | global batch size: 256 | lm loss: 4.504827E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.871 | TFLOPs: 11.94 | 7: iteration 130300/ 173500 | consumed samples: 33356800 | consumed tokens: 68314726400 | elapsed time per iteration (s): 0.09 | learning rate: 4.666E-05 | global batch size: 256 | lm loss: 4.517759E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2824.138 | TFLOPs: 10.50 | 7: iteration 130310/ 173500 | consumed samples: 33359360 | consumed tokens: 68319969280 | elapsed time per iteration (s): 0.08 | learning rate: 4.665E-05 | global batch size: 256 | lm loss: 4.505009E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.081 | TFLOPs: 11.36 | 7: iteration 130320/ 173500 | consumed samples: 33361920 | consumed tokens: 68325212160 | elapsed time per iteration (s): 0.09 | learning rate: 4.664E-05 | global batch size: 256 | lm loss: 4.509794E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2761.009 | TFLOPs: 10.27 | 7: iteration 130330/ 173500 | consumed samples: 33364480 | consumed tokens: 68330455040 | elapsed time per iteration (s): 0.08 | learning rate: 4.663E-05 | global batch size: 256 | lm loss: 4.500441E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.353 | TFLOPs: 12.00 | 7: iteration 130340/ 173500 | consumed samples: 33367040 | consumed tokens: 68335697920 | elapsed time per iteration (s): 0.10 | learning rate: 4.662E-05 | global batch size: 256 | lm loss: 4.512041E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2480.698 | TFLOPs: 9.23 | 7: iteration 130350/ 173500 | consumed samples: 33369600 | consumed tokens: 68340940800 | elapsed time per iteration (s): 0.08 | learning rate: 4.660E-05 | global batch size: 256 | lm loss: 4.500484E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.612 | TFLOPs: 11.89 | 7: iteration 130360/ 173500 | consumed samples: 33372160 | consumed tokens: 68346183680 | elapsed time per iteration (s): 0.08 | learning rate: 4.659E-05 | global batch size: 256 | lm loss: 4.513580E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.424 | TFLOPs: 11.22 | 7: iteration 130370/ 173500 | consumed samples: 33374720 | consumed tokens: 68351426560 | elapsed time per iteration (s): 0.08 | learning rate: 4.658E-05 | global batch size: 256 | lm loss: 4.504189E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.062 | TFLOPs: 11.96 | 7: iteration 130380/ 173500 | consumed samples: 33377280 | consumed tokens: 68356669440 | elapsed time per iteration (s): 0.08 | learning rate: 4.657E-05 | global batch size: 256 | lm loss: 4.523231E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.494 | TFLOPs: 11.98 | 7: iteration 130390/ 173500 | consumed samples: 33379840 | consumed tokens: 68361912320 | elapsed time per iteration (s): 0.08 | learning rate: 4.656E-05 | global batch size: 256 | lm loss: 4.529992E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.915 | TFLOPs: 11.97 | 7: iteration 130400/ 173500 | consumed samples: 33382400 | consumed tokens: 68367155200 | elapsed time per iteration (s): 0.09 | learning rate: 
4.655E-05 | global batch size: 256 | lm loss: 4.504029E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.213 | TFLOPs: 11.20 | 7: iteration 130410/ 173500 | consumed samples: 33384960 | consumed tokens: 68372398080 | elapsed time per iteration (s): 0.08 | learning rate: 4.653E-05 | global batch size: 256 | lm loss: 4.511792E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.661 | TFLOPs: 11.89 | 7: iteration 130420/ 173500 | consumed samples: 33387520 | consumed tokens: 68377640960 | elapsed time per iteration (s): 0.12 | learning rate: 4.652E-05 | global batch size: 256 | lm loss: 4.501149E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2112.147 | TFLOPs: 7.86 | 7: iteration 130430/ 173500 | consumed samples: 33390080 | consumed tokens: 68382883840 | elapsed time per iteration (s): 0.12 | learning rate: 4.651E-05 | global batch size: 256 | lm loss: 4.512498E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2120.337 | TFLOPs: 7.89 | 7: iteration 130440/ 173500 | consumed samples: 33392640 | consumed tokens: 68388126720 | elapsed time per iteration (s): 0.09 | learning rate: 4.650E-05 | global batch size: 256 | lm loss: 4.512944E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2702.884 | TFLOPs: 10.05 | 7: iteration 130450/ 173500 | consumed samples: 33395200 | consumed tokens: 68393369600 | elapsed time per iteration (s): 0.08 | learning rate: 4.649E-05 | global batch size: 256 | lm loss: 4.495668E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.045 | TFLOPs: 12.05 | 7: iteration 130460/ 173500 | 
consumed samples: 33397760 | consumed tokens: 68398612480 | elapsed time per iteration (s): 0.08 | learning rate: 4.648E-05 | global batch size: 256 | lm loss: 4.508491E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.798 | TFLOPs: 12.02 | 7: iteration 130470/ 173500 | consumed samples: 33400320 | consumed tokens: 68403855360 | elapsed time per iteration (s): 0.10 | learning rate: 4.646E-05 | global batch size: 256 | lm loss: 4.498006E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2584.886 | TFLOPs: 9.61 | 7: iteration 130480/ 173500 | consumed samples: 33402880 | consumed tokens: 68409098240 | elapsed time per iteration (s): 0.12 | learning rate: 4.645E-05 | global batch size: 256 | lm loss: 4.514697E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2064.191 | TFLOPs: 7.68 | 7: iteration 130490/ 173500 | consumed samples: 33405440 | consumed tokens: 68414341120 | elapsed time per iteration (s): 0.10 | learning rate: 4.644E-05 | global batch size: 256 | lm loss: 4.518783E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2481.146 | TFLOPs: 9.23 | 7: iteration 130500/ 173500 | consumed samples: 33408000 | consumed tokens: 68419584000 | elapsed time per iteration (s): 0.10 | learning rate: 4.643E-05 | global batch size: 256 | lm loss: 4.509002E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2623.308 | TFLOPs: 9.76 | 7: iteration 130510/ 173500 | consumed samples: 33410560 | consumed tokens: 68424826880 | elapsed time per iteration (s): 0.08 | learning rate: 4.642E-05 | global batch size: 256 | lm loss: 4.501983E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 3232.524 | TFLOPs: 12.02 | 7: iteration 130520/ 173500 | consumed samples: 33413120 | consumed tokens: 68430069760 | elapsed time per iteration (s): 0.08 | learning rate: 4.641E-05 | global batch size: 256 | lm loss: 4.510702E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.341 | TFLOPs: 11.92 | 7: iteration 130530/ 173500 | consumed samples: 33415680 | consumed tokens: 68435312640 | elapsed time per iteration (s): 0.10 | learning rate: 4.639E-05 | global batch size: 256 | lm loss: 4.509154E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2547.506 | TFLOPs: 9.48 | 7: iteration 130540/ 173500 | consumed samples: 33418240 | consumed tokens: 68440555520 | elapsed time per iteration (s): 0.08 | learning rate: 4.638E-05 | global batch size: 256 | lm loss: 4.515395E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3248.575 | TFLOPs: 12.08 | 7: iteration 130550/ 173500 | consumed samples: 33420800 | consumed tokens: 68445798400 | elapsed time per iteration (s): 0.10 | learning rate: 4.637E-05 | global batch size: 256 | lm loss: 4.516344E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2595.810 | TFLOPs: 9.66 | 7: iteration 130560/ 173500 | consumed samples: 33423360 | consumed tokens: 68451041280 | elapsed time per iteration (s): 0.09 | learning rate: 4.636E-05 | global batch size: 256 | lm loss: 4.519480E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2812.850 | TFLOPs: 10.46 | 7: iteration 130570/ 173500 | consumed samples: 33425920 | consumed tokens: 68456284160 | elapsed time per iteration (s): 0.08 | learning rate: 4.635E-05 | global 
batch size: 256 | lm loss: 4.501118E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.434 | TFLOPs: 12.03 | 7: iteration 130580/ 173500 | consumed samples: 33428480 | consumed tokens: 68461527040 | elapsed time per iteration (s): 0.08 | learning rate: 4.634E-05 | global batch size: 256 | lm loss: 4.507994E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3252.456 | TFLOPs: 12.10 | 7: iteration 130590/ 173500 | consumed samples: 33431040 | consumed tokens: 68466769920 | elapsed time per iteration (s): 0.12 | learning rate: 4.632E-05 | global batch size: 256 | lm loss: 4.496791E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2204.987 | TFLOPs: 8.20 | 7: iteration 130600/ 173500 | consumed samples: 33433600 | consumed tokens: 68472012800 | elapsed time per iteration (s): 0.09 | learning rate: 4.631E-05 | global batch size: 256 | lm loss: 4.510477E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.661 | TFLOPs: 10.97 | 7: iteration 130610/ 173500 | consumed samples: 33436160 | consumed tokens: 68477255680 | elapsed time per iteration (s): 0.08 | learning rate: 4.630E-05 | global batch size: 256 | lm loss: 4.516297E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.702 | TFLOPs: 12.01 | 7: iteration 130620/ 173500 | consumed samples: 33438720 | consumed tokens: 68482498560 | elapsed time per iteration (s): 0.08 | learning rate: 4.629E-05 | global batch size: 256 | lm loss: 4.519875E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.423 | TFLOPs: 12.04 | 7: iteration 130630/ 173500 | consumed samples: 
33441280 | consumed tokens: 68487741440 | elapsed time per iteration (s): 0.10 | learning rate: 4.628E-05 | global batch size: 256 | lm loss: 4.516016E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.171 | TFLOPs: 10.00 | 7: iteration 130640/ 173500 | consumed samples: 33443840 | consumed tokens: 68492984320 | elapsed time per iteration (s): 0.08 | learning rate: 4.627E-05 | global batch size: 256 | lm loss: 4.515257E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.238 | TFLOPs: 12.06 | 7: iteration 130650/ 173500 | consumed samples: 33446400 | consumed tokens: 68498227200 | elapsed time per iteration (s): 0.08 | learning rate: 4.625E-05 | global batch size: 256 | lm loss: 4.511586E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3040.193 | TFLOPs: 11.31 | 7: iteration 130660/ 173500 | consumed samples: 33448960 | consumed tokens: 68503470080 | elapsed time per iteration (s): 0.09 | learning rate: 4.624E-05 | global batch size: 256 | lm loss: 4.510688E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2894.250 | TFLOPs: 10.77 | 7: iteration 130670/ 173500 | consumed samples: 33451520 | consumed tokens: 68508712960 | elapsed time per iteration (s): 0.10 | learning rate: 4.623E-05 | global batch size: 256 | lm loss: 4.508842E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2558.132 | TFLOPs: 9.52 | 7: iteration 130680/ 173500 | consumed samples: 33454080 | consumed tokens: 68513955840 | elapsed time per iteration (s): 0.08 | learning rate: 4.622E-05 | global batch size: 256 | lm loss: 4.516859E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3164.218 | TFLOPs: 11.77 | 7: iteration 130690/ 173500 | consumed samples: 33456640 | consumed tokens: 68519198720 | elapsed time per iteration (s): 0.08 | learning rate: 4.621E-05 | global batch size: 256 | lm loss: 4.507585E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.333 | TFLOPs: 12.06 | 7: iteration 130700/ 173500 | consumed samples: 33459200 | consumed tokens: 68524441600 | elapsed time per iteration (s): 0.08 | learning rate: 4.620E-05 | global batch size: 256 | lm loss: 4.519946E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.641 | TFLOPs: 11.72 | 7: iteration 130710/ 173500 | consumed samples: 33461760 | consumed tokens: 68529684480 | elapsed time per iteration (s): 0.09 | learning rate: 4.619E-05 | global batch size: 256 | lm loss: 4.520746E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2702.415 | TFLOPs: 10.05 | 7: iteration 130720/ 173500 | consumed samples: 33464320 | consumed tokens: 68534927360 | elapsed time per iteration (s): 0.10 | learning rate: 4.617E-05 | global batch size: 256 | lm loss: 4.511040E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2531.736 | TFLOPs: 9.42 | 7: iteration 130730/ 173500 | consumed samples: 33466880 | consumed tokens: 68540170240 | elapsed time per iteration (s): 0.08 | learning rate: 4.616E-05 | global batch size: 256 | lm loss: 4.509998E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3248.146 | TFLOPs: 12.08 | 7: iteration 130740/ 173500 | consumed samples: 33469440 | consumed tokens: 68545413120 | elapsed time per iteration (s): 0.11 | learning rate: 4.615E-05 | global batch size: 
256 | lm loss: 4.509353E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2333.673 | TFLOPs: 8.68 | 7: iteration 130750/ 173500 | consumed samples: 33472000 | consumed tokens: 68550656000 | elapsed time per iteration (s): 0.12 | learning rate: 4.614E-05 | global batch size: 256 | lm loss: 4.502843E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2191.736 | TFLOPs: 8.15 | 7: iteration 130760/ 173500 | consumed samples: 33474560 | consumed tokens: 68555898880 | elapsed time per iteration (s): 0.12 | learning rate: 4.613E-05 | global batch size: 256 | lm loss: 4.518013E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.339 | TFLOPs: 7.70 | 7: iteration 130770/ 173500 | consumed samples: 33477120 | consumed tokens: 68561141760 | elapsed time per iteration (s): 0.10 | learning rate: 4.612E-05 | global batch size: 256 | lm loss: 4.496644E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2690.951 | TFLOPs: 10.01 | 7: iteration 130780/ 173500 | consumed samples: 33479680 | consumed tokens: 68566384640 | elapsed time per iteration (s): 0.08 | learning rate: 4.610E-05 | global batch size: 256 | lm loss: 4.512322E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.947 | TFLOPs: 11.81 | 7: iteration 130790/ 173500 | consumed samples: 33482240 | consumed tokens: 68571627520 | elapsed time per iteration (s): 0.09 | learning rate: 4.609E-05 | global batch size: 256 | lm loss: 4.503469E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2780.416 | TFLOPs: 10.34 | 7: iteration 130800/ 173500 | consumed samples: 33484800 | 
consumed tokens: 68576870400 | elapsed time per iteration (s): 0.08 | learning rate: 4.608E-05 | global batch size: 256 | lm loss: 4.501076E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.398 | TFLOPs: 11.81 | 7: iteration 130810/ 173500 | consumed samples: 33487360 | consumed tokens: 68582113280 | elapsed time per iteration (s): 0.08 | learning rate: 4.607E-05 | global batch size: 256 | lm loss: 4.517955E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.985 | TFLOPs: 11.98 | 7: iteration 130820/ 173500 | consumed samples: 33489920 | consumed tokens: 68587356160 | elapsed time per iteration (s): 0.11 | learning rate: 4.606E-05 | global batch size: 256 | lm loss: 4.512482E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2282.475 | TFLOPs: 8.49 | 7: iteration 130830/ 173500 | consumed samples: 33492480 | consumed tokens: 68592599040 | elapsed time per iteration (s): 0.09 | learning rate: 4.605E-05 | global batch size: 256 | lm loss: 4.505328E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2871.198 | TFLOPs: 10.68 | 7: iteration 130840/ 173500 | consumed samples: 33495040 | consumed tokens: 68597841920 | elapsed time per iteration (s): 0.11 | learning rate: 4.603E-05 | global batch size: 256 | lm loss: 4.516185E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.165 | TFLOPs: 8.85 | 7: iteration 130850/ 173500 | consumed samples: 33497600 | consumed tokens: 68603084800 | elapsed time per iteration (s): 0.12 | learning rate: 4.602E-05 | global batch size: 256 | lm loss: 4.497759E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 2174.787 | TFLOPs: 8.09 | 7: iteration 130860/ 173500 | consumed samples: 33500160 | consumed tokens: 68608327680 | elapsed time per iteration (s): 0.11 | learning rate: 4.601E-05 | global batch size: 256 | lm loss: 4.507644E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2355.366 | TFLOPs: 8.76 | 7: iteration 130870/ 173500 | consumed samples: 33502720 | consumed tokens: 68613570560 | elapsed time per iteration (s): 0.11 | learning rate: 4.600E-05 | global batch size: 256 | lm loss: 4.516604E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2299.332 | TFLOPs: 8.55 | 7: iteration 130880/ 173500 | consumed samples: 33505280 | consumed tokens: 68618813440 | elapsed time per iteration (s): 0.12 | learning rate: 4.599E-05 | global batch size: 256 | lm loss: 4.494164E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2059.988 | TFLOPs: 7.66 | 7: iteration 130890/ 173500 | consumed samples: 33507840 | consumed tokens: 68624056320 | elapsed time per iteration (s): 0.11 | learning rate: 4.598E-05 | global batch size: 256 | lm loss: 4.510983E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2338.798 | TFLOPs: 8.70 | 7: iteration 130900/ 173500 | consumed samples: 33510400 | consumed tokens: 68629299200 | elapsed time per iteration (s): 0.10 | learning rate: 4.596E-05 | global batch size: 256 | lm loss: 4.507264E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2507.040 | TFLOPs: 9.33 | 7: iteration 130910/ 173500 | consumed samples: 33512960 | consumed tokens: 68634542080 | elapsed time per iteration (s): 0.08 | learning rate: 4.595E-05 | global batch size: 256 | lm loss: 
4.516921E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.847 | TFLOPs: 11.80 | 7: iteration 130920/ 173500 | consumed samples: 33515520 | consumed tokens: 68639784960 | elapsed time per iteration (s): 0.12 | learning rate: 4.594E-05 | global batch size: 256 | lm loss: 4.511913E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2207.175 | TFLOPs: 8.21 | 7: iteration 130930/ 173500 | consumed samples: 33518080 | consumed tokens: 68645027840 | elapsed time per iteration (s): 0.08 | learning rate: 4.593E-05 | global batch size: 256 | lm loss: 4.506239E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.843 | TFLOPs: 11.91 | 7: iteration 130940/ 173500 | consumed samples: 33520640 | consumed tokens: 68650270720 | elapsed time per iteration (s): 0.10 | learning rate: 4.592E-05 | global batch size: 256 | lm loss: 4.509682E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2488.987 | TFLOPs: 9.26 | 7: iteration 130950/ 173500 | consumed samples: 33523200 | consumed tokens: 68655513600 | elapsed time per iteration (s): 0.09 | learning rate: 4.591E-05 | global batch size: 256 | lm loss: 4.507616E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.446 | TFLOPs: 10.57 | 7: iteration 130960/ 173500 | consumed samples: 33525760 | consumed tokens: 68660756480 | elapsed time per iteration (s): 0.08 | learning rate: 4.590E-05 | global batch size: 256 | lm loss: 4.501070E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.028 | TFLOPs: 12.04 | 7: iteration 130970/ 173500 | consumed samples: 33528320 | consumed tokens: 
68665999360 | elapsed time per iteration (s): 0.08 | learning rate: 4.588E-05 | global batch size: 256 | lm loss: 4.512208E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.315 | TFLOPs: 11.93 | 7: iteration 130980/ 173500 | consumed samples: 33530880 | consumed tokens: 68671242240 | elapsed time per iteration (s): 0.09 | learning rate: 4.587E-05 | global batch size: 256 | lm loss: 4.518715E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.803 | TFLOPs: 11.17 | 7: iteration 130990/ 173500 | consumed samples: 33533440 | consumed tokens: 68676485120 | elapsed time per iteration (s): 0.08 | learning rate: 4.586E-05 | global batch size: 256 | lm loss: 4.506138E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.488 | TFLOPs: 12.02 | 7: iteration 131000/ 173500 | consumed samples: 33536000 | consumed tokens: 68681728000 | elapsed time per iteration (s): 0.10 | learning rate: 4.585E-05 | global batch size: 256 | lm loss: 4.499467E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.055 | TFLOPs: 9.41 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 131000 | lm loss value: 4.402318E+00 | lm loss PPL: 8.163989E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 131000 to checkpoints_14m91b100m 0: [2023-03-17 03:26:36,289] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step131000 is begin to save! 0: [2023-03-17 03:26:36,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/layer_01-model_00-model_states.pt... 
0: [2023-03-17 03:26:36,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:26:36,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:26:36,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:26:36,323] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:26:36,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:26:36,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:26:36,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:26:36,330] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:26:36,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:26:36,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:26:36,334] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 03:26:36,334] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step131000/mp_rank_00_model_states.pt 0: [2023-03-17 03:26:36,334] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:26:36,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:26:36,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:26:36,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:26:36,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 5: [2023-03-17 03:26:36,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:26:36,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 2: [2023-03-17 03:26:36,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 1: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:26:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
7: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:26:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 6: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:26:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 3: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 5: [2023-03-17 03:26:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
4: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:26:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 1: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:26:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 2: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 7: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:26:36,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
4: [2023-03-17 03:26:36,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:26:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 5: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:26:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 03:26:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 6: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 3: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
2: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 1: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:26:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 7: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:26:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 5: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:26:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
3: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 6: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:26:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 4: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:26:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 1: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:26:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
2: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 7: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:26:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 3: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 5: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:26:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
6: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:26:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 1: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:26:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 4: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:26:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2023-03-17 03:26:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 2: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
7: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:26:36,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:26:36,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 5: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:26:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 3: [2023-03-17 03:26:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 6: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:26:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2023-03-17 03:26:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 6: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 1: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:26:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 2: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
7: [2023-03-17 03:26:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 3: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 5: [2023-03-17 03:26:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 4: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 6: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 2: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 1: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 3: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 6: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
7: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 6: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 3: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
2: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 1: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 4: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 4: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 4: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 
5: [2023-03-17 03:26:36,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:26:36,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:26:36,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 7: time (ms) | save-checkpoint: 81.26 0: [2023-03-17 03:26:36,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:26:36,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:26:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 0: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:26:36,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:26:36,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 0: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:26:36,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:26:36,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 0: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:26:36,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:26:36,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 0: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:26:36,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:26:36,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 0: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:26:36,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:26:36,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 0: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:26:36,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:26:36,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 0: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:26:36,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step131000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:26:36,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step131000 is ready now! 0: successfully saved checkpoint at iteration 131000 to checkpoints_14m91b100m 7: iteration 131010/ 173500 | consumed samples: 33538560 | consumed tokens: 68686970880 | elapsed time per iteration (s): 0.10 | learning rate: 4.584E-05 | global batch size: 256 | lm loss: 4.506657E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2589.530 | TFLOPs: 9.63 | 7: iteration 131020/ 173500 | consumed samples: 33541120 | consumed tokens: 68692213760 | elapsed time per iteration (s): 0.10 | learning rate: 4.583E-05 | global batch size: 256 | lm loss: 4.506886E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2583.023 | TFLOPs: 9.61 | 7: iteration 131030/ 173500 | consumed samples: 33543680 | consumed tokens: 68697456640 | elapsed time per iteration (s): 0.08 | learning rate: 4.581E-05 | global batch size: 256 | lm loss: 4.503588E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.468 | TFLOPs: 11.96 | 7: iteration 131040/ 173500 | consumed samples: 33546240 | 
consumed tokens: 68702699520 | elapsed time per iteration (s): 0.10 | learning rate: 4.580E-05 | global batch size: 256 | lm loss: 4.503875E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2676.018 | TFLOPs: 9.95 | 7: iteration 131050/ 173500 | consumed samples: 33548800 | consumed tokens: 68707942400 | elapsed time per iteration (s): 0.08 | learning rate: 4.579E-05 | global batch size: 256 | lm loss: 4.505827E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.106 | TFLOPs: 11.21 | 7: iteration 131060/ 173500 | consumed samples: 33551360 | consumed tokens: 68713185280 | elapsed time per iteration (s): 0.08 | learning rate: 4.578E-05 | global batch size: 256 | lm loss: 4.512854E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.204 | TFLOPs: 11.98 | 7: iteration 131070/ 173500 | consumed samples: 33553920 | consumed tokens: 68718428160 | elapsed time per iteration (s): 0.09 | learning rate: 4.577E-05 | global batch size: 256 | lm loss: 4.507248E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2971.501 | TFLOPs: 11.05 | 7: iteration 131080/ 173500 | consumed samples: 33556480 | consumed tokens: 68723671040 | elapsed time per iteration (s): 0.08 | learning rate: 4.576E-05 | global batch size: 256 | lm loss: 4.500190E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.884 | TFLOPs: 11.90 | 7: iteration 131090/ 173500 | consumed samples: 33559040 | consumed tokens: 68728913920 | elapsed time per iteration (s): 0.11 | learning rate: 4.575E-05 | global batch size: 256 | lm loss: 4.500351E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2420.511 | TFLOPs: 9.00 | 7: iteration 131100/ 173500 | consumed samples: 33561600 | consumed tokens: 68734156800 | elapsed time per iteration (s): 0.09 | learning rate: 4.573E-05 | global batch size: 256 | lm loss: 4.500799E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.233 | TFLOPs: 10.51 | 7: iteration 131110/ 173500 | consumed samples: 33564160 | consumed tokens: 68739399680 | elapsed time per iteration (s): 0.08 | learning rate: 4.572E-05 | global batch size: 256 | lm loss: 4.509746E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.244 | TFLOPs: 11.85 | 7: iteration 131120/ 173500 | consumed samples: 33566720 | consumed tokens: 68744642560 | elapsed time per iteration (s): 0.10 | learning rate: 4.571E-05 | global batch size: 256 | lm loss: 4.506522E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2597.867 | TFLOPs: 9.66 | 7: iteration 131130/ 173500 | consumed samples: 33569280 | consumed tokens: 68749885440 | elapsed time per iteration (s): 0.10 | learning rate: 4.570E-05 | global batch size: 256 | lm loss: 4.516245E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2448.824 | TFLOPs: 9.11 | 7: iteration 131140/ 173500 | consumed samples: 33571840 | consumed tokens: 68755128320 | elapsed time per iteration (s): 0.12 | learning rate: 4.569E-05 | global batch size: 256 | lm loss: 4.521572E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.478 | TFLOPs: 7.63 | 7: iteration 131150/ 173500 | consumed samples: 33574400 | consumed tokens: 68760371200 | elapsed time per iteration (s): 0.13 | learning rate: 4.568E-05 | global batch size: 256 | 
lm loss: 4.504039E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1933.518 | TFLOPs: 7.19 | 7: iteration 131160/ 173500 | consumed samples: 33576960 | consumed tokens: 68765614080 | elapsed time per iteration (s): 0.13 | learning rate: 4.566E-05 | global batch size: 256 | lm loss: 4.498658E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.310 | TFLOPs: 7.46 | 7: iteration 131170/ 173500 | consumed samples: 33579520 | consumed tokens: 68770856960 | elapsed time per iteration (s): 0.13 | learning rate: 4.565E-05 | global batch size: 256 | lm loss: 4.525616E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2017.441 | TFLOPs: 7.50 | 7: iteration 131180/ 173500 | consumed samples: 33582080 | consumed tokens: 68776099840 | elapsed time per iteration (s): 0.09 | learning rate: 4.564E-05 | global batch size: 256 | lm loss: 4.516251E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2925.195 | TFLOPs: 10.88 | 7: iteration 131190/ 173500 | consumed samples: 33584640 | consumed tokens: 68781342720 | elapsed time per iteration (s): 0.12 | learning rate: 4.563E-05 | global batch size: 256 | lm loss: 4.497413E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2184.501 | TFLOPs: 8.13 | 7: iteration 131200/ 173500 | consumed samples: 33587200 | consumed tokens: 68786585600 | elapsed time per iteration (s): 0.13 | learning rate: 4.562E-05 | global batch size: 256 | lm loss: 4.512897E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2043.847 | TFLOPs: 7.60 | 7: iteration 131210/ 173500 | consumed samples: 33589760 | consumed 
tokens: 68791828480 | elapsed time per iteration (s): 0.11 | learning rate: 4.561E-05 | global batch size: 256 | lm loss: 4.504276E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2305.384 | TFLOPs: 8.58 | 7: iteration 131220/ 173500 | consumed samples: 33592320 | consumed tokens: 68797071360 | elapsed time per iteration (s): 0.12 | learning rate: 4.560E-05 | global batch size: 256 | lm loss: 4.516551E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2160.462 | TFLOPs: 8.04 | 7: iteration 131230/ 173500 | consumed samples: 33594880 | consumed tokens: 68802314240 | elapsed time per iteration (s): 0.11 | learning rate: 4.558E-05 | global batch size: 256 | lm loss: 4.516494E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2237.638 | TFLOPs: 8.32 | 7: iteration 131240/ 173500 | consumed samples: 33597440 | consumed tokens: 68807557120 | elapsed time per iteration (s): 0.12 | learning rate: 4.557E-05 | global batch size: 256 | lm loss: 4.498314E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2054.086 | TFLOPs: 7.64 | 7: iteration 131250/ 173500 | consumed samples: 33600000 | consumed tokens: 68812800000 | elapsed time per iteration (s): 0.14 | learning rate: 4.556E-05 | global batch size: 256 | lm loss: 4.516520E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1854.376 | TFLOPs: 6.90 | 7: iteration 131260/ 173500 | consumed samples: 33602560 | consumed tokens: 68818042880 | elapsed time per iteration (s): 0.12 | learning rate: 4.555E-05 | global batch size: 256 | lm loss: 4.512975E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 2069.472 | TFLOPs: 7.70 | 7: iteration 131270/ 173500 | consumed samples: 33605120 | consumed tokens: 68823285760 | elapsed time per iteration (s): 0.11 | learning rate: 4.554E-05 | global batch size: 256 | lm loss: 4.522677E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.241 | TFLOPs: 8.88 | 7: iteration 131280/ 173500 | consumed samples: 33607680 | consumed tokens: 68828528640 | elapsed time per iteration (s): 0.09 | learning rate: 4.553E-05 | global batch size: 256 | lm loss: 4.499983E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2949.147 | TFLOPs: 10.97 | 7: iteration 131290/ 173500 | consumed samples: 33610240 | consumed tokens: 68833771520 | elapsed time per iteration (s): 0.08 | learning rate: 4.552E-05 | global batch size: 256 | lm loss: 4.508459E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.687 | TFLOPs: 12.00 | 7: iteration 131300/ 173500 | consumed samples: 33612800 | consumed tokens: 68839014400 | elapsed time per iteration (s): 0.09 | learning rate: 4.550E-05 | global batch size: 256 | lm loss: 4.508912E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2710.094 | TFLOPs: 10.08 | 7: iteration 131310/ 173500 | consumed samples: 33615360 | consumed tokens: 68844257280 | elapsed time per iteration (s): 0.08 | learning rate: 4.549E-05 | global batch size: 256 | lm loss: 4.502367E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.714 | TFLOPs: 11.97 | 7: iteration 131320/ 173500 | consumed samples: 33617920 | consumed tokens: 68849500160 | elapsed time per iteration (s): 0.08 | learning rate: 4.548E-05 | global batch size: 256 | lm loss: 4.502172E+00 
| grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.162 | TFLOPs: 11.97 | 7: iteration 131330/ 173500 | consumed samples: 33620480 | consumed tokens: 68854743040 | elapsed time per iteration (s): 0.08 | learning rate: 4.547E-05 | global batch size: 256 | lm loss: 4.526760E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.036 | TFLOPs: 11.94 | 7: iteration 131340/ 173500 | consumed samples: 33623040 | consumed tokens: 68859985920 | elapsed time per iteration (s): 0.09 | learning rate: 4.546E-05 | global batch size: 256 | lm loss: 4.508034E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2868.242 | TFLOPs: 10.67 | 7: iteration 131350/ 173500 | consumed samples: 33625600 | consumed tokens: 68865228800 | elapsed time per iteration (s): 0.11 | learning rate: 4.545E-05 | global batch size: 256 | lm loss: 4.508470E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.683 | TFLOPs: 8.56 | 7: iteration 131360/ 173500 | consumed samples: 33628160 | consumed tokens: 68870471680 | elapsed time per iteration (s): 0.11 | learning rate: 4.544E-05 | global batch size: 256 | lm loss: 4.509929E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.056 | TFLOPs: 9.02 | 7: iteration 131370/ 173500 | consumed samples: 33630720 | consumed tokens: 68875714560 | elapsed time per iteration (s): 0.13 | learning rate: 4.542E-05 | global batch size: 256 | lm loss: 4.517538E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1944.459 | TFLOPs: 7.23 | 7: iteration 131380/ 173500 | consumed samples: 33633280 | consumed tokens: 68880957440 | 
elapsed time per iteration (s): 0.14 | learning rate: 4.541E-05 | global batch size: 256 | lm loss: 4.516808E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1798.222 | TFLOPs: 6.69 | 7: iteration 131390/ 173500 | consumed samples: 33635840 | consumed tokens: 68886200320 | elapsed time per iteration (s): 0.14 | learning rate: 4.540E-05 | global batch size: 256 | lm loss: 4.510821E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1825.154 | TFLOPs: 6.79 | 7: iteration 131400/ 173500 | consumed samples: 33638400 | consumed tokens: 68891443200 | elapsed time per iteration (s): 0.12 | learning rate: 4.539E-05 | global batch size: 256 | lm loss: 4.501979E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2159.460 | TFLOPs: 8.03 | 7: iteration 131410/ 173500 | consumed samples: 33640960 | consumed tokens: 68896686080 | elapsed time per iteration (s): 0.13 | learning rate: 4.538E-05 | global batch size: 256 | lm loss: 4.506081E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.421 | TFLOPs: 7.46 | 7: iteration 131420/ 173500 | consumed samples: 33643520 | consumed tokens: 68901928960 | elapsed time per iteration (s): 0.12 | learning rate: 4.537E-05 | global batch size: 256 | lm loss: 4.500607E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.273 | TFLOPs: 7.63 | 7: iteration 131430/ 173500 | consumed samples: 33646080 | consumed tokens: 68907171840 | elapsed time per iteration (s): 0.14 | learning rate: 4.535E-05 | global batch size: 256 | lm loss: 4.516136E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1789.746 
| TFLOPs: 6.66 | 7: iteration 131440/ 173500 | consumed samples: 33648640 | consumed tokens: 68912414720 | elapsed time per iteration (s): 0.13 | learning rate: 4.534E-05 | global batch size: 256 | lm loss: 4.518664E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1913.978 | TFLOPs: 7.12 | 7: iteration 131450/ 173500 | consumed samples: 33651200 | consumed tokens: 68917657600 | elapsed time per iteration (s): 0.13 | learning rate: 4.533E-05 | global batch size: 256 | lm loss: 4.503982E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1934.772 | TFLOPs: 7.20 | 7: iteration 131460/ 173500 | consumed samples: 33653760 | consumed tokens: 68922900480 | elapsed time per iteration (s): 0.14 | learning rate: 4.532E-05 | global batch size: 256 | lm loss: 4.507821E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1809.748 | TFLOPs: 6.73 | 7: iteration 131470/ 173500 | consumed samples: 33656320 | consumed tokens: 68928143360 | elapsed time per iteration (s): 0.13 | learning rate: 4.531E-05 | global batch size: 256 | lm loss: 4.522555E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1902.417 | TFLOPs: 7.08 | 7: iteration 131480/ 173500 | consumed samples: 33658880 | consumed tokens: 68933386240 | elapsed time per iteration (s): 0.11 | learning rate: 4.530E-05 | global batch size: 256 | lm loss: 4.505215E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2383.308 | TFLOPs: 8.86 | 7: iteration 131490/ 173500 | consumed samples: 33661440 | consumed tokens: 68938629120 | elapsed time per iteration (s): 0.08 | learning rate: 4.529E-05 | global batch size: 256 | lm loss: 4.500830E+00 | grad norm: 0.376 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.008 | TFLOPs: 11.72 | 7: iteration 131500/ 173500 | consumed samples: 33664000 | consumed tokens: 68943872000 | elapsed time per iteration (s): 0.08 | learning rate: 4.527E-05 | global batch size: 256 | lm loss: 4.507302E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.744 | TFLOPs: 11.90 | 7: iteration 131510/ 173500 | consumed samples: 33666560 | consumed tokens: 68949114880 | elapsed time per iteration (s): 0.09 | learning rate: 4.526E-05 | global batch size: 256 | lm loss: 4.522636E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2855.803 | TFLOPs: 10.62 | 7: iteration 131520/ 173500 | consumed samples: 33669120 | consumed tokens: 68954357760 | elapsed time per iteration (s): 0.10 | learning rate: 4.525E-05 | global batch size: 256 | lm loss: 4.504402E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2650.421 | TFLOPs: 9.86 | 7: iteration 131530/ 173500 | consumed samples: 33671680 | consumed tokens: 68959600640 | elapsed time per iteration (s): 0.10 | learning rate: 4.524E-05 | global batch size: 256 | lm loss: 4.509504E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.768 | TFLOPs: 9.34 | 7: iteration 131540/ 173500 | consumed samples: 33674240 | consumed tokens: 68964843520 | elapsed time per iteration (s): 0.13 | learning rate: 4.523E-05 | global batch size: 256 | lm loss: 4.500987E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2018.792 | TFLOPs: 7.51 | 7: iteration 131550/ 173500 | consumed samples: 33676800 | consumed tokens: 68970086400 | elapsed time per iteration 
(s): 0.09 | learning rate: 4.522E-05 | global batch size: 256 | lm loss: 4.503285E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2757.462 | TFLOPs: 10.26 | 7: iteration 131560/ 173500 | consumed samples: 33679360 | consumed tokens: 68975329280 | elapsed time per iteration (s): 0.12 | learning rate: 4.521E-05 | global batch size: 256 | lm loss: 4.514403E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2108.494 | TFLOPs: 7.84 | 7: iteration 131570/ 173500 | consumed samples: 33681920 | consumed tokens: 68980572160 | elapsed time per iteration (s): 0.09 | learning rate: 4.519E-05 | global batch size: 256 | lm loss: 4.499322E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.704 | TFLOPs: 10.32 | 7: iteration 131580/ 173500 | consumed samples: 33684480 | consumed tokens: 68985815040 | elapsed time per iteration (s): 0.10 | learning rate: 4.518E-05 | global batch size: 256 | lm loss: 4.505433E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2566.747 | TFLOPs: 9.55 | 7: iteration 131590/ 173500 | consumed samples: 33687040 | consumed tokens: 68991057920 | elapsed time per iteration (s): 0.12 | learning rate: 4.517E-05 | global batch size: 256 | lm loss: 4.501374E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2143.500 | TFLOPs: 7.97 | 7: iteration 131600/ 173500 | consumed samples: 33689600 | consumed tokens: 68996300800 | elapsed time per iteration (s): 0.10 | learning rate: 4.516E-05 | global batch size: 256 | lm loss: 4.510967E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2448.228 | TFLOPs: 9.11 | 7: 
iteration 131610/ 173500 | consumed samples: 33692160 | consumed tokens: 69001543680 | elapsed time per iteration (s): 0.11 | learning rate: 4.515E-05 | global batch size: 256 | lm loss: 4.504361E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2312.639 | TFLOPs: 8.60 | 7: iteration 131620/ 173500 | consumed samples: 33694720 | consumed tokens: 69006786560 | elapsed time per iteration (s): 0.09 | learning rate: 4.514E-05 | global batch size: 256 | lm loss: 4.513511E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.421 | TFLOPs: 11.19 | 7: iteration 131630/ 173500 | consumed samples: 33697280 | consumed tokens: 69012029440 | elapsed time per iteration (s): 0.08 | learning rate: 4.513E-05 | global batch size: 256 | lm loss: 4.522992E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.050 | TFLOPs: 11.84 | 7: iteration 131640/ 173500 | consumed samples: 33699840 | consumed tokens: 69017272320 | elapsed time per iteration (s): 0.10 | learning rate: 4.511E-05 | global batch size: 256 | lm loss: 4.513036E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2604.862 | TFLOPs: 9.69 | 7: iteration 131650/ 173500 | consumed samples: 33702400 | consumed tokens: 69022515200 | elapsed time per iteration (s): 0.08 | learning rate: 4.510E-05 | global batch size: 256 | lm loss: 4.525837E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.504 | TFLOPs: 11.41 | 7: iteration 131660/ 173500 | consumed samples: 33704960 | consumed tokens: 69027758080 | elapsed time per iteration (s): 0.08 | learning rate: 4.509E-05 | global batch size: 256 | lm loss: 4.507747E+00 | grad norm: 0.346 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.761 | TFLOPs: 11.40 | 7: iteration 131670/ 173500 | consumed samples: 33707520 | consumed tokens: 69033000960 | elapsed time per iteration (s): 0.08 | learning rate: 4.508E-05 | global batch size: 256 | lm loss: 4.517226E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.887 | TFLOPs: 11.82 | 7: iteration 131680/ 173500 | consumed samples: 33710080 | consumed tokens: 69038243840 | elapsed time per iteration (s): 0.09 | learning rate: 4.507E-05 | global batch size: 256 | lm loss: 4.503870E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2962.185 | TFLOPs: 11.02 | 7: iteration 131690/ 173500 | consumed samples: 33712640 | consumed tokens: 69043486720 | elapsed time per iteration (s): 0.09 | learning rate: 4.506E-05 | global batch size: 256 | lm loss: 4.510982E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2845.610 | TFLOPs: 10.58 | 7: iteration 131700/ 173500 | consumed samples: 33715200 | consumed tokens: 69048729600 | elapsed time per iteration (s): 0.10 | learning rate: 4.505E-05 | global batch size: 256 | lm loss: 4.511673E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2579.392 | TFLOPs: 9.59 | 7: iteration 131710/ 173500 | consumed samples: 33717760 | consumed tokens: 69053972480 | elapsed time per iteration (s): 0.10 | learning rate: 4.504E-05 | global batch size: 256 | lm loss: 4.516434E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2680.457 | TFLOPs: 9.97 | 7: iteration 131720/ 173500 | consumed samples: 33720320 | consumed tokens: 69059215360 | elapsed time per iteration (s): 0.09 | 
learning rate: 4.502E-05 | global batch size: 256 | lm loss: 4.498182E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.412 | TFLOPs: 10.61 | 7: iteration 131730/ 173500 | consumed samples: 33722880 | consumed tokens: 69064458240 | elapsed time per iteration (s): 0.08 | learning rate: 4.501E-05 | global batch size: 256 | lm loss: 4.505561E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.445 | TFLOPs: 11.82 | 7: iteration 131740/ 173500 | consumed samples: 33725440 | consumed tokens: 69069701120 | elapsed time per iteration (s): 0.10 | learning rate: 4.500E-05 | global batch size: 256 | lm loss: 4.521336E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2478.537 | TFLOPs: 9.22 | 7: iteration 131750/ 173500 | consumed samples: 33728000 | consumed tokens: 69074944000 | elapsed time per iteration (s): 0.09 | learning rate: 4.499E-05 | global batch size: 256 | lm loss: 4.514562E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2716.161 | TFLOPs: 10.10 | 7: iteration 131760/ 173500 | consumed samples: 33730560 | consumed tokens: 69080186880 | elapsed time per iteration (s): 0.09 | learning rate: 4.498E-05 | global batch size: 256 | lm loss: 4.516308E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2815.177 | TFLOPs: 10.47 | 7: iteration 131770/ 173500 | consumed samples: 33733120 | consumed tokens: 69085429760 | elapsed time per iteration (s): 0.09 | learning rate: 4.497E-05 | global batch size: 256 | lm loss: 4.512922E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.276 | TFLOPs: 10.70 | 7: iteration 
131780/ 173500 | consumed samples: 33735680 | consumed tokens: 69090672640 | elapsed time per iteration (s): 0.08 | learning rate: 4.496E-05 | global batch size: 256 | lm loss: 4.510555E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.413 | TFLOPs: 11.86 | 7: iteration 131790/ 173500 | consumed samples: 33738240 | consumed tokens: 69095915520 | elapsed time per iteration (s): 0.09 | learning rate: 4.494E-05 | global batch size: 256 | lm loss: 4.501722E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2784.468 | TFLOPs: 10.36 | 7: iteration 131800/ 173500 | consumed samples: 33740800 | consumed tokens: 69101158400 | elapsed time per iteration (s): 0.08 | learning rate: 4.493E-05 | global batch size: 256 | lm loss: 4.506429E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.561 | TFLOPs: 11.38 | 7: iteration 131810/ 173500 | consumed samples: 33743360 | consumed tokens: 69106401280 | elapsed time per iteration (s): 0.09 | learning rate: 4.492E-05 | global batch size: 256 | lm loss: 4.506593E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2905.874 | TFLOPs: 10.81 | 7: iteration 131820/ 173500 | consumed samples: 33745920 | consumed tokens: 69111644160 | elapsed time per iteration (s): 0.09 | learning rate: 4.491E-05 | global batch size: 256 | lm loss: 4.507520E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2803.604 | TFLOPs: 10.43 | 7: iteration 131830/ 173500 | consumed samples: 33748480 | consumed tokens: 69116887040 | elapsed time per iteration (s): 0.09 | learning rate: 4.490E-05 | global batch size: 256 | lm loss: 4.499675E+00 | grad norm: 0.363 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2858.040 | TFLOPs: 10.63 | 7: iteration 131840/ 173500 | consumed samples: 33751040 | consumed tokens: 69122129920 | elapsed time per iteration (s): 0.08 | learning rate: 4.489E-05 | global batch size: 256 | lm loss: 4.493189E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.446 | TFLOPs: 11.82 | 7: iteration 131850/ 173500 | consumed samples: 33753600 | consumed tokens: 69127372800 | elapsed time per iteration (s): 0.08 | learning rate: 4.488E-05 | global batch size: 256 | lm loss: 4.514477E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.325 | TFLOPs: 11.54 | 7: iteration 131860/ 173500 | consumed samples: 33756160 | consumed tokens: 69132615680 | elapsed time per iteration (s): 0.09 | learning rate: 4.486E-05 | global batch size: 256 | lm loss: 4.506915E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.241 | TFLOPs: 11.09 | 7: iteration 131870/ 173500 | consumed samples: 33758720 | consumed tokens: 69137858560 | elapsed time per iteration (s): 0.13 | learning rate: 4.485E-05 | global batch size: 256 | lm loss: 4.518582E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1915.654 | TFLOPs: 7.13 | 7: iteration 131880/ 173500 | consumed samples: 33761280 | consumed tokens: 69143101440 | elapsed time per iteration (s): 0.15 | learning rate: 4.484E-05 | global batch size: 256 | lm loss: 4.508720E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1716.902 | TFLOPs: 6.39 | 7: iteration 131890/ 173500 | consumed samples: 33763840 | consumed tokens: 69148344320 | elapsed time per iteration (s): 0.13 | learning 
rate: 4.483E-05 | global batch size: 256 | lm loss: 4.500509E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1984.001 | TFLOPs: 7.38 | 7: iteration 131900/ 173500 | consumed samples: 33766400 | consumed tokens: 69153587200 | elapsed time per iteration (s): 0.16 | learning rate: 4.482E-05 | global batch size: 256 | lm loss: 4.504638E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1606.481 | TFLOPs: 5.98 | 7: iteration 131910/ 173500 | consumed samples: 33768960 | consumed tokens: 69158830080 | elapsed time per iteration (s): 0.14 | learning rate: 4.481E-05 | global batch size: 256 | lm loss: 4.515425E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1782.024 | TFLOPs: 6.63 | 7: iteration 131920/ 173500 | consumed samples: 33771520 | consumed tokens: 69164072960 | elapsed time per iteration (s): 0.11 | learning rate: 4.480E-05 | global batch size: 256 | lm loss: 4.504002E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2267.181 | TFLOPs: 8.43 | 7: iteration 131930/ 173500 | consumed samples: 33774080 | consumed tokens: 69169315840 | elapsed time per iteration (s): 0.10 | learning rate: 4.478E-05 | global batch size: 256 | lm loss: 4.507770E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.528 | TFLOPs: 9.48 | 7: iteration 131940/ 173500 | consumed samples: 33776640 | consumed tokens: 69174558720 | elapsed time per iteration (s): 0.10 | learning rate: 4.477E-05 | global batch size: 256 | lm loss: 4.494476E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2520.221 | TFLOPs: 9.37 | 7: iteration 131950/ 173500 | 
consumed samples: 33779200 | consumed tokens: 69179801600 | elapsed time per iteration (s): 0.11 | learning rate: 4.476E-05 | global batch size: 256 | lm loss: 4.500930E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.020 | TFLOPs: 8.95 | 7: iteration 131960/ 173500 | consumed samples: 33781760 | consumed tokens: 69185044480 | elapsed time per iteration (s): 0.10 | learning rate: 4.475E-05 | global batch size: 256 | lm loss: 4.524303E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2549.841 | TFLOPs: 9.48 | 7: iteration 131970/ 173500 | consumed samples: 33784320 | consumed tokens: 69190287360 | elapsed time per iteration (s): 0.10 | learning rate: 4.474E-05 | global batch size: 256 | lm loss: 4.510355E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2521.564 | TFLOPs: 9.38 | 7: iteration 131980/ 173500 | consumed samples: 33786880 | consumed tokens: 69195530240 | elapsed time per iteration (s): 0.08 | learning rate: 4.473E-05 | global batch size: 256 | lm loss: 4.517176E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.642 | TFLOPs: 11.64 | 7: iteration 131990/ 173500 | consumed samples: 33789440 | consumed tokens: 69200773120 | elapsed time per iteration (s): 0.08 | learning rate: 4.472E-05 | global batch size: 256 | lm loss: 4.508401E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.642 | TFLOPs: 12.01 | 0: [2023-03-17 03:28:19,803] [INFO] [logging.py:68:log_dist] [Rank 0] step=132000, skipped=0, lr=[4.4705599266134565e-05, 4.4705599266134565e-05, 4.4705599266134565e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 132000/ 173500 | consumed samples: 
33792000 | consumed tokens: 69206016000 | elapsed time per iteration (s): 0.09 | learning rate: 4.471E-05 | global batch size: 256 | lm loss: 4.499881E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2816.495 | TFLOPs: 10.48 | 0: steps: 132000 loss: 4.5157 iter time (s): 0.097 samples/sec: 2644.040 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 132000 | lm loss value: 4.396753E+00 | lm loss PPL: 8.118685E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 132000 to checkpoints_14m91b100m 0: [2023-03-17 03:28:19,885] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step132000 is begin to save! 0: [2023-03-17 03:28:19,888] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:28:19,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:28:19,915] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:28:19,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:28:19,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:28:19,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 03:28:19,922] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:28:19,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:28:19,925] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:28:19,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:28:19,927] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:28:19,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:28:19,929] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step132000/mp_rank_00_model_states.pt 0: [2023-03-17 03:28:19,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:28:19,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:28:19,947] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:28:19,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:28:19,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 
0: [2023-03-17 03:28:19,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 3: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:28:19,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 2: [2023-03-17 03:28:19,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 
0: [2023-03-17 03:28:19,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 03:28:19,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 5: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 4: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 1: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:28:19,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 7: [2023-03-17 03:28:19,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:28:19,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:28:19,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 2: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 
3: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:28:19,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 3: [2023-03-17 03:28:19,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 5: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 4: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 1: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:28:19,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 7: [2023-03-17 03:28:19,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 2: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:28:19,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:28:19,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 0: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:28:19,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 5: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 3: [2023-03-17 03:28:19,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 03:28:19,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 4: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 1: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:28:19,956] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:28:19,956] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 7: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:28:19,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 2: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 3: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:28:19,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2023-03-17 03:28:19,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 0: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 1: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:28:19,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 4: [2023-03-17 03:28:19,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 
5: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 7: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:28:19,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 2: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:28:19,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 3: [2023-03-17 03:28:19,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:28:19,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 
6: [2023-03-17 03:28:19,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:28:19,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:28:19,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 0: [2023-03-17 03:28:19,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:28:19,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 5: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 7: [2023-03-17 03:28:19,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 
1: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 03:28:19,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 1: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 2: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 0: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:28:19,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:28:19,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 4: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2023-03-17 03:28:19,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 
5: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 1: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:28:19,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 7: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:28:19,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 4: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 
1: [2023-03-17 03:28:19,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 03:28:19,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 7: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 1: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 0: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 3: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 3: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 6: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 5: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:28:19,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 5: [2023-03-17 03:28:19,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 2: [2023-03-17 03:28:19,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:28:19,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:28:19,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 2: [2023-03-17 03:28:19,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:28:19,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step132000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:28:19,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step132000 is ready now! 0: successfully saved checkpoint at iteration 132000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.04 7: iteration 132010/ 173500 | consumed samples: 33794560 | consumed tokens: 69211258880 | elapsed time per iteration (s): 0.12 | learning rate: 4.469E-05 | global batch size: 256 | lm loss: 4.511521E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.250 | TFLOPs: 7.70 | 7: iteration 132020/ 173500 | consumed samples: 33797120 | consumed tokens: 69216501760 | elapsed time per iteration (s): 0.08 | learning rate: 4.468E-05 | global batch size: 256 | lm loss: 4.514236E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.797 | TFLOPs: 11.97 | 7: iteration 132030/ 173500 | consumed samples: 33799680 | consumed tokens: 69221744640 | elapsed time per iteration (s): 0.08 | learning rate: 4.467E-05 | global batch size: 256 | lm loss: 4.506592E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.742 | TFLOPs: 11.84 | 7: iteration 
132040/ 173500 | consumed samples: 33802240 | consumed tokens: 69226987520 | elapsed time per iteration (s): 0.08 | learning rate: 4.466E-05 | global batch size: 256 | lm loss: 4.498922E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.800 | TFLOPs: 11.85 | 7: iteration 132050/ 173500 | consumed samples: 33804800 | consumed tokens: 69232230400 | elapsed time per iteration (s): 0.08 | learning rate: 4.465E-05 | global batch size: 256 | lm loss: 4.513132E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.181 | TFLOPs: 11.43 | 7: iteration 132060/ 173500 | consumed samples: 33807360 | consumed tokens: 69237473280 | elapsed time per iteration (s): 0.08 | learning rate: 4.464E-05 | global batch size: 256 | lm loss: 4.511987E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.587 | TFLOPs: 12.02 | 7: iteration 132070/ 173500 | consumed samples: 33809920 | consumed tokens: 69242716160 | elapsed time per iteration (s): 0.08 | learning rate: 4.463E-05 | global batch size: 256 | lm loss: 4.521201E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.679 | TFLOPs: 11.96 | 7: iteration 132080/ 173500 | consumed samples: 33812480 | consumed tokens: 69247959040 | elapsed time per iteration (s): 0.08 | learning rate: 4.462E-05 | global batch size: 256 | lm loss: 4.508691E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.615 | TFLOPs: 11.95 | 7: iteration 132090/ 173500 | consumed samples: 33815040 | consumed tokens: 69253201920 | elapsed time per iteration (s): 0.10 | learning rate: 4.460E-05 | global batch size: 256 | lm loss: 4.519792E+00 | grad norm: 0.387 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2694.448 | TFLOPs: 10.02 | 7: iteration 132100/ 173500 | consumed samples: 33817600 | consumed tokens: 69258444800 | elapsed time per iteration (s): 0.09 | learning rate: 4.459E-05 | global batch size: 256 | lm loss: 4.510507E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2952.713 | TFLOPs: 10.98 | 7: iteration 132110/ 173500 | consumed samples: 33820160 | consumed tokens: 69263687680 | elapsed time per iteration (s): 0.10 | learning rate: 4.458E-05 | global batch size: 256 | lm loss: 4.506309E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2609.372 | TFLOPs: 9.71 | 7: iteration 132120/ 173500 | consumed samples: 33822720 | consumed tokens: 69268930560 | elapsed time per iteration (s): 0.11 | learning rate: 4.457E-05 | global batch size: 256 | lm loss: 4.509571E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2315.597 | TFLOPs: 8.61 | 7: iteration 132130/ 173500 | consumed samples: 33825280 | consumed tokens: 69274173440 | elapsed time per iteration (s): 0.10 | learning rate: 4.456E-05 | global batch size: 256 | lm loss: 4.499468E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2553.802 | TFLOPs: 9.50 | 7: iteration 132140/ 173500 | consumed samples: 33827840 | consumed tokens: 69279416320 | elapsed time per iteration (s): 0.09 | learning rate: 4.455E-05 | global batch size: 256 | lm loss: 4.511658E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.452 | TFLOPs: 10.70 | 7: iteration 132150/ 173500 | consumed samples: 33830400 | consumed tokens: 69284659200 | elapsed time per iteration (s): 0.11 | learning 
rate: 4.454E-05 | global batch size: 256 | lm loss: 4.510212E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2265.877 | TFLOPs: 8.43 | 7: iteration 132160/ 173500 | consumed samples: 33832960 | consumed tokens: 69289902080 | elapsed time per iteration (s): 0.09 | learning rate: 4.452E-05 | global batch size: 256 | lm loss: 4.512994E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2715.592 | TFLOPs: 10.10 | 7: iteration 132170/ 173500 | consumed samples: 33835520 | consumed tokens: 69295144960 | elapsed time per iteration (s): 0.08 | learning rate: 4.451E-05 | global batch size: 256 | lm loss: 4.505985E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.153 | TFLOPs: 12.00 | 7: iteration 132180/ 173500 | consumed samples: 33838080 | consumed tokens: 69300387840 | elapsed time per iteration (s): 0.24 | learning rate: 4.450E-05 | global batch size: 256 | lm loss: 4.501483E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1051.071 | TFLOPs: 3.91 | 7: iteration 132190/ 173500 | consumed samples: 33840640 | consumed tokens: 69305630720 | elapsed time per iteration (s): 0.10 | learning rate: 4.449E-05 | global batch size: 256 | lm loss: 4.524552E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2514.955 | TFLOPs: 9.35 | 7: iteration 132200/ 173500 | consumed samples: 33843200 | consumed tokens: 69310873600 | elapsed time per iteration (s): 0.10 | learning rate: 4.448E-05 | global batch size: 256 | lm loss: 4.519030E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2594.994 | TFLOPs: 9.65 | 7: iteration 132210/ 173500 | 
consumed samples: 33845760 | consumed tokens: 69316116480 | elapsed time per iteration (s): 0.08 | learning rate: 4.447E-05 | global batch size: 256 | lm loss: 4.497369E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3067.822 | TFLOPs: 11.41 | 7: iteration 132220/ 173500 | consumed samples: 33848320 | consumed tokens: 69321359360 | elapsed time per iteration (s): 0.08 | learning rate: 4.446E-05 | global batch size: 256 | lm loss: 4.498677E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.914 | TFLOPs: 12.01 | 7: iteration 132230/ 173500 | consumed samples: 33850880 | consumed tokens: 69326602240 | elapsed time per iteration (s): 0.08 | learning rate: 4.445E-05 | global batch size: 256 | lm loss: 4.508954E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.767 | TFLOPs: 11.27 | 7: iteration 132240/ 173500 | consumed samples: 33853440 | consumed tokens: 69331845120 | elapsed time per iteration (s): 0.11 | learning rate: 4.443E-05 | global batch size: 256 | lm loss: 4.518565E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.688 | TFLOPs: 8.78 | 7: iteration 132250/ 173500 | consumed samples: 33856000 | consumed tokens: 69337088000 | elapsed time per iteration (s): 0.08 | learning rate: 4.442E-05 | global batch size: 256 | lm loss: 4.519838E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.909 | TFLOPs: 11.79 | 7: iteration 132260/ 173500 | consumed samples: 33858560 | consumed tokens: 69342330880 | elapsed time per iteration (s): 0.09 | learning rate: 4.441E-05 | global batch size: 256 | lm loss: 4.504764E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 2942.518 | TFLOPs: 10.94 | 7: iteration 132270/ 173500 | consumed samples: 33861120 | consumed tokens: 69347573760 | elapsed time per iteration (s): 0.08 | learning rate: 4.440E-05 | global batch size: 256 | lm loss: 4.511972E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.998 | TFLOPs: 11.92 | 7: iteration 132280/ 173500 | consumed samples: 33863680 | consumed tokens: 69352816640 | elapsed time per iteration (s): 0.08 | learning rate: 4.439E-05 | global batch size: 256 | lm loss: 4.514637E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.958 | TFLOPs: 11.86 | 7: iteration 132290/ 173500 | consumed samples: 33866240 | consumed tokens: 69358059520 | elapsed time per iteration (s): 0.09 | learning rate: 4.438E-05 | global batch size: 256 | lm loss: 4.498872E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.616 | TFLOPs: 10.54 | 7: iteration 132300/ 173500 | consumed samples: 33868800 | consumed tokens: 69363302400 | elapsed time per iteration (s): 0.11 | learning rate: 4.437E-05 | global batch size: 256 | lm loss: 4.513607E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2282.968 | TFLOPs: 8.49 | 7: iteration 132310/ 173500 | consumed samples: 33871360 | consumed tokens: 69368545280 | elapsed time per iteration (s): 0.11 | learning rate: 4.436E-05 | global batch size: 256 | lm loss: 4.498158E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.142 | TFLOPs: 8.91 | 7: iteration 132320/ 173500 | consumed samples: 33873920 | consumed tokens: 69373788160 | elapsed time per iteration (s): 0.08 | learning rate: 4.434E-05 | 
global batch size: 256 | lm loss: 4.505536E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.051 | TFLOPs: 11.88 | 7: iteration 132330/ 173500 | consumed samples: 33876480 | consumed tokens: 69379031040 | elapsed time per iteration (s): 0.09 | learning rate: 4.433E-05 | global batch size: 256 | lm loss: 4.499103E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3003.109 | TFLOPs: 11.17 | 7: iteration 132340/ 173500 | consumed samples: 33879040 | consumed tokens: 69384273920 | elapsed time per iteration (s): 0.09 | learning rate: 4.432E-05 | global batch size: 256 | lm loss: 4.502451E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2813.795 | TFLOPs: 10.47 | 7: iteration 132350/ 173500 | consumed samples: 33881600 | consumed tokens: 69389516800 | elapsed time per iteration (s): 0.10 | learning rate: 4.431E-05 | global batch size: 256 | lm loss: 4.509001E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.070 | TFLOPs: 9.74 | 7: iteration 132360/ 173500 | consumed samples: 33884160 | consumed tokens: 69394759680 | elapsed time per iteration (s): 0.08 | learning rate: 4.430E-05 | global batch size: 256 | lm loss: 4.518992E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.851 | TFLOPs: 11.51 | 7: iteration 132370/ 173500 | consumed samples: 33886720 | consumed tokens: 69400002560 | elapsed time per iteration (s): 0.08 | learning rate: 4.429E-05 | global batch size: 256 | lm loss: 4.514915E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.835 | TFLOPs: 11.75 | 7: iteration 132380/ 173500 | consumed 
samples: 33889280 | consumed tokens: 69405245440 | elapsed time per iteration (s): 0.10 | learning rate: 4.428E-05 | global batch size: 256 | lm loss: 4.509034E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2615.696 | TFLOPs: 9.73 | 7: iteration 132390/ 173500 | consumed samples: 33891840 | consumed tokens: 69410488320 | elapsed time per iteration (s): 0.08 | learning rate: 4.427E-05 | global batch size: 256 | lm loss: 4.506214E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.499 | TFLOPs: 11.76 | 7: iteration 132400/ 173500 | consumed samples: 33894400 | consumed tokens: 69415731200 | elapsed time per iteration (s): 0.08 | learning rate: 4.425E-05 | global batch size: 256 | lm loss: 4.506328E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.816 | TFLOPs: 11.68 | 7: iteration 132410/ 173500 | consumed samples: 33896960 | consumed tokens: 69420974080 | elapsed time per iteration (s): 0.08 | learning rate: 4.424E-05 | global batch size: 256 | lm loss: 4.493677E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.269 | TFLOPs: 11.66 | 7: iteration 132420/ 173500 | consumed samples: 33899520 | consumed tokens: 69426216960 | elapsed time per iteration (s): 0.08 | learning rate: 4.423E-05 | global batch size: 256 | lm loss: 4.525456E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.502 | TFLOPs: 11.77 | 7: iteration 132430/ 173500 | consumed samples: 33902080 | consumed tokens: 69431459840 | elapsed time per iteration (s): 0.08 | learning rate: 4.422E-05 | global batch size: 256 | lm loss: 4.496484E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3071.245 | TFLOPs: 11.42 | 7: iteration 132440/ 173500 | consumed samples: 33904640 | consumed tokens: 69436702720 | elapsed time per iteration (s): 0.08 | learning rate: 4.421E-05 | global batch size: 256 | lm loss: 4.500545E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.561 | TFLOPs: 11.82 | 7: iteration 132450/ 173500 | consumed samples: 33907200 | consumed tokens: 69441945600 | elapsed time per iteration (s): 0.09 | learning rate: 4.420E-05 | global batch size: 256 | lm loss: 4.509563E+00 | grad norm: 0.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.817 | TFLOPs: 10.73 | 7: iteration 132460/ 173500 | consumed samples: 33909760 | consumed tokens: 69447188480 | elapsed time per iteration (s): 0.09 | learning rate: 4.419E-05 | global batch size: 256 | lm loss: 4.502144E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2781.426 | TFLOPs: 10.35 | 7: iteration 132470/ 173500 | consumed samples: 33912320 | consumed tokens: 69452431360 | elapsed time per iteration (s): 0.12 | learning rate: 4.418E-05 | global batch size: 256 | lm loss: 4.495105E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2170.219 | TFLOPs: 8.07 | 7: iteration 132480/ 173500 | consumed samples: 33914880 | consumed tokens: 69457674240 | elapsed time per iteration (s): 0.08 | learning rate: 4.416E-05 | global batch size: 256 | lm loss: 4.520277E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.789 | TFLOPs: 11.73 | 7: iteration 132490/ 173500 | consumed samples: 33917440 | consumed tokens: 69462917120 | elapsed time per iteration (s): 0.09 | learning rate: 4.415E-05 | global 
batch size: 256 | lm loss: 4.505939E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.610 | TFLOPs: 10.84 | 7: iteration 132500/ 173500 | consumed samples: 33920000 | consumed tokens: 69468160000 | elapsed time per iteration (s): 0.10 | learning rate: 4.414E-05 | global batch size: 256 | lm loss: 4.515715E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2639.613 | TFLOPs: 9.82 | 7: iteration 132510/ 173500 | consumed samples: 33922560 | consumed tokens: 69473402880 | elapsed time per iteration (s): 0.08 | learning rate: 4.413E-05 | global batch size: 256 | lm loss: 4.524747E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.580 | TFLOPs: 11.67 | 7: iteration 132520/ 173500 | consumed samples: 33925120 | consumed tokens: 69478645760 | elapsed time per iteration (s): 0.08 | learning rate: 4.412E-05 | global batch size: 256 | lm loss: 4.499112E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.657 | TFLOPs: 11.67 | 7: iteration 132530/ 173500 | consumed samples: 33927680 | consumed tokens: 69483888640 | elapsed time per iteration (s): 0.08 | learning rate: 4.411E-05 | global batch size: 256 | lm loss: 4.509164E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.410 | TFLOPs: 11.88 | 7: iteration 132540/ 173500 | consumed samples: 33930240 | consumed tokens: 69489131520 | elapsed time per iteration (s): 0.08 | learning rate: 4.410E-05 | global batch size: 256 | lm loss: 4.495795E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.362 | TFLOPs: 11.56 | 7: iteration 132550/ 173500 | consumed samples: 
33932800 | consumed tokens: 69494374400 | elapsed time per iteration (s): 0.09 | learning rate: 4.409E-05 | global batch size: 256 | lm loss: 4.512371E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2873.464 | TFLOPs: 10.69 | 7: iteration 132560/ 173500 | consumed samples: 33935360 | consumed tokens: 69499617280 | elapsed time per iteration (s): 0.09 | learning rate: 4.407E-05 | global batch size: 256 | lm loss: 4.497881E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.776 | TFLOPs: 10.86 | 7: iteration 132570/ 173500 | consumed samples: 33937920 | consumed tokens: 69504860160 | elapsed time per iteration (s): 0.10 | learning rate: 4.406E-05 | global batch size: 256 | lm loss: 4.500302E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2684.157 | TFLOPs: 9.98 | 7: iteration 132580/ 173500 | consumed samples: 33940480 | consumed tokens: 69510103040 | elapsed time per iteration (s): 0.10 | learning rate: 4.405E-05 | global batch size: 256 | lm loss: 4.510600E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2646.050 | TFLOPs: 9.84 | 7: iteration 132590/ 173500 | consumed samples: 33943040 | consumed tokens: 69515345920 | elapsed time per iteration (s): 0.12 | learning rate: 4.404E-05 | global batch size: 256 | lm loss: 4.514674E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2136.544 | TFLOPs: 7.95 | 7: iteration 132600/ 173500 | consumed samples: 33945600 | consumed tokens: 69520588800 | elapsed time per iteration (s): 0.16 | learning rate: 4.403E-05 | global batch size: 256 | lm loss: 4.506705E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 1651.239 | TFLOPs: 6.14 | 7: iteration 132610/ 173500 | consumed samples: 33948160 | consumed tokens: 69525831680 | elapsed time per iteration (s): 0.15 | learning rate: 4.402E-05 | global batch size: 256 | lm loss: 4.499886E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1689.765 | TFLOPs: 6.29 | 7: iteration 132620/ 173500 | consumed samples: 33950720 | consumed tokens: 69531074560 | elapsed time per iteration (s): 0.11 | learning rate: 4.401E-05 | global batch size: 256 | lm loss: 4.497262E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2317.813 | TFLOPs: 8.62 | 7: iteration 132630/ 173500 | consumed samples: 33953280 | consumed tokens: 69536317440 | elapsed time per iteration (s): 0.11 | learning rate: 4.400E-05 | global batch size: 256 | lm loss: 4.504047E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2393.032 | TFLOPs: 8.90 | 7: iteration 132640/ 173500 | consumed samples: 33955840 | consumed tokens: 69541560320 | elapsed time per iteration (s): 0.13 | learning rate: 4.399E-05 | global batch size: 256 | lm loss: 4.509391E+00 | grad norm: 0.333 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2022.728 | TFLOPs: 7.52 | 7: iteration 132650/ 173500 | consumed samples: 33958400 | consumed tokens: 69546803200 | elapsed time per iteration (s): 0.10 | learning rate: 4.397E-05 | global batch size: 256 | lm loss: 4.502436E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2553.732 | TFLOPs: 9.50 | 7: iteration 132660/ 173500 | consumed samples: 33960960 | consumed tokens: 69552046080 | elapsed time per iteration (s): 0.10 | learning rate: 4.396E-05 | global batch size: 256 | 
lm loss: 4.518343E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2638.862 | TFLOPs: 9.82 | 7: iteration 132670/ 173500 | consumed samples: 33963520 | consumed tokens: 69557288960 | elapsed time per iteration (s): 0.08 | learning rate: 4.395E-05 | global batch size: 256 | lm loss: 4.501958E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.763 | TFLOPs: 11.55 | 7: iteration 132680/ 173500 | consumed samples: 33966080 | consumed tokens: 69562531840 | elapsed time per iteration (s): 0.08 | learning rate: 4.394E-05 | global batch size: 256 | lm loss: 4.512781E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.134 | TFLOPs: 11.69 | 7: iteration 132690/ 173500 | consumed samples: 33968640 | consumed tokens: 69567774720 | elapsed time per iteration (s): 0.08 | learning rate: 4.393E-05 | global batch size: 256 | lm loss: 4.515881E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.599 | TFLOPs: 11.44 | 7: iteration 132700/ 173500 | consumed samples: 33971200 | consumed tokens: 69573017600 | elapsed time per iteration (s): 0.14 | learning rate: 4.392E-05 | global batch size: 256 | lm loss: 4.502051E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1825.196 | TFLOPs: 6.79 | 7: iteration 132710/ 173500 | consumed samples: 33973760 | consumed tokens: 69578260480 | elapsed time per iteration (s): 0.10 | learning rate: 4.391E-05 | global batch size: 256 | lm loss: 4.516008E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2555.187 | TFLOPs: 9.50 | 7: iteration 132720/ 173500 | consumed samples: 33976320 | consumed 
tokens: 69583503360 | elapsed time per iteration (s): 0.10 | learning rate: 4.390E-05 | global batch size: 256 | lm loss: 4.516467E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.873 | TFLOPs: 9.09 | 7: iteration 132730/ 173500 | consumed samples: 33978880 | consumed tokens: 69588746240 | elapsed time per iteration (s): 0.08 | learning rate: 4.388E-05 | global batch size: 256 | lm loss: 4.513658E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.706 | TFLOPs: 11.50 | 7: iteration 132740/ 173500 | consumed samples: 33981440 | consumed tokens: 69593989120 | elapsed time per iteration (s): 0.10 | learning rate: 4.387E-05 | global batch size: 256 | lm loss: 4.513860E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2694.206 | TFLOPs: 10.02 | 7: iteration 132750/ 173500 | consumed samples: 33984000 | consumed tokens: 69599232000 | elapsed time per iteration (s): 0.12 | learning rate: 4.386E-05 | global batch size: 256 | lm loss: 4.512246E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2221.549 | TFLOPs: 8.26 | 7: iteration 132760/ 173500 | consumed samples: 33986560 | consumed tokens: 69604474880 | elapsed time per iteration (s): 0.08 | learning rate: 4.385E-05 | global batch size: 256 | lm loss: 4.511127E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.640 | TFLOPs: 11.69 | 7: iteration 132770/ 173500 | consumed samples: 33989120 | consumed tokens: 69609717760 | elapsed time per iteration (s): 0.10 | learning rate: 4.384E-05 | global batch size: 256 | lm loss: 4.511929E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2481.435 | TFLOPs: 9.23 | 7: iteration 132780/ 173500 | consumed samples: 33991680 | consumed tokens: 69614960640 | elapsed time per iteration (s): 0.11 | learning rate: 4.383E-05 | global batch size: 256 | lm loss: 4.518797E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2279.457 | TFLOPs: 8.48 | 7: iteration 132790/ 173500 | consumed samples: 33994240 | consumed tokens: 69620203520 | elapsed time per iteration (s): 0.10 | learning rate: 4.382E-05 | global batch size: 256 | lm loss: 4.507553E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2553.652 | TFLOPs: 9.50 | 7: iteration 132800/ 173500 | consumed samples: 33996800 | consumed tokens: 69625446400 | elapsed time per iteration (s): 0.11 | learning rate: 4.381E-05 | global batch size: 256 | lm loss: 4.506281E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2372.133 | TFLOPs: 8.82 | 7: iteration 132810/ 173500 | consumed samples: 33999360 | consumed tokens: 69630689280 | elapsed time per iteration (s): 0.08 | learning rate: 4.380E-05 | global batch size: 256 | lm loss: 4.513653E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.306 | TFLOPs: 11.61 | 7: iteration 132820/ 173500 | consumed samples: 34001920 | consumed tokens: 69635932160 | elapsed time per iteration (s): 0.08 | learning rate: 4.378E-05 | global batch size: 256 | lm loss: 4.510819E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.716 | TFLOPs: 11.75 | 7: iteration 132830/ 173500 | consumed samples: 34004480 | consumed tokens: 69641175040 | elapsed time per iteration (s): 0.09 | learning rate: 4.377E-05 | global batch size: 256 | lm loss: 
4.499359E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2939.641 | TFLOPs: 10.93 | 7: iteration 132840/ 173500 | consumed samples: 34007040 | consumed tokens: 69646417920 | elapsed time per iteration (s): 0.08 | learning rate: 4.376E-05 | global batch size: 256 | lm loss: 4.500606E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.190 | TFLOPs: 11.87 | 7: iteration 132850/ 173500 | consumed samples: 34009600 | consumed tokens: 69651660800 | elapsed time per iteration (s): 0.08 | learning rate: 4.375E-05 | global batch size: 256 | lm loss: 4.505812E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.163 | TFLOPs: 11.87 | 7: iteration 132860/ 173500 | consumed samples: 34012160 | consumed tokens: 69656903680 | elapsed time per iteration (s): 0.08 | learning rate: 4.374E-05 | global batch size: 256 | lm loss: 4.510160E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.804 | TFLOPs: 11.90 | 7: iteration 132870/ 173500 | consumed samples: 34014720 | consumed tokens: 69662146560 | elapsed time per iteration (s): 0.08 | learning rate: 4.373E-05 | global batch size: 256 | lm loss: 4.502384E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.820 | TFLOPs: 11.60 | 7: iteration 132880/ 173500 | consumed samples: 34017280 | consumed tokens: 69667389440 | elapsed time per iteration (s): 0.08 | learning rate: 4.372E-05 | global batch size: 256 | lm loss: 4.519533E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3079.704 | TFLOPs: 11.46 | 7: iteration 132890/ 173500 | consumed samples: 34019840 | consumed tokens: 
69672632320 | elapsed time per iteration (s): 0.09 | learning rate: 4.371E-05 | global batch size: 256 | lm loss: 4.507961E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.343 | TFLOPs: 10.09 | 7: iteration 132900/ 173500 | consumed samples: 34022400 | consumed tokens: 69677875200 | elapsed time per iteration (s): 0.08 | learning rate: 4.369E-05 | global batch size: 256 | lm loss: 4.501987E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.721 | TFLOPs: 11.76 | 7: iteration 132910/ 173500 | consumed samples: 34024960 | consumed tokens: 69683118080 | elapsed time per iteration (s): 0.09 | learning rate: 4.368E-05 | global batch size: 256 | lm loss: 4.515409E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2762.718 | TFLOPs: 10.28 | 7: iteration 132920/ 173500 | consumed samples: 34027520 | consumed tokens: 69688360960 | elapsed time per iteration (s): 0.10 | learning rate: 4.367E-05 | global batch size: 256 | lm loss: 4.508610E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.762 | TFLOPs: 9.74 | 7: iteration 132930/ 173500 | consumed samples: 34030080 | consumed tokens: 69693603840 | elapsed time per iteration (s): 0.15 | learning rate: 4.366E-05 | global batch size: 256 | lm loss: 4.508235E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1744.144 | TFLOPs: 6.49 | 7: iteration 132940/ 173500 | consumed samples: 34032640 | consumed tokens: 69698846720 | elapsed time per iteration (s): 0.13 | learning rate: 4.365E-05 | global batch size: 256 | lm loss: 4.510449E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2005.100 | TFLOPs: 7.46 | 7: iteration 132950/ 173500 | consumed samples: 34035200 | consumed tokens: 69704089600 | elapsed time per iteration (s): 0.12 | learning rate: 4.364E-05 | global batch size: 256 | lm loss: 4.509943E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2119.490 | TFLOPs: 7.88 | 7: iteration 132960/ 173500 | consumed samples: 34037760 | consumed tokens: 69709332480 | elapsed time per iteration (s): 0.09 | learning rate: 4.363E-05 | global batch size: 256 | lm loss: 4.509477E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2759.237 | TFLOPs: 10.26 | 7: iteration 132970/ 173500 | consumed samples: 34040320 | consumed tokens: 69714575360 | elapsed time per iteration (s): 0.08 | learning rate: 4.362E-05 | global batch size: 256 | lm loss: 4.506973E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.027 | TFLOPs: 11.59 | 7: iteration 132980/ 173500 | consumed samples: 34042880 | consumed tokens: 69719818240 | elapsed time per iteration (s): 0.13 | learning rate: 4.361E-05 | global batch size: 256 | lm loss: 4.518009E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2029.017 | TFLOPs: 7.55 | 7: iteration 132990/ 173500 | consumed samples: 34045440 | consumed tokens: 69725061120 | elapsed time per iteration (s): 0.11 | learning rate: 4.359E-05 | global batch size: 256 | lm loss: 4.505676E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2366.029 | TFLOPs: 8.80 | 7: iteration 133000/ 173500 | consumed samples: 34048000 | consumed tokens: 69730304000 | elapsed time per iteration (s): 0.11 | learning rate: 4.358E-05 | global batch size: 256 | lm loss: 4.510788E+00 | 
grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.730 | TFLOPs: 8.59 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 133000 | lm loss value: 4.358884E+00 | lm loss PPL: 7.816987E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 133000 to checkpoints_14m91b100m
0: [2023-03-17 03:29:56,349] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step133000 is begin to save!
0: [2023-03-17 03:29:56,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/layer_01-model_00-model_states.pt...
0: [2023-03-17 03:29:56,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/layer_01-model_00-model_states.pt.
0: [2023-03-17 03:29:56,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/layer_03-model_00-model_states.pt...
0: [2023-03-17 03:29:56,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/layer_03-model_00-model_states.pt.
0: [2023-03-17 03:29:56,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/layer_04-model_00-model_states.pt...
0: [2023-03-17 03:29:56,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/layer_04-model_00-model_states.pt.
0: [2023-03-17 03:29:56,387] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/layer_05-model_00-model_states.pt...
0: [2023-03-17 03:29:56,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/layer_05-model_00-model_states.pt.
0: [2023-03-17 03:29:56,389] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/layer_06-model_00-model_states.pt...
0: [2023-03-17 03:29:56,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/layer_06-model_00-model_states.pt.
0: [2023-03-17 03:29:56,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/layer_08-model_00-model_states.pt...
0: [2023-03-17 03:29:56,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/layer_08-model_00-model_states.pt.
0: [2023-03-17 03:29:56,393] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step133000/mp_rank_00_model_states.pt
0: [2023-03-17 03:29:56,393] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/mp_rank_00_model_states.pt...
0: [2023-03-17 03:29:56,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/mp_rank_00_model_states.pt.
0: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:29:56,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:29:56,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:29:56,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:29:56,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:29:56,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:29:56,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:29:56,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:29:56,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:29:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:29:56,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:29:56,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 03:29:56,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 4: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 1: [2023-03-17 03:29:56,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 2: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 7: [2023-03-17 03:29:56,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 
1: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:29:56,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 1: [2023-03-17 03:29:56,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:29:56,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:29:56,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 0: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:29:56,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 7: [2023-03-17 03:29:56,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 2: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 0: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:29:56,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 4: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:29:56,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:29:56,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:29:56,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 7: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:29:56,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 
1: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 1: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 2: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 0: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 2: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 6: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 6: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 6: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 6: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 7: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:29:56,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:29:56,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 03:29:56,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 1: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:29:56,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 
0: [2023-03-17 03:29:56,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 2: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 6: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:29:56,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 7: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:29:56,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 1: [2023-03-17 03:29:56,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:29:56,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:29:56,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:29:56,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 03:29:56,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 6: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:29:56,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 7: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:29:56,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 0: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:29:56,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 2: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:29:56,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 2: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:29:56,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 03:29:56,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 6: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:29:56,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 7: [2023-03-17 03:29:56,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 3: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 2: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 6: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 
1: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 4: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 0: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 1: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:29:56,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:29:56,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:29:56,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:29:56,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:29:56,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:29:56,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:29:56,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:29:56,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 5: [2023-03-17 03:29:56,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 5: [2023-03-17 03:29:56,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step133000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:29:56,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step133000 is ready now! 0: successfully saved checkpoint at iteration 133000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 111.78 7: iteration 133010/ 173500 | consumed samples: 34050560 | consumed tokens: 69735546880 | elapsed time per iteration (s): 0.13 | learning rate: 4.357E-05 | global batch size: 256 | lm loss: 4.516859E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2036.056 | TFLOPs: 7.57 | 7: iteration 133020/ 173500 | consumed samples: 34053120 | consumed tokens: 69740789760 | elapsed time per iteration (s): 0.15 | learning rate: 4.356E-05 | global batch size: 256 | lm loss: 4.510719E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1679.611 | TFLOPs: 6.25 | 7: iteration 133030/ 173500 | consumed samples: 34055680 | consumed tokens: 69746032640 | elapsed time per iteration (s): 0.14 | learning rate: 4.355E-05 | global batch size: 256 | lm loss: 4.516450E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1774.828 | TFLOPs: 6.60 | 7: iteration 133040/ 173500 | consumed samples: 34058240 | consumed tokens: 69751275520 | elapsed time per iteration (s): 0.13 | learning rate: 4.354E-05 | global batch size: 256 | lm loss: 4.509169E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 1981.250 | TFLOPs: 7.37 | 7: iteration 133050/ 173500 | consumed samples: 34060800 | consumed tokens: 69756518400 | elapsed time per iteration (s): 0.11 | learning rate: 4.353E-05 | global batch size: 256 | lm loss: 4.519150E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2245.549 | TFLOPs: 8.35 | 7: iteration 133060/ 173500 | consumed samples: 34063360 | consumed tokens: 69761761280 | elapsed time per iteration (s): 0.13 | learning rate: 4.352E-05 | global batch size: 256 | lm loss: 4.513229E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.978 | TFLOPs: 7.26 | 7: iteration 133070/ 173500 | consumed samples: 34065920 | consumed tokens: 69767004160 | elapsed time per iteration (s): 0.13 | learning rate: 4.351E-05 | global batch size: 256 | lm loss: 4.507406E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2033.958 | TFLOPs: 7.57 | 7: iteration 133080/ 173500 | consumed samples: 34068480 | consumed tokens: 69772247040 | elapsed time per iteration (s): 0.10 | learning rate: 4.349E-05 | global batch size: 256 | lm loss: 4.500948E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2527.493 | TFLOPs: 9.40 | 7: iteration 133090/ 173500 | consumed samples: 34071040 | consumed tokens: 69777489920 | elapsed time per iteration (s): 0.08 | learning rate: 4.348E-05 | global batch size: 256 | lm loss: 4.503432E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.750 | TFLOPs: 11.91 | 7: iteration 133100/ 173500 | consumed samples: 34073600 | consumed tokens: 69782732800 | elapsed time per iteration (s): 0.09 | learning rate: 
4.347E-05 | global batch size: 256 | lm loss: 4.514562E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.565 | TFLOPs: 10.18 | 7: iteration 133110/ 173500 | consumed samples: 34076160 | consumed tokens: 69787975680 | elapsed time per iteration (s): 0.08 | learning rate: 4.346E-05 | global batch size: 256 | lm loss: 4.498376E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3012.153 | TFLOPs: 11.20 | 7: iteration 133120/ 173500 | consumed samples: 34078720 | consumed tokens: 69793218560 | elapsed time per iteration (s): 0.09 | learning rate: 4.345E-05 | global batch size: 256 | lm loss: 4.506117E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2982.619 | TFLOPs: 11.09 | 7: iteration 133130/ 173500 | consumed samples: 34081280 | consumed tokens: 69798461440 | elapsed time per iteration (s): 0.09 | learning rate: 4.344E-05 | global batch size: 256 | lm loss: 4.515739E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.578 | TFLOPs: 10.38 | 7: iteration 133140/ 173500 | consumed samples: 34083840 | consumed tokens: 69803704320 | elapsed time per iteration (s): 0.10 | learning rate: 4.343E-05 | global batch size: 256 | lm loss: 4.503975E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.420 | TFLOPs: 9.16 | 7: iteration 133150/ 173500 | consumed samples: 34086400 | consumed tokens: 69808947200 | elapsed time per iteration (s): 0.11 | learning rate: 4.342E-05 | global batch size: 256 | lm loss: 4.495274E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2287.712 | TFLOPs: 8.51 | 7: iteration 133160/ 173500 | 
consumed samples: 34088960 | consumed tokens: 69814190080 | elapsed time per iteration (s): 0.11 | learning rate: 4.341E-05 | global batch size: 256 | lm loss: 4.514372E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2251.548 | TFLOPs: 8.37 | 7: iteration 133170/ 173500 | consumed samples: 34091520 | consumed tokens: 69819432960 | elapsed time per iteration (s): 0.08 | learning rate: 4.340E-05 | global batch size: 256 | lm loss: 4.521735E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.871 | TFLOPs: 11.80 | 7: iteration 133180/ 173500 | consumed samples: 34094080 | consumed tokens: 69824675840 | elapsed time per iteration (s): 0.08 | learning rate: 4.338E-05 | global batch size: 256 | lm loss: 4.505424E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.182 | TFLOPs: 11.48 | 7: iteration 133190/ 173500 | consumed samples: 34096640 | consumed tokens: 69829918720 | elapsed time per iteration (s): 0.13 | learning rate: 4.337E-05 | global batch size: 256 | lm loss: 4.517426E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1990.115 | TFLOPs: 7.40 | 7: iteration 133200/ 173500 | consumed samples: 34099200 | consumed tokens: 69835161600 | elapsed time per iteration (s): 0.12 | learning rate: 4.336E-05 | global batch size: 256 | lm loss: 4.508673E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2112.386 | TFLOPs: 7.86 | 7: iteration 133210/ 173500 | consumed samples: 34101760 | consumed tokens: 69840404480 | elapsed time per iteration (s): 0.13 | learning rate: 4.335E-05 | global batch size: 256 | lm loss: 4.512143E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 1925.770 | TFLOPs: 7.16 | 7: iteration 133220/ 173500 | consumed samples: 34104320 | consumed tokens: 69845647360 | elapsed time per iteration (s): 0.11 | learning rate: 4.334E-05 | global batch size: 256 | lm loss: 4.506031E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2423.644 | TFLOPs: 9.01 | 7: iteration 133230/ 173500 | consumed samples: 34106880 | consumed tokens: 69850890240 | elapsed time per iteration (s): 0.08 | learning rate: 4.333E-05 | global batch size: 256 | lm loss: 4.521434E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.364 | TFLOPs: 11.92 | 7: iteration 133240/ 173500 | consumed samples: 34109440 | consumed tokens: 69856133120 | elapsed time per iteration (s): 0.10 | learning rate: 4.332E-05 | global batch size: 256 | lm loss: 4.531198E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2648.442 | TFLOPs: 9.85 | 7: iteration 133250/ 173500 | consumed samples: 34112000 | consumed tokens: 69861376000 | elapsed time per iteration (s): 0.13 | learning rate: 4.331E-05 | global batch size: 256 | lm loss: 4.507787E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1941.807 | TFLOPs: 7.22 | 7: iteration 133260/ 173500 | consumed samples: 34114560 | consumed tokens: 69866618880 | elapsed time per iteration (s): 0.11 | learning rate: 4.330E-05 | global batch size: 256 | lm loss: 4.517591E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.894 | TFLOPs: 8.50 | 7: iteration 133270/ 173500 | consumed samples: 34117120 | consumed tokens: 69871861760 | elapsed time per iteration (s): 0.12 | learning rate: 4.328E-05 | global 
batch size: 256 | lm loss: 4.504966E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2154.692 | TFLOPs: 8.01 | 7: iteration 133280/ 173500 | consumed samples: 34119680 | consumed tokens: 69877104640 | elapsed time per iteration (s): 0.10 | learning rate: 4.327E-05 | global batch size: 256 | lm loss: 4.498565E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2483.206 | TFLOPs: 9.24 | 7: iteration 133290/ 173500 | consumed samples: 34122240 | consumed tokens: 69882347520 | elapsed time per iteration (s): 0.09 | learning rate: 4.326E-05 | global batch size: 256 | lm loss: 4.519614E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.278 | TFLOPs: 10.68 | 7: iteration 133300/ 173500 | consumed samples: 34124800 | consumed tokens: 69887590400 | elapsed time per iteration (s): 0.14 | learning rate: 4.325E-05 | global batch size: 256 | lm loss: 4.518942E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1848.842 | TFLOPs: 6.88 | 7: iteration 133310/ 173500 | consumed samples: 34127360 | consumed tokens: 69892833280 | elapsed time per iteration (s): 0.11 | learning rate: 4.324E-05 | global batch size: 256 | lm loss: 4.495904E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2278.449 | TFLOPs: 8.47 | 7: iteration 133320/ 173500 | consumed samples: 34129920 | consumed tokens: 69898076160 | elapsed time per iteration (s): 0.08 | learning rate: 4.323E-05 | global batch size: 256 | lm loss: 4.503038E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.145 | TFLOPs: 11.52 | 7: iteration 133330/ 173500 | consumed samples: 
34132480 | consumed tokens: 69903319040 | elapsed time per iteration (s): 0.10 | learning rate: 4.322E-05 | global batch size: 256 | lm loss: 4.520436E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2542.639 | TFLOPs: 9.46 | 7: iteration 133340/ 173500 | consumed samples: 34135040 | consumed tokens: 69908561920 | elapsed time per iteration (s): 0.08 | learning rate: 4.321E-05 | global batch size: 256 | lm loss: 4.509825E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.668 | TFLOPs: 11.58 | 7: iteration 133350/ 173500 | consumed samples: 34137600 | consumed tokens: 69913804800 | elapsed time per iteration (s): 0.08 | learning rate: 4.320E-05 | global batch size: 256 | lm loss: 4.504369E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.526 | TFLOPs: 11.95 | 7: iteration 133360/ 173500 | consumed samples: 34140160 | consumed tokens: 69919047680 | elapsed time per iteration (s): 0.08 | learning rate: 4.319E-05 | global batch size: 256 | lm loss: 4.520949E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.584 | TFLOPs: 11.92 | 7: iteration 133370/ 173500 | consumed samples: 34142720 | consumed tokens: 69924290560 | elapsed time per iteration (s): 0.11 | learning rate: 4.317E-05 | global batch size: 256 | lm loss: 4.502528E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2422.942 | TFLOPs: 9.01 | 7: iteration 133380/ 173500 | consumed samples: 34145280 | consumed tokens: 69929533440 | elapsed time per iteration (s): 0.13 | learning rate: 4.316E-05 | global batch size: 256 | lm loss: 4.521281E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 1966.429 | TFLOPs: 7.31 | 7: iteration 133390/ 173500 | consumed samples: 34147840 | consumed tokens: 69934776320 | elapsed time per iteration (s): 0.11 | learning rate: 4.315E-05 | global batch size: 256 | lm loss: 4.500831E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2287.089 | TFLOPs: 8.51 | 7: iteration 133400/ 173500 | consumed samples: 34150400 | consumed tokens: 69940019200 | elapsed time per iteration (s): 0.11 | learning rate: 4.314E-05 | global batch size: 256 | lm loss: 4.525234E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2300.709 | TFLOPs: 8.56 | 7: iteration 133410/ 173500 | consumed samples: 34152960 | consumed tokens: 69945262080 | elapsed time per iteration (s): 0.12 | learning rate: 4.313E-05 | global batch size: 256 | lm loss: 4.506538E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2119.117 | TFLOPs: 7.88 | 7: iteration 133420/ 173500 | consumed samples: 34155520 | consumed tokens: 69950504960 | elapsed time per iteration (s): 0.12 | learning rate: 4.312E-05 | global batch size: 256 | lm loss: 4.507879E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2077.622 | TFLOPs: 7.73 | 7: iteration 133430/ 173500 | consumed samples: 34158080 | consumed tokens: 69955747840 | elapsed time per iteration (s): 0.11 | learning rate: 4.311E-05 | global batch size: 256 | lm loss: 4.505547E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2246.209 | TFLOPs: 8.35 | 7: iteration 133440/ 173500 | consumed samples: 34160640 | consumed tokens: 69960990720 | elapsed time per iteration (s): 0.12 | learning rate: 4.310E-05 | global batch size: 256 | 
lm loss: 4.510640E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2191.853 | TFLOPs: 8.15 | 7: iteration 133450/ 173500 | consumed samples: 34163200 | consumed tokens: 69966233600 | elapsed time per iteration (s): 0.11 | learning rate: 4.309E-05 | global batch size: 256 | lm loss: 4.507154E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.702 | TFLOPs: 8.50 | 7: iteration 133460/ 173500 | consumed samples: 34165760 | consumed tokens: 69971476480 | elapsed time per iteration (s): 0.08 | learning rate: 4.308E-05 | global batch size: 256 | lm loss: 4.521867E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.829 | TFLOPs: 11.32 | 7: iteration 133470/ 173500 | consumed samples: 34168320 | consumed tokens: 69976719360 | elapsed time per iteration (s): 0.08 | learning rate: 4.306E-05 | global batch size: 256 | lm loss: 4.522316E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.480 | TFLOPs: 11.59 | 7: iteration 133480/ 173500 | consumed samples: 34170880 | consumed tokens: 69981962240 | elapsed time per iteration (s): 0.09 | learning rate: 4.305E-05 | global batch size: 256 | lm loss: 4.513181E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2841.372 | TFLOPs: 10.57 | 7: iteration 133490/ 173500 | consumed samples: 34173440 | consumed tokens: 69987205120 | elapsed time per iteration (s): 0.11 | learning rate: 4.304E-05 | global batch size: 256 | lm loss: 4.498582E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2240.149 | TFLOPs: 8.33 | 7: iteration 133500/ 173500 | consumed samples: 34176000 | consumed 
tokens: 69992448000 | elapsed time per iteration (s): 0.09 | learning rate: 4.303E-05 | global batch size: 256 | lm loss: 4.514586E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2768.548 | TFLOPs: 10.30 | 7: iteration 133510/ 173500 | consumed samples: 34178560 | consumed tokens: 69997690880 | elapsed time per iteration (s): 0.12 | learning rate: 4.302E-05 | global batch size: 256 | lm loss: 4.513547E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2133.346 | TFLOPs: 7.94 | 7: iteration 133520/ 173500 | consumed samples: 34181120 | consumed tokens: 70002933760 | elapsed time per iteration (s): 0.11 | learning rate: 4.301E-05 | global batch size: 256 | lm loss: 4.522586E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2417.369 | TFLOPs: 8.99 | 7: iteration 133530/ 173500 | consumed samples: 34183680 | consumed tokens: 70008176640 | elapsed time per iteration (s): 0.14 | learning rate: 4.300E-05 | global batch size: 256 | lm loss: 4.513331E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1839.010 | TFLOPs: 6.84 | 7: iteration 133540/ 173500 | consumed samples: 34186240 | consumed tokens: 70013419520 | elapsed time per iteration (s): 0.11 | learning rate: 4.299E-05 | global batch size: 256 | lm loss: 4.508921E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2274.541 | TFLOPs: 8.46 | 7: iteration 133550/ 173500 | consumed samples: 34188800 | consumed tokens: 70018662400 | elapsed time per iteration (s): 0.11 | learning rate: 4.298E-05 | global batch size: 256 | lm loss: 4.518048E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2426.241 | TFLOPs: 9.02 | 7: iteration 133560/ 173500 | consumed samples: 34191360 | consumed tokens: 70023905280 | elapsed time per iteration (s): 0.11 | learning rate: 4.297E-05 | global batch size: 256 | lm loss: 4.518668E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2328.168 | TFLOPs: 8.66 | 7: iteration 133570/ 173500 | consumed samples: 34193920 | consumed tokens: 70029148160 | elapsed time per iteration (s): 0.13 | learning rate: 4.295E-05 | global batch size: 256 | lm loss: 4.499302E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1900.819 | TFLOPs: 7.07 | 7: iteration 133580/ 173500 | consumed samples: 34196480 | consumed tokens: 70034391040 | elapsed time per iteration (s): 0.10 | learning rate: 4.294E-05 | global batch size: 256 | lm loss: 4.500205E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2507.252 | TFLOPs: 9.33 | 7: iteration 133590/ 173500 | consumed samples: 34199040 | consumed tokens: 70039633920 | elapsed time per iteration (s): 0.08 | learning rate: 4.293E-05 | global batch size: 256 | lm loss: 4.513020E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.141 | TFLOPs: 11.90 | 7: iteration 133600/ 173500 | consumed samples: 34201600 | consumed tokens: 70044876800 | elapsed time per iteration (s): 0.11 | learning rate: 4.292E-05 | global batch size: 256 | lm loss: 4.511539E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2369.846 | TFLOPs: 8.81 | 7: iteration 133610/ 173500 | consumed samples: 34204160 | consumed tokens: 70050119680 | elapsed time per iteration (s): 0.11 | learning rate: 4.291E-05 | global batch size: 256 | lm loss: 
4.511436E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2254.770 | TFLOPs: 8.39 | 7: iteration 133620/ 173500 | consumed samples: 34206720 | consumed tokens: 70055362560 | elapsed time per iteration (s): 0.09 | learning rate: 4.290E-05 | global batch size: 256 | lm loss: 4.514030E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2947.549 | TFLOPs: 10.96 | 7: iteration 133630/ 173500 | consumed samples: 34209280 | consumed tokens: 70060605440 | elapsed time per iteration (s): 0.10 | learning rate: 4.289E-05 | global batch size: 256 | lm loss: 4.514676E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.332 | TFLOPs: 9.64 | 7: iteration 133640/ 173500 | consumed samples: 34211840 | consumed tokens: 70065848320 | elapsed time per iteration (s): 0.10 | learning rate: 4.288E-05 | global batch size: 256 | lm loss: 4.504288E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2670.141 | TFLOPs: 9.93 | 7: iteration 133650/ 173500 | consumed samples: 34214400 | consumed tokens: 70071091200 | elapsed time per iteration (s): 0.08 | learning rate: 4.287E-05 | global batch size: 256 | lm loss: 4.519799E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.690 | TFLOPs: 11.93 | 7: iteration 133660/ 173500 | consumed samples: 34216960 | consumed tokens: 70076334080 | elapsed time per iteration (s): 0.09 | learning rate: 4.286E-05 | global batch size: 256 | lm loss: 4.516133E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2726.657 | TFLOPs: 10.14 | 7: iteration 133670/ 173500 | consumed samples: 34219520 | consumed tokens: 
70081576960 | elapsed time per iteration (s): 0.09 | learning rate: 4.284E-05 | global batch size: 256 | lm loss: 4.510708E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2879.815 | TFLOPs: 10.71 | 7: iteration 133680/ 173500 | consumed samples: 34222080 | consumed tokens: 70086819840 | elapsed time per iteration (s): 0.08 | learning rate: 4.283E-05 | global batch size: 256 | lm loss: 4.504317E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.665 | TFLOPs: 11.63 | 7: iteration 133690/ 173500 | consumed samples: 34224640 | consumed tokens: 70092062720 | elapsed time per iteration (s): 0.10 | learning rate: 4.282E-05 | global batch size: 256 | lm loss: 4.513468E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2640.578 | TFLOPs: 9.82 | 7: iteration 133700/ 173500 | consumed samples: 34227200 | consumed tokens: 70097305600 | elapsed time per iteration (s): 0.13 | learning rate: 4.281E-05 | global batch size: 256 | lm loss: 4.516999E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1952.698 | TFLOPs: 7.26 | 7: iteration 133710/ 173500 | consumed samples: 34229760 | consumed tokens: 70102548480 | elapsed time per iteration (s): 0.12 | learning rate: 4.280E-05 | global batch size: 256 | lm loss: 4.495035E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2138.121 | TFLOPs: 7.95 | 7: iteration 133720/ 173500 | consumed samples: 34232320 | consumed tokens: 70107791360 | elapsed time per iteration (s): 0.14 | learning rate: 4.279E-05 | global batch size: 256 | lm loss: 4.518761E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 1794.874 | TFLOPs: 6.68 | 7: iteration 133730/ 173500 | consumed samples: 34234880 | consumed tokens: 70113034240 | elapsed time per iteration (s): 0.14 | learning rate: 4.278E-05 | global batch size: 256 | lm loss: 4.519085E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1814.484 | TFLOPs: 6.75 | 7: iteration 133740/ 173500 | consumed samples: 34237440 | consumed tokens: 70118277120 | elapsed time per iteration (s): 0.12 | learning rate: 4.277E-05 | global batch size: 256 | lm loss: 4.503562E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2179.355 | TFLOPs: 8.11 | 7: iteration 133750/ 173500 | consumed samples: 34240000 | consumed tokens: 70123520000 | elapsed time per iteration (s): 0.13 | learning rate: 4.276E-05 | global batch size: 256 | lm loss: 4.508646E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.622 | TFLOPs: 7.39 | 7: iteration 133760/ 173500 | consumed samples: 34242560 | consumed tokens: 70128762880 | elapsed time per iteration (s): 0.10 | learning rate: 4.275E-05 | global batch size: 256 | lm loss: 4.495659E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.585 | TFLOPs: 9.80 | 7: iteration 133770/ 173500 | consumed samples: 34245120 | consumed tokens: 70134005760 | elapsed time per iteration (s): 0.08 | learning rate: 4.273E-05 | global batch size: 256 | lm loss: 4.510206E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.039 | TFLOPs: 11.68 | 7: iteration 133780/ 173500 | consumed samples: 34247680 | consumed tokens: 70139248640 | elapsed time per iteration (s): 0.09 | learning rate: 4.272E-05 | global batch size: 256 | lm loss: 4.507914E+00 | grad 
norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.337 | TFLOPs: 10.42 | 7: iteration 133790/ 173500 | consumed samples: 34250240 | consumed tokens: 70144491520 | elapsed time per iteration (s): 0.09 | learning rate: 4.271E-05 | global batch size: 256 | lm loss: 4.513271E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2802.037 | TFLOPs: 10.42 | 7: iteration 133800/ 173500 | consumed samples: 34252800 | consumed tokens: 70149734400 | elapsed time per iteration (s): 0.09 | learning rate: 4.270E-05 | global batch size: 256 | lm loss: 4.495877E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2942.626 | TFLOPs: 10.95 | 7: iteration 133810/ 173500 | consumed samples: 34255360 | consumed tokens: 70154977280 | elapsed time per iteration (s): 0.08 | learning rate: 4.269E-05 | global batch size: 256 | lm loss: 4.511039E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.936 | TFLOPs: 11.84 | 7: iteration 133820/ 173500 | consumed samples: 34257920 | consumed tokens: 70160220160 | elapsed time per iteration (s): 0.08 | learning rate: 4.268E-05 | global batch size: 256 | lm loss: 4.503674E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.487 | TFLOPs: 11.92 | 7: iteration 133830/ 173500 | consumed samples: 34260480 | consumed tokens: 70165463040 | elapsed time per iteration (s): 0.08 | learning rate: 4.267E-05 | global batch size: 256 | lm loss: 4.506376E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.187 | TFLOPs: 11.92 | 7: iteration 133840/ 173500 | consumed samples: 34263040 | consumed tokens: 70170705920 | elapsed 
time per iteration (s): 0.09 | learning rate: 4.266E-05 | global batch size: 256 | lm loss: 4.508460E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.581 | TFLOPs: 11.17 | 7: iteration 133850/ 173500 | consumed samples: 34265600 | consumed tokens: 70175948800 | elapsed time per iteration (s): 0.09 | learning rate: 4.265E-05 | global batch size: 256 | lm loss: 4.493644E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.035 | TFLOPs: 11.00 | 7: iteration 133860/ 173500 | consumed samples: 34268160 | consumed tokens: 70181191680 | elapsed time per iteration (s): 0.09 | learning rate: 4.264E-05 | global batch size: 256 | lm loss: 4.498786E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2963.463 | TFLOPs: 11.02 | 7: iteration 133870/ 173500 | consumed samples: 34270720 | consumed tokens: 70186434560 | elapsed time per iteration (s): 0.08 | learning rate: 4.263E-05 | global batch size: 256 | lm loss: 4.501877E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.065 | TFLOPs: 11.91 | 7: iteration 133880/ 173500 | consumed samples: 34273280 | consumed tokens: 70191677440 | elapsed time per iteration (s): 0.08 | learning rate: 4.261E-05 | global batch size: 256 | lm loss: 4.516003E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3082.745 | TFLOPs: 11.47 | 7: iteration 133890/ 173500 | consumed samples: 34275840 | consumed tokens: 70196920320 | elapsed time per iteration (s): 0.08 | learning rate: 4.260E-05 | global batch size: 256 | lm loss: 4.502874E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.750 | 
TFLOPs: 11.86 | 7: iteration 133900/ 173500 | consumed samples: 34278400 | consumed tokens: 70202163200 | elapsed time per iteration (s): 0.09 | learning rate: 4.259E-05 | global batch size: 256 | lm loss: 4.501114E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2873.480 | TFLOPs: 10.69 | 7: iteration 133910/ 173500 | consumed samples: 34280960 | consumed tokens: 70207406080 | elapsed time per iteration (s): 0.08 | learning rate: 4.258E-05 | global batch size: 256 | lm loss: 4.489972E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.058 | TFLOPs: 11.91 | 7: iteration 133920/ 173500 | consumed samples: 34283520 | consumed tokens: 70212648960 | elapsed time per iteration (s): 0.13 | learning rate: 4.257E-05 | global batch size: 256 | lm loss: 4.513807E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2003.037 | TFLOPs: 7.45 | 7: iteration 133930/ 173500 | consumed samples: 34286080 | consumed tokens: 70217891840 | elapsed time per iteration (s): 0.13 | learning rate: 4.256E-05 | global batch size: 256 | lm loss: 4.506890E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2016.351 | TFLOPs: 7.50 | 7: iteration 133940/ 173500 | consumed samples: 34288640 | consumed tokens: 70223134720 | elapsed time per iteration (s): 0.08 | learning rate: 4.255E-05 | global batch size: 256 | lm loss: 4.498190E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.030 | TFLOPs: 11.62 | 7: iteration 133950/ 173500 | consumed samples: 34291200 | consumed tokens: 70228377600 | elapsed time per iteration (s): 0.08 | learning rate: 4.254E-05 | global batch size: 256 | lm loss: 4.514583E+00 | grad norm: 0.357 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.281 | TFLOPs: 11.55 | 7: iteration 133960/ 173500 | consumed samples: 34293760 | consumed tokens: 70233620480 | elapsed time per iteration (s): 0.08 | learning rate: 4.253E-05 | global batch size: 256 | lm loss: 4.509026E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.331 | TFLOPs: 11.48 | 7: iteration 133970/ 173500 | consumed samples: 34296320 | consumed tokens: 70238863360 | elapsed time per iteration (s): 0.09 | learning rate: 4.252E-05 | global batch size: 256 | lm loss: 4.516792E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2964.843 | TFLOPs: 11.03 | 7: iteration 133980/ 173500 | consumed samples: 34298880 | consumed tokens: 70244106240 | elapsed time per iteration (s): 0.09 | learning rate: 4.251E-05 | global batch size: 256 | lm loss: 4.522560E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.891 | TFLOPs: 10.78 | 7: iteration 133990/ 173500 | consumed samples: 34301440 | consumed tokens: 70249349120 | elapsed time per iteration (s): 0.08 | learning rate: 4.249E-05 | global batch size: 256 | lm loss: 4.509978E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.718 | TFLOPs: 11.27 | 0: [2023-03-17 03:31:39,312] [INFO] [logging.py:68:log_dist] [Rank 0] step=134000, skipped=0, lr=[4.248399618979796e-05, 4.248399618979796e-05, 4.248399618979796e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 134000/ 173500 | consumed samples: 34304000 | consumed tokens: 70254592000 | elapsed time per iteration (s): 0.09 | learning rate: 4.248E-05 | global batch size: 256 | lm loss: 4.517237E+00 | grad norm: 0.367 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2956.448 | TFLOPs: 11.00 | 0: steps: 134000 loss: 4.5053 iter time (s): 0.099 samples/sec: 2585.799 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 134000 | lm loss value: 4.393382E+00 | lm loss PPL: 8.091361E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 134000 to checkpoints_14m91b100m 0: [2023-03-17 03:31:39,369] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step134000 is begin to save! 0: [2023-03-17 03:31:39,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:31:39,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:31:39,398] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:31:39,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:31:39,404] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:31:39,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:31:39,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:31:39,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/layer_05-model_00-model_states.pt. 
0: [2023-03-17 03:31:39,410] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:31:39,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:31:39,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:31:39,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:31:39,414] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step134000/mp_rank_00_model_states.pt 0: [2023-03-17 03:31:39,414] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:31:39,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:31:39,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:31:39,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 7: [2023-03-17 03:31:39,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 
7: [2023-03-17 03:31:39,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:31:39,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 6: [2023-03-17 03:31:39,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:31:39,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:31:39,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 1: [2023-03-17 03:31:39,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 1: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 2: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:31:39,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 
5: [2023-03-17 03:31:39,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 4: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 7: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:31:39,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 0: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:31:39,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:31:39,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 
0: [2023-03-17 03:31:39,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 6: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:31:39,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 03:31:39,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 1: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 3: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:31:39,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 
3: [2023-03-17 03:31:39,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 4: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 5: [2023-03-17 03:31:39,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 
7: [2023-03-17 03:31:39,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 2: [2023-03-17 03:31:39,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 6: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:31:39,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 0: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:31:39,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 4: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:31:39,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 1: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:31:39,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 5: [2023-03-17 03:31:39,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:31:39,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 7: [2023-03-17 03:31:39,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 2: [2023-03-17 03:31:39,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:31:39,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 0: [2023-03-17 03:31:39,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:31:39,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:31:39,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 6: [2023-03-17 03:31:39,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:31:39,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:31:39,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 3: [2023-03-17 03:31:39,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:31:39,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 1: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:31:39,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 03:31:39,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 7: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 4: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 5: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 2: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:31:39,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 0: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:31:39,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 3: [2023-03-17 03:31:39,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:31:39,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 6: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:31:39,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 1: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:31:39,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 7: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:31:39,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 4: [2023-03-17 03:31:39,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 0: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:31:39,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 0: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 5: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 2: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:31:39,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 3: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:31:39,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 6: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:31:39,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 7: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:31:39,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2023-03-17 03:31:39,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 1: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 5: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:31:39,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:31:39,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 
5: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 4: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 2: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 0: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 6: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 6: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 1: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 3: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 5: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:31:39,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:31:39,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 3: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 0: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 
2: [2023-03-17 03:31:39,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 2: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 7: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:31:39,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 4: [2023-03-17 03:31:39,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now! 
3: [2023-03-17 03:31:39,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
3: [2023-03-17 03:31:39,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now!
1: [2023-03-17 03:31:39,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt.
1: [2023-03-17 03:31:39,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step134000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
1: [2023-03-17 03:31:39,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step134000 is ready now!
0: successfully saved checkpoint at iteration 134000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 284.22
7: iteration 134010/ 173500 | consumed samples: 34306560 | consumed tokens: 70259834880 | elapsed time per iteration (s): 0.11 | learning rate: 4.247E-05 | global batch size: 256 | lm loss: 4.500289E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2255.918 | TFLOPs: 8.39 |
7: iteration 134020/ 173500 | consumed samples: 34309120 | consumed tokens: 70265077760 | elapsed time per iteration (s): 0.11 | learning rate: 4.246E-05 | global batch size: 256 | lm loss: 4.505281E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2251.657 | TFLOPs: 8.38 |
7: iteration 134030/ 173500 | consumed samples: 34311680 | consumed tokens: 70270320640 | elapsed time per iteration (s): 0.13 | learning rate: 4.245E-05 | global batch size: 256 | lm loss: 4.504831E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.240 | TFLOPs: 7.26 |
7: iteration 134040/ 173500 | consumed samples: 34314240 | consumed tokens: 70275563520 | elapsed time per iteration (s): 0.08 | learning rate: 4.244E-05 | global batch size: 256 | lm loss: 4.504783E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.203 | TFLOPs: 11.60 |
7: iteration 134050/ 173500 | consumed samples: 34316800 | consumed tokens: 70280806400 | elapsed time per iteration (s): 0.10 | learning rate: 4.243E-05 | global batch size: 256 | lm loss: 4.521522E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2572.277 | TFLOPs: 9.57 |
7: iteration 134060/ 173500 | consumed samples: 34319360 | consumed tokens: 70286049280 | elapsed time per iteration (s): 0.13 | learning rate: 4.242E-05 | global batch size: 256 | lm loss: 4.508593E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1950.066 | TFLOPs: 7.25 |
7: iteration 134070/ 173500 | consumed samples: 34321920 | consumed tokens: 70291292160 | elapsed time per iteration (s): 0.12 | learning rate: 4.241E-05 | global batch size: 256 | lm loss: 4.509206E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2059.488 | TFLOPs: 7.66 |
7: iteration 134080/ 173500 | consumed samples: 34324480 | consumed tokens: 70296535040 | elapsed time per iteration (s): 0.12 | learning rate: 4.240E-05 | global batch size: 256 | lm loss: 4.509785E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.435 | TFLOPs: 7.78 |
7: iteration 134090/ 173500 | consumed samples: 34327040 | consumed tokens: 70301777920 | elapsed time per iteration (s): 0.11 | learning rate: 4.239E-05 | global batch size: 256 | lm loss: 4.506954E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2250.008 | TFLOPs: 8.37 |
7: iteration 134100/ 173500 | consumed samples: 34329600 | consumed tokens: 70307020800 | elapsed time per iteration (s): 0.10 | learning rate: 4.238E-05 | global batch size: 256 | lm loss: 4.514724E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2639.561 | TFLOPs: 9.82 |
7: iteration 134110/ 173500 | consumed samples: 34332160 | consumed tokens: 70312263680 | elapsed time per iteration (s): 0.10 | learning rate: 4.236E-05 | global batch size: 256 | lm loss: 4.508958E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2518.389 | TFLOPs: 9.37 |
7: iteration 134120/ 173500 | consumed samples: 34334720 | consumed tokens: 70317506560 | elapsed time per iteration (s): 0.13 | learning rate: 4.235E-05 | global batch size: 256 | lm loss: 4.507552E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.604 | TFLOPs: 7.37 |
7: iteration 134130/ 173500 | consumed samples: 34337280 | consumed tokens: 70322749440 | elapsed time per iteration (s): 0.09 | learning rate: 4.234E-05 | global batch size: 256 | lm loss: 4.515997E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2813.847 | TFLOPs: 10.47 |
7: iteration 134140/ 173500 | consumed samples: 34339840 | consumed tokens: 70327992320 | elapsed time per iteration (s): 0.11 | learning rate: 4.233E-05 | global batch size: 256 | lm loss: 4.516991E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2392.559 | TFLOPs: 8.90 |
7: iteration 134150/ 173500 | consumed samples: 34342400 | consumed tokens: 70333235200 | elapsed time per iteration (s): 0.08 | learning rate: 4.232E-05 | global batch size: 256 | lm loss: 4.507340E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.548 | TFLOPs: 11.60 |
7: iteration 134160/ 173500 | consumed samples: 34344960 | consumed tokens: 70338478080 | elapsed time per iteration (s): 0.08 | learning rate: 4.231E-05 | global batch size: 256 | lm loss: 4.518318E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.577 | TFLOPs: 11.29 |
7: iteration 134170/ 173500 | consumed samples: 34347520 | consumed tokens: 70343720960 | elapsed time per iteration (s): 0.11 | learning rate: 4.230E-05 | global batch size: 256 | lm loss: 4.509579E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2432.594 | TFLOPs: 9.05 |
7: iteration 134180/ 173500 | consumed samples: 34350080 | consumed tokens: 70348963840 | elapsed time per iteration (s): 0.12 | learning rate: 4.229E-05 | global batch size: 256 | lm loss: 4.504075E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2217.600 | TFLOPs: 8.25 |
7: iteration 134190/ 173500 | consumed samples: 34352640 | consumed tokens: 70354206720 | elapsed time per iteration (s): 0.12 | learning rate: 4.228E-05 | global batch size: 256 | lm loss: 4.499458E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2192.761 | TFLOPs: 8.16 |
7: iteration 134200/ 173500 | consumed samples: 34355200 | consumed tokens: 70359449600 | elapsed time per iteration (s): 0.11 | learning rate: 4.227E-05 | global batch size: 256 | lm loss: 4.511559E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2286.018 | TFLOPs: 8.50 |
7: iteration 134210/ 173500 | consumed samples: 34357760 | consumed tokens: 70364692480 | elapsed time per iteration (s): 0.11 | learning rate: 4.226E-05 | global batch size: 256 | lm loss: 4.507104E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.492 | TFLOPs: 8.50 |
7: iteration 134220/ 173500 | consumed samples: 34360320 | consumed tokens: 70369935360 | elapsed time per iteration (s): 0.12 | learning rate: 4.225E-05 | global batch size: 256 | lm loss: 4.507197E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2112.381 | TFLOPs: 7.86 |
7: iteration 134230/ 173500 | consumed samples: 34362880 | consumed tokens: 70375178240 | elapsed time per iteration (s): 0.11 | learning rate: 4.223E-05 | global batch size: 256 | lm loss: 4.503022E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.762 | TFLOPs: 8.50 |
7: iteration 134240/ 173500 | consumed samples: 34365440 | consumed tokens: 70380421120 | elapsed time per iteration (s): 0.12 | learning rate: 4.222E-05 | global batch size: 256 | lm loss: 4.509846E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.104 | TFLOPs: 7.63 |
7: iteration 134250/ 173500 | consumed samples: 34368000 | consumed tokens: 70385664000 | elapsed time per iteration (s): 0.11 | learning rate: 4.221E-05 | global batch size: 256 | lm loss: 4.520689E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.025 | TFLOPs: 8.56 |
7: iteration 134260/ 173500 | consumed samples: 34370560 | consumed tokens: 70390906880 | elapsed time per iteration (s): 0.11 | learning rate: 4.220E-05 | global batch size: 256 | lm loss: 4.507672E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 |
number of nan iterations: 0 | samples per second: 2302.158 | TFLOPs: 8.56 | 7: iteration 134270/ 173500 | consumed samples: 34373120 | consumed tokens: 70396149760 | elapsed time per iteration (s): 0.11 | learning rate: 4.219E-05 | global batch size: 256 | lm loss: 4.502260E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.237 | TFLOPs: 8.56 | 7: iteration 134280/ 173500 | consumed samples: 34375680 | consumed tokens: 70401392640 | elapsed time per iteration (s): 0.11 | learning rate: 4.218E-05 | global batch size: 256 | lm loss: 4.512027E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2300.302 | TFLOPs: 8.56 | 7: iteration 134290/ 173500 | consumed samples: 34378240 | consumed tokens: 70406635520 | elapsed time per iteration (s): 0.14 | learning rate: 4.217E-05 | global batch size: 256 | lm loss: 4.501401E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1845.559 | TFLOPs: 6.86 | 7: iteration 134300/ 173500 | consumed samples: 34380800 | consumed tokens: 70411878400 | elapsed time per iteration (s): 0.11 | learning rate: 4.216E-05 | global batch size: 256 | lm loss: 4.513647E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.673 | TFLOPs: 8.56 | 7: iteration 134310/ 173500 | consumed samples: 34383360 | consumed tokens: 70417121280 | elapsed time per iteration (s): 0.14 | learning rate: 4.215E-05 | global batch size: 256 | lm loss: 4.498669E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1784.264 | TFLOPs: 6.64 | 7: iteration 134320/ 173500 | consumed samples: 34385920 | consumed tokens: 70422364160 | elapsed time per iteration (s): 0.10 | learning rate: 4.214E-05 | global batch 
size: 256 | lm loss: 4.509258E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2589.850 | TFLOPs: 9.63 | 7: iteration 134330/ 173500 | consumed samples: 34388480 | consumed tokens: 70427607040 | elapsed time per iteration (s): 0.09 | learning rate: 4.213E-05 | global batch size: 256 | lm loss: 4.520468E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.569 | TFLOPs: 10.88 | 7: iteration 134340/ 173500 | consumed samples: 34391040 | consumed tokens: 70432849920 | elapsed time per iteration (s): 0.16 | learning rate: 4.212E-05 | global batch size: 256 | lm loss: 4.512309E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1634.039 | TFLOPs: 6.08 | 7: iteration 134350/ 173500 | consumed samples: 34393600 | consumed tokens: 70438092800 | elapsed time per iteration (s): 0.14 | learning rate: 4.210E-05 | global batch size: 256 | lm loss: 4.508233E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1771.935 | TFLOPs: 6.59 | 7: iteration 134360/ 173500 | consumed samples: 34396160 | consumed tokens: 70443335680 | elapsed time per iteration (s): 0.12 | learning rate: 4.209E-05 | global batch size: 256 | lm loss: 4.513831E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2072.032 | TFLOPs: 7.71 | 7: iteration 134370/ 173500 | consumed samples: 34398720 | consumed tokens: 70448578560 | elapsed time per iteration (s): 0.15 | learning rate: 4.208E-05 | global batch size: 256 | lm loss: 4.504717E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1666.092 | TFLOPs: 6.20 | 7: iteration 134380/ 173500 | consumed samples: 34401280 | 
consumed tokens: 70453821440 | elapsed time per iteration (s): 0.15 | learning rate: 4.207E-05 | global batch size: 256 | lm loss: 4.510652E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1763.813 | TFLOPs: 6.56 | 7: iteration 134390/ 173500 | consumed samples: 34403840 | consumed tokens: 70459064320 | elapsed time per iteration (s): 0.14 | learning rate: 4.206E-05 | global batch size: 256 | lm loss: 4.508644E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1782.352 | TFLOPs: 6.63 | 7: iteration 134400/ 173500 | consumed samples: 34406400 | consumed tokens: 70464307200 | elapsed time per iteration (s): 0.16 | learning rate: 4.205E-05 | global batch size: 256 | lm loss: 4.515383E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1602.324 | TFLOPs: 5.96 | 7: iteration 134410/ 173500 | consumed samples: 34408960 | consumed tokens: 70469550080 | elapsed time per iteration (s): 0.12 | learning rate: 4.204E-05 | global batch size: 256 | lm loss: 4.509935E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2206.174 | TFLOPs: 8.21 | 7: iteration 134420/ 173500 | consumed samples: 34411520 | consumed tokens: 70474792960 | elapsed time per iteration (s): 0.10 | learning rate: 4.203E-05 | global batch size: 256 | lm loss: 4.505320E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.470 | TFLOPs: 9.16 | 7: iteration 134430/ 173500 | consumed samples: 34414080 | consumed tokens: 70480035840 | elapsed time per iteration (s): 0.09 | learning rate: 4.202E-05 | global batch size: 256 | lm loss: 4.508704E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 2737.494 | TFLOPs: 10.18 | 7: iteration 134440/ 173500 | consumed samples: 34416640 | consumed tokens: 70485278720 | elapsed time per iteration (s): 0.08 | learning rate: 4.201E-05 | global batch size: 256 | lm loss: 4.512258E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.870 | TFLOPs: 11.89 | 7: iteration 134450/ 173500 | consumed samples: 34419200 | consumed tokens: 70490521600 | elapsed time per iteration (s): 0.08 | learning rate: 4.200E-05 | global batch size: 256 | lm loss: 4.490856E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.478 | TFLOPs: 11.57 | 7: iteration 134460/ 173500 | consumed samples: 34421760 | consumed tokens: 70495764480 | elapsed time per iteration (s): 0.10 | learning rate: 4.199E-05 | global batch size: 256 | lm loss: 4.514045E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2573.424 | TFLOPs: 9.57 | 7: iteration 134470/ 173500 | consumed samples: 34424320 | consumed tokens: 70501007360 | elapsed time per iteration (s): 0.11 | learning rate: 4.197E-05 | global batch size: 256 | lm loss: 4.516329E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2413.268 | TFLOPs: 8.98 | 7: iteration 134480/ 173500 | consumed samples: 34426880 | consumed tokens: 70506250240 | elapsed time per iteration (s): 0.08 | learning rate: 4.196E-05 | global batch size: 256 | lm loss: 4.508529E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.504 | TFLOPs: 11.89 | 7: iteration 134490/ 173500 | consumed samples: 34429440 | consumed tokens: 70511493120 | elapsed time per iteration (s): 0.10 | learning rate: 4.195E-05 | global batch size: 256 | lm loss: 
4.510492E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2593.664 | TFLOPs: 9.65 | 7: iteration 134500/ 173500 | consumed samples: 34432000 | consumed tokens: 70516736000 | elapsed time per iteration (s): 0.12 | learning rate: 4.194E-05 | global batch size: 256 | lm loss: 4.521606E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2120.238 | TFLOPs: 7.89 | 7: iteration 134510/ 173500 | consumed samples: 34434560 | consumed tokens: 70521978880 | elapsed time per iteration (s): 0.09 | learning rate: 4.193E-05 | global batch size: 256 | lm loss: 4.507444E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2957.175 | TFLOPs: 11.00 | 7: iteration 134520/ 173500 | consumed samples: 34437120 | consumed tokens: 70527221760 | elapsed time per iteration (s): 0.09 | learning rate: 4.192E-05 | global batch size: 256 | lm loss: 4.506059E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.094 | TFLOPs: 10.97 | 7: iteration 134530/ 173500 | consumed samples: 34439680 | consumed tokens: 70532464640 | elapsed time per iteration (s): 0.08 | learning rate: 4.191E-05 | global batch size: 256 | lm loss: 4.505923E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.278 | TFLOPs: 11.88 | 7: iteration 134540/ 173500 | consumed samples: 34442240 | consumed tokens: 70537707520 | elapsed time per iteration (s): 0.10 | learning rate: 4.190E-05 | global batch size: 256 | lm loss: 4.502012E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2479.770 | TFLOPs: 9.22 | 7: iteration 134550/ 173500 | consumed samples: 34444800 | consumed tokens: 
70542950400 | elapsed time per iteration (s): 0.09 | learning rate: 4.189E-05 | global batch size: 256 | lm loss: 4.510513E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2997.030 | TFLOPs: 11.15 | 7: iteration 134560/ 173500 | consumed samples: 34447360 | consumed tokens: 70548193280 | elapsed time per iteration (s): 0.08 | learning rate: 4.188E-05 | global batch size: 256 | lm loss: 4.525583E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.507 | TFLOPs: 11.81 | 7: iteration 134570/ 173500 | consumed samples: 34449920 | consumed tokens: 70553436160 | elapsed time per iteration (s): 0.08 | learning rate: 4.187E-05 | global batch size: 256 | lm loss: 4.506013E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.054 | TFLOPs: 11.68 | 7: iteration 134580/ 173500 | consumed samples: 34452480 | consumed tokens: 70558679040 | elapsed time per iteration (s): 0.12 | learning rate: 4.186E-05 | global batch size: 256 | lm loss: 4.511732E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2163.021 | TFLOPs: 8.05 | 7: iteration 134590/ 173500 | consumed samples: 34455040 | consumed tokens: 70563921920 | elapsed time per iteration (s): 0.10 | learning rate: 4.185E-05 | global batch size: 256 | lm loss: 4.514200E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2448.599 | TFLOPs: 9.11 | 7: iteration 134600/ 173500 | consumed samples: 34457600 | consumed tokens: 70569164800 | elapsed time per iteration (s): 0.10 | learning rate: 4.183E-05 | global batch size: 256 | lm loss: 4.493045E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2574.061 | TFLOPs: 9.57 | 7: iteration 134610/ 173500 | consumed samples: 34460160 | consumed tokens: 70574407680 | elapsed time per iteration (s): 0.09 | learning rate: 4.182E-05 | global batch size: 256 | lm loss: 4.505584E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2783.030 | TFLOPs: 10.35 | 7: iteration 134620/ 173500 | consumed samples: 34462720 | consumed tokens: 70579650560 | elapsed time per iteration (s): 0.11 | learning rate: 4.181E-05 | global batch size: 256 | lm loss: 4.511103E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2301.543 | TFLOPs: 8.56 | 7: iteration 134630/ 173500 | consumed samples: 34465280 | consumed tokens: 70584893440 | elapsed time per iteration (s): 0.08 | learning rate: 4.180E-05 | global batch size: 256 | lm loss: 4.513768E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.142 | TFLOPs: 11.87 | 7: iteration 134640/ 173500 | consumed samples: 34467840 | consumed tokens: 70590136320 | elapsed time per iteration (s): 0.08 | learning rate: 4.179E-05 | global batch size: 256 | lm loss: 4.497618E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.060 | TFLOPs: 11.90 | 7: iteration 134650/ 173500 | consumed samples: 34470400 | consumed tokens: 70595379200 | elapsed time per iteration (s): 0.10 | learning rate: 4.178E-05 | global batch size: 256 | lm loss: 4.497801E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2536.376 | TFLOPs: 9.43 | 7: iteration 134660/ 173500 | consumed samples: 34472960 | consumed tokens: 70600622080 | elapsed time per iteration (s): 0.12 | learning rate: 4.177E-05 | global batch size: 256 | lm loss: 4.514774E+00 | 
grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2185.918 | TFLOPs: 8.13 | 7: iteration 134670/ 173500 | consumed samples: 34475520 | consumed tokens: 70605864960 | elapsed time per iteration (s): 0.14 | learning rate: 4.176E-05 | global batch size: 256 | lm loss: 4.510111E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1866.527 | TFLOPs: 6.94 | 7: iteration 134680/ 173500 | consumed samples: 34478080 | consumed tokens: 70611107840 | elapsed time per iteration (s): 0.13 | learning rate: 4.175E-05 | global batch size: 256 | lm loss: 4.512184E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1953.237 | TFLOPs: 7.27 | 7: iteration 134690/ 173500 | consumed samples: 34480640 | consumed tokens: 70616350720 | elapsed time per iteration (s): 0.13 | learning rate: 4.174E-05 | global batch size: 256 | lm loss: 4.505252E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.138 | TFLOPs: 7.37 | 7: iteration 134700/ 173500 | consumed samples: 34483200 | consumed tokens: 70621593600 | elapsed time per iteration (s): 0.12 | learning rate: 4.173E-05 | global batch size: 256 | lm loss: 4.510506E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2066.645 | TFLOPs: 7.69 | 7: iteration 134710/ 173500 | consumed samples: 34485760 | consumed tokens: 70626836480 | elapsed time per iteration (s): 0.11 | learning rate: 4.172E-05 | global batch size: 256 | lm loss: 4.519237E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2268.196 | TFLOPs: 8.44 | 7: iteration 134720/ 173500 | consumed samples: 34488320 | consumed tokens: 70632079360 | elapsed 
time per iteration (s): 0.11 | learning rate: 4.171E-05 | global batch size: 256 | lm loss: 4.506694E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2304.963 | TFLOPs: 8.57 | 7: iteration 134730/ 173500 | consumed samples: 34490880 | consumed tokens: 70637322240 | elapsed time per iteration (s): 0.10 | learning rate: 4.170E-05 | global batch size: 256 | lm loss: 4.512631E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2532.224 | TFLOPs: 9.42 | 7: iteration 134740/ 173500 | consumed samples: 34493440 | consumed tokens: 70642565120 | elapsed time per iteration (s): 0.12 | learning rate: 4.168E-05 | global batch size: 256 | lm loss: 4.506043E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2219.590 | TFLOPs: 8.26 | 7: iteration 134750/ 173500 | consumed samples: 34496000 | consumed tokens: 70647808000 | elapsed time per iteration (s): 0.12 | learning rate: 4.167E-05 | global batch size: 256 | lm loss: 4.510733E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2208.492 | TFLOPs: 8.21 | 7: iteration 134760/ 173500 | consumed samples: 34498560 | consumed tokens: 70653050880 | elapsed time per iteration (s): 0.12 | learning rate: 4.166E-05 | global batch size: 256 | lm loss: 4.506409E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2191.881 | TFLOPs: 8.15 | 7: iteration 134770/ 173500 | consumed samples: 34501120 | consumed tokens: 70658293760 | elapsed time per iteration (s): 0.11 | learning rate: 4.165E-05 | global batch size: 256 | lm loss: 4.518808E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2378.272 | 
TFLOPs: 8.85 | 7: iteration 134780/ 173500 | consumed samples: 34503680 | consumed tokens: 70663536640 | elapsed time per iteration (s): 0.09 | learning rate: 4.164E-05 | global batch size: 256 | lm loss: 4.509008E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2976.633 | TFLOPs: 11.07 | 7: iteration 134790/ 173500 | consumed samples: 34506240 | consumed tokens: 70668779520 | elapsed time per iteration (s): 0.08 | learning rate: 4.163E-05 | global batch size: 256 | lm loss: 4.498949E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.052 | TFLOPs: 11.94 | 7: iteration 134800/ 173500 | consumed samples: 34508800 | consumed tokens: 70674022400 | elapsed time per iteration (s): 0.08 | learning rate: 4.162E-05 | global batch size: 256 | lm loss: 4.518819E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3022.434 | TFLOPs: 11.24 | 7: iteration 134810/ 173500 | consumed samples: 34511360 | consumed tokens: 70679265280 | elapsed time per iteration (s): 0.08 | learning rate: 4.161E-05 | global batch size: 256 | lm loss: 4.513565E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.405 | TFLOPs: 11.33 | 7: iteration 134820/ 173500 | consumed samples: 34513920 | consumed tokens: 70684508160 | elapsed time per iteration (s): 0.09 | learning rate: 4.160E-05 | global batch size: 256 | lm loss: 4.508862E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.981 | TFLOPs: 10.40 | 7: iteration 134830/ 173500 | consumed samples: 34516480 | consumed tokens: 70689751040 | elapsed time per iteration (s): 0.09 | learning rate: 4.159E-05 | global batch size: 256 | lm loss: 4.500760E+00 | grad norm: 0.381 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2725.963 | TFLOPs: 10.14 | 7: iteration 134840/ 173500 | consumed samples: 34519040 | consumed tokens: 70694993920 | elapsed time per iteration (s): 0.08 | learning rate: 4.158E-05 | global batch size: 256 | lm loss: 4.504325E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.408 | TFLOPs: 11.83 | 7: iteration 134850/ 173500 | consumed samples: 34521600 | consumed tokens: 70700236800 | elapsed time per iteration (s): 0.09 | learning rate: 4.157E-05 | global batch size: 256 | lm loss: 4.519309E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.571 | TFLOPs: 11.06 | 7: iteration 134860/ 173500 | consumed samples: 34524160 | consumed tokens: 70705479680 | elapsed time per iteration (s): 0.08 | learning rate: 4.156E-05 | global batch size: 256 | lm loss: 4.508025E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.745 | TFLOPs: 11.99 | 7: iteration 134870/ 173500 | consumed samples: 34526720 | consumed tokens: 70710722560 | elapsed time per iteration (s): 0.08 | learning rate: 4.155E-05 | global batch size: 256 | lm loss: 4.501193E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.053 | TFLOPs: 11.84 | 7: iteration 134880/ 173500 | consumed samples: 34529280 | consumed tokens: 70715965440 | elapsed time per iteration (s): 0.08 | learning rate: 4.153E-05 | global batch size: 256 | lm loss: 4.513778E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.410 | TFLOPs: 11.82 | 7: iteration 134890/ 173500 | consumed samples: 34531840 | consumed tokens: 70721208320 | elapsed time per 
iteration (s): 0.08 | learning rate: 4.152E-05 | global batch size: 256 | lm loss: 4.507647E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.419 | TFLOPs: 11.83 | 7: iteration 134900/ 173500 | consumed samples: 34534400 | consumed tokens: 70726451200 | elapsed time per iteration (s): 0.08 | learning rate: 4.151E-05 | global batch size: 256 | lm loss: 4.506055E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.758 | TFLOPs: 11.77 | 7: iteration 134910/ 173500 | consumed samples: 34536960 | consumed tokens: 70731694080 | elapsed time per iteration (s): 0.08 | learning rate: 4.150E-05 | global batch size: 256 | lm loss: 4.519864E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.792 | TFLOPs: 11.83 | 7: iteration 134920/ 173500 | consumed samples: 34539520 | consumed tokens: 70736936960 | elapsed time per iteration (s): 0.08 | learning rate: 4.149E-05 | global batch size: 256 | lm loss: 4.513780E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.119 | TFLOPs: 11.62 | 7: iteration 134930/ 173500 | consumed samples: 34542080 | consumed tokens: 70742179840 | elapsed time per iteration (s): 0.08 | learning rate: 4.148E-05 | global batch size: 256 | lm loss: 4.520045E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.575 | TFLOPs: 11.81 | 7: iteration 134940/ 173500 | consumed samples: 34544640 | consumed tokens: 70747422720 | elapsed time per iteration (s): 0.08 | learning rate: 4.147E-05 | global batch size: 256 | lm loss: 4.511912E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.679 | TFLOPs: 
11.83 | 7: iteration 134950/ 173500 | consumed samples: 34547200 | consumed tokens: 70752665600 | elapsed time per iteration (s): 0.08 | learning rate: 4.146E-05 | global batch size: 256 | lm loss: 4.514716E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.040 | TFLOPs: 11.85 | 7: iteration 134960/ 173500 | consumed samples: 34549760 | consumed tokens: 70757908480 | elapsed time per iteration (s): 0.08 | learning rate: 4.145E-05 | global batch size: 256 | lm loss: 4.509369E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.217 | TFLOPs: 11.82 | 7: iteration 134970/ 173500 | consumed samples: 34552320 | consumed tokens: 70763151360 | elapsed time per iteration (s): 0.08 | learning rate: 4.144E-05 | global batch size: 256 | lm loss: 4.529674E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.933 | TFLOPs: 11.79 | 7: iteration 134980/ 173500 | consumed samples: 34554880 | consumed tokens: 70768394240 | elapsed time per iteration (s): 0.08 | learning rate: 4.143E-05 | global batch size: 256 | lm loss: 4.509836E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.172 | TFLOPs: 11.39 | 7: iteration 134990/ 173500 | consumed samples: 34557440 | consumed tokens: 70773637120 | elapsed time per iteration (s): 0.10 | learning rate: 4.142E-05 | global batch size: 256 | lm loss: 4.507801E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2577.856 | TFLOPs: 9.59 | 7: iteration 135000/ 173500 | consumed samples: 34560000 | consumed tokens: 70778880000 | elapsed time per iteration (s): 0.08 | learning rate: 4.141E-05 | global batch size: 256 | lm loss: 4.502145E+00 | grad norm: 0.367 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.507 | TFLOPs: 11.58 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 135000 | lm loss value: 4.409667E+00 | lm loss PPL: 8.224211E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 135000 to checkpoints_14m91b100m
0: [2023-03-17 03:33:23,401] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step135000 is begin to save!
0: [2023-03-17 03:33:23,404] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/layer_01-model_00-model_states.pt...
0: [2023-03-17 03:33:23,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/layer_01-model_00-model_states.pt.
0: [2023-03-17 03:33:23,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/layer_03-model_00-model_states.pt...
0: [2023-03-17 03:33:23,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/layer_03-model_00-model_states.pt.
0: [2023-03-17 03:33:23,434] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/layer_04-model_00-model_states.pt...
0: [2023-03-17 03:33:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/layer_04-model_00-model_states.pt.
0: [2023-03-17 03:33:23,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/layer_05-model_00-model_states.pt...
0: [2023-03-17 03:33:23,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/layer_05-model_00-model_states.pt.
0: [2023-03-17 03:33:23,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/layer_06-model_00-model_states.pt...
0: [2023-03-17 03:33:23,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/layer_06-model_00-model_states.pt.
0: [2023-03-17 03:33:23,442] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/layer_08-model_00-model_states.pt...
0: [2023-03-17 03:33:23,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/layer_08-model_00-model_states.pt.
0: [2023-03-17 03:33:23,443] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step135000/mp_rank_00_model_states.pt
0: [2023-03-17 03:33:23,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/mp_rank_00_model_states.pt...
0: [2023-03-17 03:33:23,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/mp_rank_00_model_states.pt.
0: [2023-03-17 03:33:23,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:33:23,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:33:23,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:33:23,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:33:23,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:33:23,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:33:23,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:33:23,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:23,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:23,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:23,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 1: [2023-03-17 03:33:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:33:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 6: [2023-03-17 03:33:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 0: [2023-03-17 03:33:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 2: [2023-03-17 03:33:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:23,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:23,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:33:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 4: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 0: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 3: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 5: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:33:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:23,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 6: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:23,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 1: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:23,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 2: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:33:23,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 4: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 0: [2023-03-17 03:33:23,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:23,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 3: [2023-03-17 03:33:23,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 5: [2023-03-17 03:33:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:33:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 1: [2023-03-17 03:33:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:23,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:23,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 2: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 6: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:33:23,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 4: [2023-03-17 03:33:23,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 0: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:23,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 3: [2023-03-17 03:33:23,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 
5: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:23,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:23,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 5: [2023-03-17 03:33:23,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:23,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:23,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:23,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 6: [2023-03-17 03:33:23,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:33:23,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:23,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 0: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:23,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 4: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 3: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:23,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 5: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:33:23,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 1: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:23,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 6: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 
2: [2023-03-17 03:33:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 0: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 4: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:23,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 3: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:33:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 5: [2023-03-17 03:33:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 1: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 2: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 
6: [2023-03-17 03:33:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 4: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 0: [2023-03-17 03:33:23,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 3: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:23,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:23,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 5: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 5: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:23,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 0: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 
2: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 2: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 4: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 1: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 
4: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 5: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 7: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 6: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 6: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 1: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 3: [2023-03-17 03:33:23,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step135000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:33:23,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step135000 is ready now! 
0: successfully saved checkpoint at iteration 135000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.56 7: iteration 135010/ 173500 | consumed samples: 34562560 | consumed tokens: 70784122880 | elapsed time per iteration (s): 0.09 | learning rate: 4.140E-05 | global batch size: 256 | lm loss: 4.505467E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2786.966 | TFLOPs: 10.37 | 7: iteration 135020/ 173500 | consumed samples: 34565120 | consumed tokens: 70789365760 | elapsed time per iteration (s): 0.08 | learning rate: 4.139E-05 | global batch size: 256 | lm loss: 4.517107E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.712 | TFLOPs: 11.71 | 7: iteration 135030/ 173500 | consumed samples: 34567680 | consumed tokens: 70794608640 | elapsed time per iteration (s): 0.08 | learning rate: 4.137E-05 | global batch size: 256 | lm loss: 4.513210E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.096 | TFLOPs: 11.77 | 7: iteration 135040/ 173500 | consumed samples: 34570240 | consumed tokens: 70799851520 | elapsed time per iteration (s): 0.08 | learning rate: 4.136E-05 | global batch size: 256 | lm loss: 4.499179E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.048 | TFLOPs: 11.81 | 7: iteration 135050/ 173500 | consumed samples: 34572800 | consumed tokens: 70805094400 | elapsed time per iteration (s): 0.08 | learning rate: 4.135E-05 | global batch size: 256 | lm loss: 4.519891E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.212 | TFLOPs: 11.74 | 7: iteration 135060/ 173500 | consumed samples: 34575360 | consumed tokens: 70810337280 | elapsed time per iteration (s): 
0.08 | learning rate: 4.134E-05 | global batch size: 256 | lm loss: 4.511548E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.339 | TFLOPs: 11.73 | 7: iteration 135070/ 173500 | consumed samples: 34577920 | consumed tokens: 70815580160 | elapsed time per iteration (s): 0.09 | learning rate: 4.133E-05 | global batch size: 256 | lm loss: 4.502834E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2783.769 | TFLOPs: 10.35 | 7: iteration 135080/ 173500 | consumed samples: 34580480 | consumed tokens: 70820823040 | elapsed time per iteration (s): 0.08 | learning rate: 4.132E-05 | global batch size: 256 | lm loss: 4.517529E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.778 | TFLOPs: 11.78 | 7: iteration 135090/ 173500 | consumed samples: 34583040 | consumed tokens: 70826065920 | elapsed time per iteration (s): 0.08 | learning rate: 4.131E-05 | global batch size: 256 | lm loss: 4.498619E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.974 | TFLOPs: 11.76 | 7: iteration 135100/ 173500 | consumed samples: 34585600 | consumed tokens: 70831308800 | elapsed time per iteration (s): 0.08 | learning rate: 4.130E-05 | global batch size: 256 | lm loss: 4.497968E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.171 | TFLOPs: 11.80 | 7: iteration 135110/ 173500 | consumed samples: 34588160 | consumed tokens: 70836551680 | elapsed time per iteration (s): 0.08 | learning rate: 4.129E-05 | global batch size: 256 | lm loss: 4.505500E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.510 | TFLOPs: 11.84 | 7: 
iteration 135120/ 173500 | consumed samples: 34590720 | consumed tokens: 70841794560 | elapsed time per iteration (s): 0.08 | learning rate: 4.128E-05 | global batch size: 256 | lm loss: 4.510884E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.541 | TFLOPs: 11.82 | 7: iteration 135130/ 173500 | consumed samples: 34593280 | consumed tokens: 70847037440 | elapsed time per iteration (s): 0.08 | learning rate: 4.127E-05 | global batch size: 256 | lm loss: 4.505048E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.571 | TFLOPs: 11.79 | 7: iteration 135140/ 173500 | consumed samples: 34595840 | consumed tokens: 70852280320 | elapsed time per iteration (s): 0.09 | learning rate: 4.126E-05 | global batch size: 256 | lm loss: 4.527555E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2707.963 | TFLOPs: 10.07 | 7: iteration 135150/ 173500 | consumed samples: 34598400 | consumed tokens: 70857523200 | elapsed time per iteration (s): 0.08 | learning rate: 4.125E-05 | global batch size: 256 | lm loss: 4.514021E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.611 | TFLOPs: 11.48 | 7: iteration 135160/ 173500 | consumed samples: 34600960 | consumed tokens: 70862766080 | elapsed time per iteration (s): 0.08 | learning rate: 4.124E-05 | global batch size: 256 | lm loss: 4.510392E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.553 | TFLOPs: 11.86 | 7: iteration 135170/ 173500 | consumed samples: 34603520 | consumed tokens: 70868008960 | elapsed time per iteration (s): 0.08 | learning rate: 4.123E-05 | global batch size: 256 | lm loss: 4.510406E+00 | grad norm: 0.384 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.175 | TFLOPs: 11.45 | 7: iteration 135180/ 173500 | consumed samples: 34606080 | consumed tokens: 70873251840 | elapsed time per iteration (s): 0.08 | learning rate: 4.122E-05 | global batch size: 256 | lm loss: 4.510294E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.977 | TFLOPs: 11.51 | 7: iteration 135190/ 173500 | consumed samples: 34608640 | consumed tokens: 70878494720 | elapsed time per iteration (s): 0.09 | learning rate: 4.120E-05 | global batch size: 256 | lm loss: 4.508722E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2920.414 | TFLOPs: 10.86 | 7: iteration 135200/ 173500 | consumed samples: 34611200 | consumed tokens: 70883737600 | elapsed time per iteration (s): 0.10 | learning rate: 4.119E-05 | global batch size: 256 | lm loss: 4.501427E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2667.937 | TFLOPs: 9.92 | 7: iteration 135210/ 173500 | consumed samples: 34613760 | consumed tokens: 70888980480 | elapsed time per iteration (s): 0.08 | learning rate: 4.118E-05 | global batch size: 256 | lm loss: 4.510313E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.374 | TFLOPs: 11.61 | 7: iteration 135220/ 173500 | consumed samples: 34616320 | consumed tokens: 70894223360 | elapsed time per iteration (s): 0.08 | learning rate: 4.117E-05 | global batch size: 256 | lm loss: 4.511524E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.155 | TFLOPs: 11.75 | 7: iteration 135230/ 173500 | consumed samples: 34618880 | consumed tokens: 70899466240 | elapsed time per iteration (s): 0.09 | 
learning rate: 4.116E-05 | global batch size: 256 | lm loss: 4.511818E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2902.685 | TFLOPs: 10.80 | 7: iteration 135240/ 173500 | consumed samples: 34621440 | consumed tokens: 70904709120 | elapsed time per iteration (s): 0.08 | learning rate: 4.115E-05 | global batch size: 256 | lm loss: 4.524649E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.077 | TFLOPs: 11.22 | 7: iteration 135250/ 173500 | consumed samples: 34624000 | consumed tokens: 70909952000 | elapsed time per iteration (s): 0.08 | learning rate: 4.114E-05 | global batch size: 256 | lm loss: 4.499572E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.204 | TFLOPs: 11.99 | 7: iteration 135260/ 173500 | consumed samples: 34626560 | consumed tokens: 70915194880 | elapsed time per iteration (s): 0.08 | learning rate: 4.113E-05 | global batch size: 256 | lm loss: 4.504175E+00 | grad norm: 0.522 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.130 | TFLOPs: 11.85 | 7: iteration 135270/ 173500 | consumed samples: 34629120 | consumed tokens: 70920437760 | elapsed time per iteration (s): 0.09 | learning rate: 4.112E-05 | global batch size: 256 | lm loss: 4.503352E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2701.241 | TFLOPs: 10.05 | 7: iteration 135280/ 173500 | consumed samples: 34631680 | consumed tokens: 70925680640 | elapsed time per iteration (s): 0.08 | learning rate: 4.111E-05 | global batch size: 256 | lm loss: 4.512377E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3040.086 | TFLOPs: 11.31 | 7: iteration 
135290/ 173500 | consumed samples: 34634240 | consumed tokens: 70930923520 | elapsed time per iteration (s): 0.09 | learning rate: 4.110E-05 | global batch size: 256 | lm loss: 4.506168E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2834.054 | TFLOPs: 10.54 | 7: iteration 135300/ 173500 | consumed samples: 34636800 | consumed tokens: 70936166400 | elapsed time per iteration (s): 0.10 | learning rate: 4.109E-05 | global batch size: 256 | lm loss: 4.508104E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.624 | TFLOPs: 9.92 | 7: iteration 135310/ 173500 | consumed samples: 34639360 | consumed tokens: 70941409280 | elapsed time per iteration (s): 0.08 | learning rate: 4.108E-05 | global batch size: 256 | lm loss: 4.524193E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.564 | TFLOPs: 11.32 | 7: iteration 135320/ 173500 | consumed samples: 34641920 | consumed tokens: 70946652160 | elapsed time per iteration (s): 0.08 | learning rate: 4.107E-05 | global batch size: 256 | lm loss: 4.514304E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3038.854 | TFLOPs: 11.30 | 7: iteration 135330/ 173500 | consumed samples: 34644480 | consumed tokens: 70951895040 | elapsed time per iteration (s): 0.08 | learning rate: 4.106E-05 | global batch size: 256 | lm loss: 4.499382E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.842 | TFLOPs: 11.85 | 7: iteration 135340/ 173500 | consumed samples: 34647040 | consumed tokens: 70957137920 | elapsed time per iteration (s): 0.08 | learning rate: 4.105E-05 | global batch size: 256 | lm loss: 4.508931E+00 | grad norm: 0.347 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.867 | TFLOPs: 11.65 | 7: iteration 135350/ 173500 | consumed samples: 34649600 | consumed tokens: 70962380800 | elapsed time per iteration (s): 0.08 | learning rate: 4.104E-05 | global batch size: 256 | lm loss: 4.510372E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.769 | TFLOPs: 11.48 | 7: iteration 135360/ 173500 | consumed samples: 34652160 | consumed tokens: 70967623680 | elapsed time per iteration (s): 0.08 | learning rate: 4.102E-05 | global batch size: 256 | lm loss: 4.507493E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.195 | TFLOPs: 11.85 | 7: iteration 135370/ 173500 | consumed samples: 34654720 | consumed tokens: 70972866560 | elapsed time per iteration (s): 0.09 | learning rate: 4.101E-05 | global batch size: 256 | lm loss: 4.505643E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.664 | TFLOPs: 10.96 | 7: iteration 135380/ 173500 | consumed samples: 34657280 | consumed tokens: 70978109440 | elapsed time per iteration (s): 0.08 | learning rate: 4.100E-05 | global batch size: 256 | lm loss: 4.496152E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.527 | TFLOPs: 11.37 | 7: iteration 135390/ 173500 | consumed samples: 34659840 | consumed tokens: 70983352320 | elapsed time per iteration (s): 0.12 | learning rate: 4.099E-05 | global batch size: 256 | lm loss: 4.508385E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2208.386 | TFLOPs: 8.21 | 7: iteration 135400/ 173500 | consumed samples: 34662400 | consumed tokens: 70988595200 | elapsed time per iteration (s): 0.14 | learning 
rate: 4.098E-05 | global batch size: 256 | lm loss: 4.505307E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1877.603 | TFLOPs: 6.98 | 7: iteration 135410/ 173500 | consumed samples: 34664960 | consumed tokens: 70993838080 | elapsed time per iteration (s): 0.12 | learning rate: 4.097E-05 | global batch size: 256 | lm loss: 4.504182E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2057.800 | TFLOPs: 7.65 | 7: iteration 135420/ 173500 | consumed samples: 34667520 | consumed tokens: 70999080960 | elapsed time per iteration (s): 0.13 | learning rate: 4.096E-05 | global batch size: 256 | lm loss: 4.512892E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.502 | TFLOPs: 7.39 | 7: iteration 135430/ 173500 | consumed samples: 34670080 | consumed tokens: 71004323840 | elapsed time per iteration (s): 0.13 | learning rate: 4.095E-05 | global batch size: 256 | lm loss: 4.501950E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1976.528 | TFLOPs: 7.35 | 7: iteration 135440/ 173500 | consumed samples: 34672640 | consumed tokens: 71009566720 | elapsed time per iteration (s): 0.13 | learning rate: 4.094E-05 | global batch size: 256 | lm loss: 4.512663E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1926.081 | TFLOPs: 7.16 | 7: iteration 135450/ 173500 | consumed samples: 34675200 | consumed tokens: 71014809600 | elapsed time per iteration (s): 0.12 | learning rate: 4.093E-05 | global batch size: 256 | lm loss: 4.512409E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2069.111 | TFLOPs: 7.70 | 7: iteration 135460/ 173500 | 
consumed samples: 34677760 | consumed tokens: 71020052480 | elapsed time per iteration (s): 0.14 | learning rate: 4.092E-05 | global batch size: 256 | lm loss: 4.500016E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1841.010 | TFLOPs: 6.85 | 7: iteration 135470/ 173500 | consumed samples: 34680320 | consumed tokens: 71025295360 | elapsed time per iteration (s): 0.13 | learning rate: 4.091E-05 | global batch size: 256 | lm loss: 4.507226E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.174 | TFLOPs: 7.21 | 7: iteration 135480/ 173500 | consumed samples: 34682880 | consumed tokens: 71030538240 | elapsed time per iteration (s): 0.13 | learning rate: 4.090E-05 | global batch size: 256 | lm loss: 4.517416E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.466 | TFLOPs: 7.26 | 7: iteration 135490/ 173500 | consumed samples: 34685440 | consumed tokens: 71035781120 | elapsed time per iteration (s): 0.13 | learning rate: 4.089E-05 | global batch size: 256 | lm loss: 4.496853E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.428 | TFLOPs: 7.46 | 7: iteration 135500/ 173500 | consumed samples: 34688000 | consumed tokens: 71041024000 | elapsed time per iteration (s): 0.13 | learning rate: 4.088E-05 | global batch size: 256 | lm loss: 4.514731E+00 | grad norm: 0.447 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.379 | TFLOPs: 7.53 | 7: iteration 135510/ 173500 | consumed samples: 34690560 | consumed tokens: 71046266880 | elapsed time per iteration (s): 0.13 | learning rate: 4.087E-05 | global batch size: 256 | lm loss: 4.513709E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 1951.141 | TFLOPs: 7.26 | 7: iteration 135520/ 173500 | consumed samples: 34693120 | consumed tokens: 71051509760 | elapsed time per iteration (s): 0.13 | learning rate: 4.086E-05 | global batch size: 256 | lm loss: 4.511399E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.161 | TFLOPs: 7.53 | 7: iteration 135530/ 173500 | consumed samples: 34695680 | consumed tokens: 71056752640 | elapsed time per iteration (s): 0.13 | learning rate: 4.085E-05 | global batch size: 256 | lm loss: 4.511379E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.310 | TFLOPs: 7.21 | 7: iteration 135540/ 173500 | consumed samples: 34698240 | consumed tokens: 71061995520 | elapsed time per iteration (s): 0.13 | learning rate: 4.083E-05 | global batch size: 256 | lm loss: 4.503242E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1955.913 | TFLOPs: 7.28 | 7: iteration 135550/ 173500 | consumed samples: 34700800 | consumed tokens: 71067238400 | elapsed time per iteration (s): 0.13 | learning rate: 4.082E-05 | global batch size: 256 | lm loss: 4.510669E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1952.635 | TFLOPs: 7.26 | 7: iteration 135560/ 173500 | consumed samples: 34703360 | consumed tokens: 71072481280 | elapsed time per iteration (s): 0.13 | learning rate: 4.081E-05 | global batch size: 256 | lm loss: 4.506580E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.388 | TFLOPs: 7.28 | 7: iteration 135570/ 173500 | consumed samples: 34705920 | consumed tokens: 71077724160 | elapsed time per iteration (s): 0.13 | learning rate: 4.080E-05 | global batch 
size: 256 | lm loss: 4.516145E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.588 | TFLOPs: 7.42 | 7: iteration 135580/ 173500 | consumed samples: 34708480 | consumed tokens: 71082967040 | elapsed time per iteration (s): 0.13 | learning rate: 4.079E-05 | global batch size: 256 | lm loss: 4.514732E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.359 | TFLOPs: 7.24 | 7: iteration 135590/ 173500 | consumed samples: 34711040 | consumed tokens: 71088209920 | elapsed time per iteration (s): 0.13 | learning rate: 4.078E-05 | global batch size: 256 | lm loss: 4.507197E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.509 | TFLOPs: 7.35 | 7: iteration 135600/ 173500 | consumed samples: 34713600 | consumed tokens: 71093452800 | elapsed time per iteration (s): 0.11 | learning rate: 4.077E-05 | global batch size: 256 | lm loss: 4.501551E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.991 | TFLOPs: 8.57 | 7: iteration 135610/ 173500 | consumed samples: 34716160 | consumed tokens: 71098695680 | elapsed time per iteration (s): 0.08 | learning rate: 4.076E-05 | global batch size: 256 | lm loss: 4.512695E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.807 | TFLOPs: 11.75 | 7: iteration 135620/ 173500 | consumed samples: 34718720 | consumed tokens: 71103938560 | elapsed time per iteration (s): 0.08 | learning rate: 4.075E-05 | global batch size: 256 | lm loss: 4.510133E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.837 | TFLOPs: 11.91 | 7: iteration 135630/ 173500 | consumed samples: 34721280 | 
consumed tokens: 71109181440 | elapsed time per iteration (s): 0.09 | learning rate: 4.074E-05 | global batch size: 256 | lm loss: 4.508588E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2885.625 | TFLOPs: 10.73 | 7: iteration 135640/ 173500 | consumed samples: 34723840 | consumed tokens: 71114424320 | elapsed time per iteration (s): 0.08 | learning rate: 4.073E-05 | global batch size: 256 | lm loss: 4.507052E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.505 | TFLOPs: 11.81 | 7: iteration 135650/ 173500 | consumed samples: 34726400 | consumed tokens: 71119667200 | elapsed time per iteration (s): 0.08 | learning rate: 4.072E-05 | global batch size: 256 | lm loss: 4.510917E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.338 | TFLOPs: 11.84 | 7: iteration 135660/ 173500 | consumed samples: 34728960 | consumed tokens: 71124910080 | elapsed time per iteration (s): 0.10 | learning rate: 4.071E-05 | global batch size: 256 | lm loss: 4.508134E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2599.283 | TFLOPs: 9.67 | 7: iteration 135670/ 173500 | consumed samples: 34731520 | consumed tokens: 71130152960 | elapsed time per iteration (s): 0.08 | learning rate: 4.070E-05 | global batch size: 256 | lm loss: 4.508588E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.286 | TFLOPs: 11.81 | 7: iteration 135680/ 173500 | consumed samples: 34734080 | consumed tokens: 71135395840 | elapsed time per iteration (s): 0.08 | learning rate: 4.069E-05 | global batch size: 256 | lm loss: 4.509374E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3123.574 | TFLOPs: 11.62 | 7: iteration 135690/ 173500 | consumed samples: 34736640 | consumed tokens: 71140638720 | elapsed time per iteration (s): 0.08 | learning rate: 4.068E-05 | global batch size: 256 | lm loss: 4.505180E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.392 | TFLOPs: 11.89 | 7: iteration 135700/ 173500 | consumed samples: 34739200 | consumed tokens: 71145881600 | elapsed time per iteration (s): 0.09 | learning rate: 4.067E-05 | global batch size: 256 | lm loss: 4.514098E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2802.379 | TFLOPs: 10.42 | 7: iteration 135710/ 173500 | consumed samples: 34741760 | consumed tokens: 71151124480 | elapsed time per iteration (s): 0.08 | learning rate: 4.066E-05 | global batch size: 256 | lm loss: 4.507275E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.384 | TFLOPs: 11.85 | 7: iteration 135720/ 173500 | consumed samples: 34744320 | consumed tokens: 71156367360 | elapsed time per iteration (s): 0.09 | learning rate: 4.065E-05 | global batch size: 256 | lm loss: 4.502340E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.449 | TFLOPs: 11.07 | 7: iteration 135730/ 173500 | consumed samples: 34746880 | consumed tokens: 71161610240 | elapsed time per iteration (s): 0.10 | learning rate: 4.064E-05 | global batch size: 256 | lm loss: 4.516322E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.288 | TFLOPs: 9.64 | 7: iteration 135740/ 173500 | consumed samples: 34749440 | consumed tokens: 71166853120 | elapsed time per iteration (s): 0.08 | learning rate: 4.062E-05 | global batch size: 
256 | lm loss: 4.499866E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.806 | TFLOPs: 11.34 | 7: iteration 135750/ 173500 | consumed samples: 34752000 | consumed tokens: 71172096000 | elapsed time per iteration (s): 0.08 | learning rate: 4.061E-05 | global batch size: 256 | lm loss: 4.517760E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.846 | TFLOPs: 11.89 | 7: iteration 135760/ 173500 | consumed samples: 34754560 | consumed tokens: 71177338880 | elapsed time per iteration (s): 0.08 | learning rate: 4.060E-05 | global batch size: 256 | lm loss: 4.508195E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.586 | TFLOPs: 11.73 | 7: iteration 135770/ 173500 | consumed samples: 34757120 | consumed tokens: 71182581760 | elapsed time per iteration (s): 0.08 | learning rate: 4.059E-05 | global batch size: 256 | lm loss: 4.497461E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3021.832 | TFLOPs: 11.24 | 7: iteration 135780/ 173500 | consumed samples: 34759680 | consumed tokens: 71187824640 | elapsed time per iteration (s): 0.08 | learning rate: 4.058E-05 | global batch size: 256 | lm loss: 4.510284E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.118 | TFLOPs: 11.75 | 7: iteration 135790/ 173500 | consumed samples: 34762240 | consumed tokens: 71193067520 | elapsed time per iteration (s): 0.09 | learning rate: 4.057E-05 | global batch size: 256 | lm loss: 4.498118E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.010 | TFLOPs: 10.40 | 7: iteration 135800/ 173500 | consumed samples: 34764800 | 
consumed tokens: 71198310400 | elapsed time per iteration (s): 0.09 | learning rate: 4.056E-05 | global batch size: 256 | lm loss: 4.508719E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2749.450 | TFLOPs: 10.23 | 7: iteration 135810/ 173500 | consumed samples: 34767360 | consumed tokens: 71203553280 | elapsed time per iteration (s): 0.10 | learning rate: 4.055E-05 | global batch size: 256 | lm loss: 4.511758E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.887 | TFLOPs: 9.60 | 7: iteration 135820/ 173500 | consumed samples: 34769920 | consumed tokens: 71208796160 | elapsed time per iteration (s): 0.10 | learning rate: 4.054E-05 | global batch size: 256 | lm loss: 4.515674E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.971 | TFLOPs: 9.30 | 7: iteration 135830/ 173500 | consumed samples: 34772480 | consumed tokens: 71214039040 | elapsed time per iteration (s): 0.10 | learning rate: 4.053E-05 | global batch size: 256 | lm loss: 4.514078E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2546.169 | TFLOPs: 9.47 | 7: iteration 135840/ 173500 | consumed samples: 34775040 | consumed tokens: 71219281920 | elapsed time per iteration (s): 0.10 | learning rate: 4.052E-05 | global batch size: 256 | lm loss: 4.513505E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2513.443 | TFLOPs: 9.35 | 7: iteration 135850/ 173500 | consumed samples: 34777600 | consumed tokens: 71224524800 | elapsed time per iteration (s): 0.11 | learning rate: 4.051E-05 | global batch size: 256 | lm loss: 4.524300E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 2310.383 | TFLOPs: 8.59 | 7: iteration 135860/ 173500 | consumed samples: 34780160 | consumed tokens: 71229767680 | elapsed time per iteration (s): 0.11 | learning rate: 4.050E-05 | global batch size: 256 | lm loss: 4.492212E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.011 | TFLOPs: 8.75 | 7: iteration 135870/ 173500 | consumed samples: 34782720 | consumed tokens: 71235010560 | elapsed time per iteration (s): 0.11 | learning rate: 4.049E-05 | global batch size: 256 | lm loss: 4.502866E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.625 | TFLOPs: 8.50 | 7: iteration 135880/ 173500 | consumed samples: 34785280 | consumed tokens: 71240253440 | elapsed time per iteration (s): 0.11 | learning rate: 4.048E-05 | global batch size: 256 | lm loss: 4.518555E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2288.487 | TFLOPs: 8.51 | 7: iteration 135890/ 173500 | consumed samples: 34787840 | consumed tokens: 71245496320 | elapsed time per iteration (s): 0.11 | learning rate: 4.047E-05 | global batch size: 256 | lm loss: 4.511992E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2299.526 | TFLOPs: 8.55 | 7: iteration 135900/ 173500 | consumed samples: 34790400 | consumed tokens: 71250739200 | elapsed time per iteration (s): 0.11 | learning rate: 4.046E-05 | global batch size: 256 | lm loss: 4.502298E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.187 | TFLOPs: 8.66 | 7: iteration 135910/ 173500 | consumed samples: 34792960 | consumed tokens: 71255982080 | elapsed time per iteration (s): 0.12 | learning rate: 4.045E-05 | global batch size: 256 | lm loss: 
4.509978E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2207.012 | TFLOPs: 8.21 | 7: iteration 135920/ 173500 | consumed samples: 34795520 | consumed tokens: 71261224960 | elapsed time per iteration (s): 0.11 | learning rate: 4.044E-05 | global batch size: 256 | lm loss: 4.515076E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.508 | TFLOPs: 8.59 | 7: iteration 135930/ 173500 | consumed samples: 34798080 | consumed tokens: 71266467840 | elapsed time per iteration (s): 0.12 | learning rate: 4.043E-05 | global batch size: 256 | lm loss: 4.509998E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.359 | TFLOPs: 8.18 | 7: iteration 135940/ 173500 | consumed samples: 34800640 | consumed tokens: 71271710720 | elapsed time per iteration (s): 0.11 | learning rate: 4.042E-05 | global batch size: 256 | lm loss: 4.506226E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2302.360 | TFLOPs: 8.56 | 7: iteration 135950/ 173500 | consumed samples: 34803200 | consumed tokens: 71276953600 | elapsed time per iteration (s): 0.11 | learning rate: 4.040E-05 | global batch size: 256 | lm loss: 4.514693E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2277.470 | TFLOPs: 8.47 | 7: iteration 135960/ 173500 | consumed samples: 34805760 | consumed tokens: 71282196480 | elapsed time per iteration (s): 0.14 | learning rate: 4.039E-05 | global batch size: 256 | lm loss: 4.511270E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1887.420 | TFLOPs: 7.02 | 7: iteration 135970/ 173500 | consumed samples: 34808320 | consumed tokens: 
71287439360 | elapsed time per iteration (s): 0.12 | learning rate: 4.038E-05 | global batch size: 256 | lm loss: 4.507875E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2157.666 | TFLOPs: 8.03 | 7: iteration 135980/ 173500 | consumed samples: 34810880 | consumed tokens: 71292682240 | elapsed time per iteration (s): 0.12 | learning rate: 4.037E-05 | global batch size: 256 | lm loss: 4.506140E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2123.711 | TFLOPs: 7.90 | 7: iteration 135990/ 173500 | consumed samples: 34813440 | consumed tokens: 71297925120 | elapsed time per iteration (s): 0.13 | learning rate: 4.036E-05 | global batch size: 256 | lm loss: 4.501857E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.984 | TFLOPs: 7.37 | 0: [2023-03-17 03:35:03,271] [INFO] [logging.py:68:log_dist] [Rank 0] step=136000, skipped=0, lr=[4.035272599944626e-05, 4.035272599944626e-05, 4.035272599944626e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 136000/ 173500 | consumed samples: 34816000 | consumed tokens: 71303168000 | elapsed time per iteration (s): 0.11 | learning rate: 4.035E-05 | global batch size: 256 | lm loss: 4.515311E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2299.593 | TFLOPs: 8.55 | 0: steps: 136000 loss: 4.4885 iter time (s): 0.101 samples/sec: 2530.817 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 136000 | lm loss value: 4.388088E+00 | lm loss PPL: 8.048636E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 136000 to checkpoints_14m91b100m 0: [2023-03-17 
03:35:03,343] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step136000 is begin to save! 0: [2023-03-17 03:35:03,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:35:03,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:35:03,374] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:35:03,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:35:03,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:35:03,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:35:03,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:35:03,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:35:03,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:35:03,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:35:03,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:35:03,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:35:03,387] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step136000/mp_rank_00_model_states.pt 0: [2023-03-17 03:35:03,387] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:35:03,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:35:03,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:35:03,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:35:03,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:35:03,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:35:03,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 1: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2023-03-17 03:35:03,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 2: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
1: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 5: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:35:03,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 6: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:35:03,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 7: [2023-03-17 03:35:03,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:35:03,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: [2023-03-17 03:35:03,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
4: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:35:03,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:35:03,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 2: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: [2023-03-17 03:35:03,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
5: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:35:03,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 6: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:35:03,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 1: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:35:03,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 7: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:35:03,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
4: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:35:03,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:35:03,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: [2023-03-17 03:35:03,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 2: [2023-03-17 03:35:03,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:35:03,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 5: [2023-03-17 03:35:03,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:35:03,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:35:03,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
3: [2023-03-17 03:35:03,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:35:03,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:35:03,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 6: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
1: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 1: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 1: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 4: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
6: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:35:03,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 03:35:03,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 2: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 5: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:35:03,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 7: [2023-03-17 03:35:03,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:35:03,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 2: [2023-03-17 03:35:03,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
4: [2023-03-17 03:35:03,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:35:03,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:35:03,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 5: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:35:03,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 1: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:35:03,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 03:35:03,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 1: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 6: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:35:03,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:35:03,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 4: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:35:03,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:35:03,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 03:35:03,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 2: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 6: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:35:03,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2023-03-17 03:35:03,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 03:35:03,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 5: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 
1: [2023-03-17 03:35:03,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:35:03,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:35:03,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 7: [2023-03-17 03:35:03,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 4: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:35:03,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:35:03,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 5: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 2: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 6: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 5: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 7: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 3: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 4: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 4: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 4: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 1: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 03:35:03,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step136000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 1: [2023-03-17 03:35:03,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step136000 is ready now! 0: successfully saved checkpoint at iteration 136000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.57 7: iteration 136010/ 173500 | consumed samples: 34818560 | consumed tokens: 71308410880 | elapsed time per iteration (s): 0.14 | learning rate: 4.034E-05 | global batch size: 256 | lm loss: 4.512192E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1895.275 | TFLOPs: 7.05 | 7: iteration 136020/ 173500 | consumed samples: 34821120 | consumed tokens: 71313653760 | elapsed time per iteration (s): 0.11 | learning rate: 4.033E-05 | global batch size: 256 | lm loss: 4.512682E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.641 | TFLOPs: 8.50 | 7: iteration 136030/ 173500 | consumed samples: 34823680 | consumed tokens: 71318896640 | elapsed time per iteration (s): 0.11 | learning rate: 4.032E-05 | global batch size: 256 | lm loss: 4.500562E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2281.168 | TFLOPs: 8.48 | 7: iteration 136040/ 173500 | consumed samples: 34826240 | consumed tokens: 71324139520 | elapsed time per iteration (s): 0.12 | learning rate: 4.031E-05 | global batch size: 256 | lm loss: 
4.511299E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2094.484 | TFLOPs: 7.79 | 7: iteration 136050/ 173500 | consumed samples: 34828800 | consumed tokens: 71329382400 | elapsed time per iteration (s): 0.12 | learning rate: 4.030E-05 | global batch size: 256 | lm loss: 4.509772E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2116.703 | TFLOPs: 7.87 | 7: iteration 136060/ 173500 | consumed samples: 34831360 | consumed tokens: 71334625280 | elapsed time per iteration (s): 0.16 | learning rate: 4.029E-05 | global batch size: 256 | lm loss: 4.513261E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1598.771 | TFLOPs: 5.95 | 7: iteration 136070/ 173500 | consumed samples: 34833920 | consumed tokens: 71339868160 | elapsed time per iteration (s): 0.15 | learning rate: 4.028E-05 | global batch size: 256 | lm loss: 4.514520E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1677.668 | TFLOPs: 6.24 | 7: iteration 136080/ 173500 | consumed samples: 34836480 | consumed tokens: 71345111040 | elapsed time per iteration (s): 0.12 | learning rate: 4.027E-05 | global batch size: 256 | lm loss: 4.526952E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2172.984 | TFLOPs: 8.08 | 7: iteration 136090/ 173500 | consumed samples: 34839040 | consumed tokens: 71350353920 | elapsed time per iteration (s): 0.11 | learning rate: 4.026E-05 | global batch size: 256 | lm loss: 4.507138E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2427.857 | TFLOPs: 9.03 | 7: iteration 136100/ 173500 | consumed samples: 34841600 | consumed tokens: 
71355596800 | elapsed time per iteration (s): 0.10 | learning rate: 4.025E-05 | global batch size: 256 | lm loss: 4.515071E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2579.657 | TFLOPs: 9.60 | 7: iteration 136110/ 173500 | consumed samples: 34844160 | consumed tokens: 71360839680 | elapsed time per iteration (s): 0.09 | learning rate: 4.024E-05 | global batch size: 256 | lm loss: 4.515643E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.712 | TFLOPs: 11.10 | 7: iteration 136120/ 173500 | consumed samples: 34846720 | consumed tokens: 71366082560 | elapsed time per iteration (s): 0.08 | learning rate: 4.023E-05 | global batch size: 256 | lm loss: 4.517704E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.028 | TFLOPs: 11.53 | 7: iteration 136130/ 173500 | consumed samples: 34849280 | consumed tokens: 71371325440 | elapsed time per iteration (s): 0.09 | learning rate: 4.022E-05 | global batch size: 256 | lm loss: 4.507561E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2793.287 | TFLOPs: 10.39 | 7: iteration 136140/ 173500 | consumed samples: 34851840 | consumed tokens: 71376568320 | elapsed time per iteration (s): 0.08 | learning rate: 4.021E-05 | global batch size: 256 | lm loss: 4.515724E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.955 | TFLOPs: 11.87 | 7: iteration 136150/ 173500 | consumed samples: 34854400 | consumed tokens: 71381811200 | elapsed time per iteration (s): 0.13 | learning rate: 4.020E-05 | global batch size: 256 | lm loss: 4.515799E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2015.367 | TFLOPs: 7.50 | 7: iteration 136160/ 173500 | consumed samples: 34856960 | consumed tokens: 71387054080 | elapsed time per iteration (s): 0.11 | learning rate: 4.019E-05 | global batch size: 256 | lm loss: 4.502099E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2321.617 | TFLOPs: 8.64 | 7: iteration 136170/ 173500 | consumed samples: 34859520 | consumed tokens: 71392296960 | elapsed time per iteration (s): 0.08 | learning rate: 4.018E-05 | global batch size: 256 | lm loss: 4.510850E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.879 | TFLOPs: 11.68 | 7: iteration 136180/ 173500 | consumed samples: 34862080 | consumed tokens: 71397539840 | elapsed time per iteration (s): 0.08 | learning rate: 4.017E-05 | global batch size: 256 | lm loss: 4.521447E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.670 | TFLOPs: 11.89 | 7: iteration 136190/ 173500 | consumed samples: 34864640 | consumed tokens: 71402782720 | elapsed time per iteration (s): 0.13 | learning rate: 4.016E-05 | global batch size: 256 | lm loss: 4.519274E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.010 | TFLOPs: 7.58 | 7: iteration 136200/ 173500 | consumed samples: 34867200 | consumed tokens: 71408025600 | elapsed time per iteration (s): 0.13 | learning rate: 4.014E-05 | global batch size: 256 | lm loss: 4.505376E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1970.535 | TFLOPs: 7.33 | 7: iteration 136210/ 173500 | consumed samples: 34869760 | consumed tokens: 71413268480 | elapsed time per iteration (s): 0.13 | learning rate: 4.013E-05 | global batch size: 256 | lm loss: 4.499955E+00 | 
grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.962 | TFLOPs: 7.22 | 7: iteration 136220/ 173500 | consumed samples: 34872320 | consumed tokens: 71418511360 | elapsed time per iteration (s): 0.13 | learning rate: 4.012E-05 | global batch size: 256 | lm loss: 4.520584E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1910.477 | TFLOPs: 7.11 | 7: iteration 136230/ 173500 | consumed samples: 34874880 | consumed tokens: 71423754240 | elapsed time per iteration (s): 0.10 | learning rate: 4.011E-05 | global batch size: 256 | lm loss: 4.506107E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2532.451 | TFLOPs: 9.42 | 7: iteration 136240/ 173500 | consumed samples: 34877440 | consumed tokens: 71428997120 | elapsed time per iteration (s): 0.08 | learning rate: 4.010E-05 | global batch size: 256 | lm loss: 4.500568E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.182 | TFLOPs: 12.00 | 7: iteration 136250/ 173500 | consumed samples: 34880000 | consumed tokens: 71434240000 | elapsed time per iteration (s): 0.08 | learning rate: 4.009E-05 | global batch size: 256 | lm loss: 4.496711E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.641 | TFLOPs: 11.73 | 7: iteration 136260/ 173500 | consumed samples: 34882560 | consumed tokens: 71439482880 | elapsed time per iteration (s): 0.09 | learning rate: 4.008E-05 | global batch size: 256 | lm loss: 4.511371E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2828.392 | TFLOPs: 10.52 | 7: iteration 136270/ 173500 | consumed samples: 34885120 | consumed tokens: 71444725760 | 
elapsed time per iteration (s): 0.10 | learning rate: 4.007E-05 | global batch size: 256 | lm loss: 4.501782E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.432 | TFLOPs: 9.30 | 7: iteration 136280/ 173500 | consumed samples: 34887680 | consumed tokens: 71449968640 | elapsed time per iteration (s): 0.09 | learning rate: 4.006E-05 | global batch size: 256 | lm loss: 4.516863E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2904.706 | TFLOPs: 10.80 | 7: iteration 136290/ 173500 | consumed samples: 34890240 | consumed tokens: 71455211520 | elapsed time per iteration (s): 0.08 | learning rate: 4.005E-05 | global batch size: 256 | lm loss: 4.518708E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.940 | TFLOPs: 12.00 | 7: iteration 136300/ 173500 | consumed samples: 34892800 | consumed tokens: 71460454400 | elapsed time per iteration (s): 0.08 | learning rate: 4.004E-05 | global batch size: 256 | lm loss: 4.502734E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.510 | TFLOPs: 12.02 | 7: iteration 136310/ 173500 | consumed samples: 34895360 | consumed tokens: 71465697280 | elapsed time per iteration (s): 0.08 | learning rate: 4.003E-05 | global batch size: 256 | lm loss: 4.502843E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.002 | TFLOPs: 11.97 | 7: iteration 136320/ 173500 | consumed samples: 34897920 | consumed tokens: 71470940160 | elapsed time per iteration (s): 0.08 | learning rate: 4.002E-05 | global batch size: 256 | lm loss: 4.508662E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3211.613 | TFLOPs: 11.95 | 7: iteration 136330/ 173500 | consumed samples: 34900480 | consumed tokens: 71476183040 | elapsed time per iteration (s): 0.08 | learning rate: 4.001E-05 | global batch size: 256 | lm loss: 4.507928E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.430 | TFLOPs: 11.73 | 7: iteration 136340/ 173500 | consumed samples: 34903040 | consumed tokens: 71481425920 | elapsed time per iteration (s): 0.08 | learning rate: 4.000E-05 | global batch size: 256 | lm loss: 4.510778E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.117 | TFLOPs: 11.82 | 7: iteration 136350/ 173500 | consumed samples: 34905600 | consumed tokens: 71486668800 | elapsed time per iteration (s): 0.08 | learning rate: 3.999E-05 | global batch size: 256 | lm loss: 4.506863E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.522 | TFLOPs: 11.98 | 7: iteration 136360/ 173500 | consumed samples: 34908160 | consumed tokens: 71491911680 | elapsed time per iteration (s): 0.10 | learning rate: 3.998E-05 | global batch size: 256 | lm loss: 4.495521E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2669.953 | TFLOPs: 9.93 | 7: iteration 136370/ 173500 | consumed samples: 34910720 | consumed tokens: 71497154560 | elapsed time per iteration (s): 0.11 | learning rate: 3.997E-05 | global batch size: 256 | lm loss: 4.506078E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2309.048 | TFLOPs: 8.59 | 7: iteration 136380/ 173500 | consumed samples: 34913280 | consumed tokens: 71502397440 | elapsed time per iteration (s): 0.13 | learning rate: 3.996E-05 | global batch size: 256 | lm loss: 4.503181E+00 | grad 
norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.370 | TFLOPs: 7.53 | 7: iteration 136390/ 173500 | consumed samples: 34915840 | consumed tokens: 71507640320 | elapsed time per iteration (s): 0.11 | learning rate: 3.995E-05 | global batch size: 256 | lm loss: 4.512001E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.221 | TFLOPs: 8.50 | 7: iteration 136400/ 173500 | consumed samples: 34918400 | consumed tokens: 71512883200 | elapsed time per iteration (s): 0.08 | learning rate: 3.994E-05 | global batch size: 256 | lm loss: 4.511427E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.839 | TFLOPs: 11.88 | 7: iteration 136410/ 173500 | consumed samples: 34920960 | consumed tokens: 71518126080 | elapsed time per iteration (s): 0.09 | learning rate: 3.993E-05 | global batch size: 256 | lm loss: 4.518078E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2743.866 | TFLOPs: 10.21 | 7: iteration 136420/ 173500 | consumed samples: 34923520 | consumed tokens: 71523368960 | elapsed time per iteration (s): 0.09 | learning rate: 3.992E-05 | global batch size: 256 | lm loss: 4.513792E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.254 | TFLOPs: 11.11 | 7: iteration 136430/ 173500 | consumed samples: 34926080 | consumed tokens: 71528611840 | elapsed time per iteration (s): 0.13 | learning rate: 3.991E-05 | global batch size: 256 | lm loss: 4.502387E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1939.549 | TFLOPs: 7.21 | 7: iteration 136440/ 173500 | consumed samples: 34928640 | consumed tokens: 71533854720 | elapsed 
time per iteration (s): 0.09 | learning rate: 3.990E-05 | global batch size: 256 | lm loss: 4.520097E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.380 | TFLOPs: 10.61 | 7: iteration 136450/ 173500 | consumed samples: 34931200 | consumed tokens: 71539097600 | elapsed time per iteration (s): 0.11 | learning rate: 3.989E-05 | global batch size: 256 | lm loss: 4.494314E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2322.196 | TFLOPs: 8.64 | 7: iteration 136460/ 173500 | consumed samples: 34933760 | consumed tokens: 71544340480 | elapsed time per iteration (s): 0.13 | learning rate: 3.988E-05 | global batch size: 256 | lm loss: 4.514842E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.506 | TFLOPs: 7.39 | 7: iteration 136470/ 173500 | consumed samples: 34936320 | consumed tokens: 71549583360 | elapsed time per iteration (s): 0.13 | learning rate: 3.987E-05 | global batch size: 256 | lm loss: 4.510875E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1969.056 | TFLOPs: 7.32 | 7: iteration 136480/ 173500 | consumed samples: 34938880 | consumed tokens: 71554826240 | elapsed time per iteration (s): 0.12 | learning rate: 3.985E-05 | global batch size: 256 | lm loss: 4.515514E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2128.332 | TFLOPs: 7.92 | 7: iteration 136490/ 173500 | consumed samples: 34941440 | consumed tokens: 71560069120 | elapsed time per iteration (s): 0.08 | learning rate: 3.984E-05 | global batch size: 256 | lm loss: 4.506301E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.690 | 
TFLOPs: 11.56 | 7: iteration 136500/ 173500 | consumed samples: 34944000 | consumed tokens: 71565312000 | elapsed time per iteration (s): 0.08 | learning rate: 3.983E-05 | global batch size: 256 | lm loss: 4.515341E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.262 | TFLOPs: 12.00 | 7: iteration 136510/ 173500 | consumed samples: 34946560 | consumed tokens: 71570554880 | elapsed time per iteration (s): 0.08 | learning rate: 3.982E-05 | global batch size: 256 | lm loss: 4.516685E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.233 | TFLOPs: 11.93 | 7: iteration 136520/ 173500 | consumed samples: 34949120 | consumed tokens: 71575797760 | elapsed time per iteration (s): 0.08 | learning rate: 3.981E-05 | global batch size: 256 | lm loss: 4.512859E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.472 | TFLOPs: 11.44 | 7: iteration 136530/ 173500 | consumed samples: 34951680 | consumed tokens: 71581040640 | elapsed time per iteration (s): 0.12 | learning rate: 3.980E-05 | global batch size: 256 | lm loss: 4.514214E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2093.730 | TFLOPs: 7.79 | 7: iteration 136540/ 173500 | consumed samples: 34954240 | consumed tokens: 71586283520 | elapsed time per iteration (s): 0.11 | learning rate: 3.979E-05 | global batch size: 256 | lm loss: 4.502512E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2418.707 | TFLOPs: 9.00 | 7: iteration 136550/ 173500 | consumed samples: 34956800 | consumed tokens: 71591526400 | elapsed time per iteration (s): 0.13 | learning rate: 3.978E-05 | global batch size: 256 | lm loss: 4.507298E+00 | grad norm: 0.361 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1951.198 | TFLOPs: 7.26 | 7: iteration 136560/ 173500 | consumed samples: 34959360 | consumed tokens: 71596769280 | elapsed time per iteration (s): 0.12 | learning rate: 3.977E-05 | global batch size: 256 | lm loss: 4.496828E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2128.280 | TFLOPs: 7.92 | 7: iteration 136570/ 173500 | consumed samples: 34961920 | consumed tokens: 71602012160 | elapsed time per iteration (s): 0.08 | learning rate: 3.976E-05 | global batch size: 256 | lm loss: 4.513416E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.918 | TFLOPs: 11.97 | 7: iteration 136580/ 173500 | consumed samples: 34964480 | consumed tokens: 71607255040 | elapsed time per iteration (s): 0.13 | learning rate: 3.975E-05 | global batch size: 256 | lm loss: 4.507593E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1991.209 | TFLOPs: 7.41 | 7: iteration 136590/ 173500 | consumed samples: 34967040 | consumed tokens: 71612497920 | elapsed time per iteration (s): 0.08 | learning rate: 3.974E-05 | global batch size: 256 | lm loss: 4.512408E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.419 | TFLOPs: 11.91 | 7: iteration 136600/ 173500 | consumed samples: 34969600 | consumed tokens: 71617740800 | elapsed time per iteration (s): 0.08 | learning rate: 3.973E-05 | global batch size: 256 | lm loss: 4.508512E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.739 | TFLOPs: 11.80 | 7: iteration 136610/ 173500 | consumed samples: 34972160 | consumed tokens: 71622983680 | elapsed time per 
iteration (s): 0.08 | learning rate: 3.972E-05 | global batch size: 256 | lm loss: 4.511250E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.476 | TFLOPs: 11.96 | 7: iteration 136620/ 173500 | consumed samples: 34974720 | consumed tokens: 71628226560 | elapsed time per iteration (s): 0.08 | learning rate: 3.971E-05 | global batch size: 256 | lm loss: 4.511635E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.448 | TFLOPs: 12.00 | 7: iteration 136630/ 173500 | consumed samples: 34977280 | consumed tokens: 71633469440 | elapsed time per iteration (s): 0.08 | learning rate: 3.970E-05 | global batch size: 256 | lm loss: 4.507293E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.551 | TFLOPs: 11.66 | 7: iteration 136640/ 173500 | consumed samples: 34979840 | consumed tokens: 71638712320 | elapsed time per iteration (s): 0.08 | learning rate: 3.969E-05 | global batch size: 256 | lm loss: 4.508258E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.906 | TFLOPs: 12.00 | 7: iteration 136650/ 173500 | consumed samples: 34982400 | consumed tokens: 71643955200 | elapsed time per iteration (s): 0.09 | learning rate: 3.968E-05 | global batch size: 256 | lm loss: 4.499996E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2878.426 | TFLOPs: 10.71 | 7: iteration 136660/ 173500 | consumed samples: 34984960 | consumed tokens: 71649198080 | elapsed time per iteration (s): 0.08 | learning rate: 3.967E-05 | global batch size: 256 | lm loss: 4.520148E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.963 | TFLOPs: 
11.96 | 7: iteration 136670/ 173500 | consumed samples: 34987520 | consumed tokens: 71654440960 | elapsed time per iteration (s): 0.09 | learning rate: 3.966E-05 | global batch size: 256 | lm loss: 4.510722E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2776.259 | TFLOPs: 10.33 | 7: iteration 136680/ 173500 | consumed samples: 34990080 | consumed tokens: 71659683840 | elapsed time per iteration (s): 0.08 | learning rate: 3.965E-05 | global batch size: 256 | lm loss: 4.512304E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.693 | TFLOPs: 11.85 | 7: iteration 136690/ 173500 | consumed samples: 34992640 | consumed tokens: 71664926720 | elapsed time per iteration (s): 0.09 | learning rate: 3.964E-05 | global batch size: 256 | lm loss: 4.499011E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2875.812 | TFLOPs: 10.70 | 7: iteration 136700/ 173500 | consumed samples: 34995200 | consumed tokens: 71670169600 | elapsed time per iteration (s): 0.08 | learning rate: 3.963E-05 | global batch size: 256 | lm loss: 4.494445E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.981 | TFLOPs: 11.29 | 7: iteration 136710/ 173500 | consumed samples: 34997760 | consumed tokens: 71675412480 | elapsed time per iteration (s): 0.08 | learning rate: 3.962E-05 | global batch size: 256 | lm loss: 4.504410E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.042 | TFLOPs: 11.80 | 7: iteration 136720/ 173500 | consumed samples: 35000320 | consumed tokens: 71680655360 | elapsed time per iteration (s): 0.10 | learning rate: 3.961E-05 | global batch size: 256 | lm loss: 4.513633E+00 | grad norm: 0.383 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2495.194 | TFLOPs: 9.28 | 7: iteration 136730/ 173500 | consumed samples: 35002880 | consumed tokens: 71685898240 | elapsed time per iteration (s): 0.08 | learning rate: 3.960E-05 | global batch size: 256 | lm loss: 4.511619E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.490 | TFLOPs: 11.93 | 7: iteration 136740/ 173500 | consumed samples: 35005440 | consumed tokens: 71691141120 | elapsed time per iteration (s): 0.08 | learning rate: 3.959E-05 | global batch size: 256 | lm loss: 4.506845E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.258 | TFLOPs: 11.90 | 7: iteration 136750/ 173500 | consumed samples: 35008000 | consumed tokens: 71696384000 | elapsed time per iteration (s): 0.10 | learning rate: 3.958E-05 | global batch size: 256 | lm loss: 4.504751E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2675.030 | TFLOPs: 9.95 | 7: iteration 136760/ 173500 | consumed samples: 35010560 | consumed tokens: 71701626880 | elapsed time per iteration (s): 0.09 | learning rate: 3.957E-05 | global batch size: 256 | lm loss: 4.517622E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2756.989 | TFLOPs: 10.25 | 7: iteration 136770/ 173500 | consumed samples: 35013120 | consumed tokens: 71706869760 | elapsed time per iteration (s): 0.08 | learning rate: 3.956E-05 | global batch size: 256 | lm loss: 4.509470E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.092 | TFLOPs: 11.78 | 7: iteration 136780/ 173500 | consumed samples: 35015680 | consumed tokens: 71712112640 | elapsed time per iteration 
(s): 0.09 | learning rate: 3.955E-05 | global batch size: 256 | lm loss: 4.503128E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2969.472 | TFLOPs: 11.05 | 7: iteration 136790/ 173500 | consumed samples: 35018240 | consumed tokens: 71717355520 | elapsed time per iteration (s): 0.08 | learning rate: 3.954E-05 | global batch size: 256 | lm loss: 4.505595E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.885 | TFLOPs: 11.29 | 7: iteration 136800/ 173500 | consumed samples: 35020800 | consumed tokens: 71722598400 | elapsed time per iteration (s): 0.10 | learning rate: 3.953E-05 | global batch size: 256 | lm loss: 4.517486E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.412 | TFLOPs: 9.16 | 7: iteration 136810/ 173500 | consumed samples: 35023360 | consumed tokens: 71727841280 | elapsed time per iteration (s): 0.10 | learning rate: 3.952E-05 | global batch size: 256 | lm loss: 4.501365E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2459.368 | TFLOPs: 9.15 | 7: iteration 136820/ 173500 | consumed samples: 35025920 | consumed tokens: 71733084160 | elapsed time per iteration (s): 0.10 | learning rate: 3.951E-05 | global batch size: 256 | lm loss: 4.506919E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2564.882 | TFLOPs: 9.54 | 7: iteration 136830/ 173500 | consumed samples: 35028480 | consumed tokens: 71738327040 | elapsed time per iteration (s): 0.13 | learning rate: 3.950E-05 | global batch size: 256 | lm loss: 4.507680E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1931.604 | TFLOPs: 7.18 | 7: 
iteration 136840/ 173500 | consumed samples: 35031040 | consumed tokens: 71743569920 | elapsed time per iteration (s): 0.13 | learning rate: 3.949E-05 | global batch size: 256 | lm loss: 4.515648E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1965.754 | TFLOPs: 7.31 | 7: iteration 136850/ 173500 | consumed samples: 35033600 | consumed tokens: 71748812800 | elapsed time per iteration (s): 0.12 | learning rate: 3.947E-05 | global batch size: 256 | lm loss: 4.507516E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2145.463 | TFLOPs: 7.98 | 7: iteration 136860/ 173500 | consumed samples: 35036160 | consumed tokens: 71754055680 | elapsed time per iteration (s): 0.08 | learning rate: 3.946E-05 | global batch size: 256 | lm loss: 4.524479E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.516 | TFLOPs: 11.89 | 7: iteration 136870/ 173500 | consumed samples: 35038720 | consumed tokens: 71759298560 | elapsed time per iteration (s): 0.08 | learning rate: 3.945E-05 | global batch size: 256 | lm loss: 4.505291E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.233 | TFLOPs: 12.00 | 7: iteration 136880/ 173500 | consumed samples: 35041280 | consumed tokens: 71764541440 | elapsed time per iteration (s): 0.08 | learning rate: 3.944E-05 | global batch size: 256 | lm loss: 4.498319E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3082.345 | TFLOPs: 11.46 | 7: iteration 136890/ 173500 | consumed samples: 35043840 | consumed tokens: 71769784320 | elapsed time per iteration (s): 0.10 | learning rate: 3.943E-05 | global batch size: 256 | lm loss: 4.512121E+00 | grad norm: 0.399 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2686.693 | TFLOPs: 9.99 | 7: iteration 136900/ 173500 | consumed samples: 35046400 | consumed tokens: 71775027200 | elapsed time per iteration (s): 0.13 | learning rate: 3.942E-05 | global batch size: 256 | lm loss: 4.502551E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2032.023 | TFLOPs: 7.56 | 7: iteration 136910/ 173500 | consumed samples: 35048960 | consumed tokens: 71780270080 | elapsed time per iteration (s): 0.13 | learning rate: 3.941E-05 | global batch size: 256 | lm loss: 4.512462E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1999.849 | TFLOPs: 7.44 | 7: iteration 136920/ 173500 | consumed samples: 35051520 | consumed tokens: 71785512960 | elapsed time per iteration (s): 0.08 | learning rate: 3.940E-05 | global batch size: 256 | lm loss: 4.509852E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.531 | TFLOPs: 11.47 | 7: iteration 136930/ 173500 | consumed samples: 35054080 | consumed tokens: 71790755840 | elapsed time per iteration (s): 0.08 | learning rate: 3.939E-05 | global batch size: 256 | lm loss: 4.507171E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.770 | TFLOPs: 11.37 | 7: iteration 136940/ 173500 | consumed samples: 35056640 | consumed tokens: 71795998720 | elapsed time per iteration (s): 0.09 | learning rate: 3.938E-05 | global batch size: 256 | lm loss: 4.509026E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.117 | TFLOPs: 10.37 | 7: iteration 136950/ 173500 | consumed samples: 35059200 | consumed tokens: 71801241600 | elapsed time per iteration (s): 0.10 | 
learning rate: 3.937E-05 | global batch size: 256 | lm loss: 4.518874E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2616.260 | TFLOPs: 9.73 | 7: iteration 136960/ 173500 | consumed samples: 35061760 | consumed tokens: 71806484480 | elapsed time per iteration (s): 0.12 | learning rate: 3.936E-05 | global batch size: 256 | lm loss: 4.510439E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2154.849 | TFLOPs: 8.02 | 7: iteration 136970/ 173500 | consumed samples: 35064320 | consumed tokens: 71811727360 | elapsed time per iteration (s): 0.08 | learning rate: 3.935E-05 | global batch size: 256 | lm loss: 4.506327E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.647 | TFLOPs: 11.86 | 7: iteration 136980/ 173500 | consumed samples: 35066880 | consumed tokens: 71816970240 | elapsed time per iteration (s): 0.08 | learning rate: 3.934E-05 | global batch size: 256 | lm loss: 4.504802E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.496 | TFLOPs: 11.86 | 7: iteration 136990/ 173500 | consumed samples: 35069440 | consumed tokens: 71822213120 | elapsed time per iteration (s): 0.24 | learning rate: 3.933E-05 | global batch size: 256 | lm loss: 4.507074E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1061.402 | TFLOPs: 3.95 | 7: iteration 137000/ 173500 | consumed samples: 35072000 | consumed tokens: 71827456000 | elapsed time per iteration (s): 0.08 | learning rate: 3.932E-05 | global batch size: 256 | lm loss: 4.514433E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.448 | TFLOPs: 11.61 | 7: 
------------------------------------------------------------------------------------------------- 7: validation loss at iteration 137000 | lm loss value: 4.406133E+00 | lm loss PPL: 8.195196E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 137000 to checkpoints_14m91b100m 0: [2023-03-17 03:36:43,797] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step137000 is begin to save! 0: [2023-03-17 03:36:43,800] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:36:43,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:36:43,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:36:43,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:36:43,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:36:43,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:36:43,834] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:36:43,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:36:43,836] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 03:36:43,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:36:43,840] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:36:43,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:36:43,841] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step137000/mp_rank_00_model_states.pt 0: [2023-03-17 03:36:43,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:36:43,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:36:43,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:36:43,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:36:43,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:36:43,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:36:43,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 2: [2023-03-17 03:36:43,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:36:43,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 
4: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:36:43,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 0: [2023-03-17 03:36:43,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 3: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:36:43,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 03:36:43,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 5: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 2: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:36:43,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 0: [2023-03-17 03:36:43,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:36:43,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 4: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:36:43,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:36:43,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 3: [2023-03-17 03:36:43,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 6: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:36:43,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 5: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:36:43,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 
0: [2023-03-17 03:36:43,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 4: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:36:43,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:36:43,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 6: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:36:43,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 5: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:36:43,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 2: [2023-03-17 03:36:43,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:36:43,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:36:43,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:36:43,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 03:36:43,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 4: [2023-03-17 03:36:43,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 0: [2023-03-17 03:36:43,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:36:43,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:36:43,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 5: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 3: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 0: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 2: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 4: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 
6: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 6: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:36:43,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:36:43,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 3: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:36:43,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 2: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:36:43,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 4: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:36:43,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 0: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:36:43,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 03:36:43,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 5: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:36:43,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:36:43,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 3: [2023-03-17 03:36:43,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:36:43,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:36:43,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 3: [2023-03-17 03:36:43,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 4: [2023-03-17 03:36:43,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 
5: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 2: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 0: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 3: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 7: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 0: [2023-03-17 03:36:43,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:36:43,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 2: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:36:43,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2023-03-17 03:36:43,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 5: [2023-03-17 03:36:43,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 6: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 5: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 4: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:36:43,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 6: [2023-03-17 03:36:43,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 5: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:36:43,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:36:43,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 1: [2023-03-17 03:36:43,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:36:43,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:36:43,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 6: [2023-03-17 03:36:43,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:36:43,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step137000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:36:43,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step137000 is ready now! 0: successfully saved checkpoint at iteration 137000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.41 7: iteration 137010/ 173500 | consumed samples: 35074560 | consumed tokens: 71832698880 | elapsed time per iteration (s): 0.09 | learning rate: 3.931E-05 | global batch size: 256 | lm loss: 4.514516E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2762.568 | TFLOPs: 10.28 | 7: iteration 137020/ 173500 | consumed samples: 35077120 | consumed tokens: 71837941760 | elapsed time per iteration (s): 0.08 | learning rate: 3.930E-05 | global batch size: 256 | lm loss: 4.512329E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.991 | TFLOPs: 11.34 | 7: iteration 137030/ 173500 | consumed samples: 35079680 | consumed tokens: 71843184640 | elapsed time per iteration (s): 0.08 | learning rate: 3.929E-05 | global batch size: 256 | lm loss: 4.503654E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.197 | TFLOPs: 11.84 | 7: iteration 137040/ 173500 | consumed samples: 35082240 | consumed tokens: 71848427520 | elapsed time per iteration (s): 0.08 | learning rate: 3.928E-05 | global batch size: 256 | lm loss: 4.505616E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.406 | TFLOPs: 11.87 | 7: iteration 137050/ 173500 | consumed samples: 35084800 | consumed tokens: 71853670400 | elapsed time per iteration (s): 0.08 | learning rate: 3.927E-05 | 
global batch size: 256 | lm loss: 4.500134E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.523 | TFLOPs: 11.89 | 7: iteration 137060/ 173500 | consumed samples: 35087360 | consumed tokens: 71858913280 | elapsed time per iteration (s): 0.08 | learning rate: 3.926E-05 | global batch size: 256 | lm loss: 4.507630E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.179 | TFLOPs: 11.87 | 7: iteration 137070/ 173500 | consumed samples: 35089920 | consumed tokens: 71864156160 | elapsed time per iteration (s): 0.09 | learning rate: 3.925E-05 | global batch size: 256 | lm loss: 4.503971E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2912.279 | TFLOPs: 10.83 | 7: iteration 137080/ 173500 | consumed samples: 35092480 | consumed tokens: 71869399040 | elapsed time per iteration (s): 0.08 | learning rate: 3.924E-05 | global batch size: 256 | lm loss: 4.500149E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.049 | TFLOPs: 11.85 | 7: iteration 137090/ 173500 | consumed samples: 35095040 | consumed tokens: 71874641920 | elapsed time per iteration (s): 0.08 | learning rate: 3.923E-05 | global batch size: 256 | lm loss: 4.499317E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.874 | TFLOPs: 11.88 | 7: iteration 137100/ 173500 | consumed samples: 35097600 | consumed tokens: 71879884800 | elapsed time per iteration (s): 0.08 | learning rate: 3.922E-05 | global batch size: 256 | lm loss: 4.511025E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.030 | TFLOPs: 11.94 | 7: iteration 137110/ 173500 | consumed 
samples: 35100160 | consumed tokens: 71885127680 | elapsed time per iteration (s): 0.08 | learning rate: 3.921E-05 | global batch size: 256 | lm loss: 4.506860E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.887 | TFLOPs: 11.82 | 7: iteration 137120/ 173500 | consumed samples: 35102720 | consumed tokens: 71890370560 | elapsed time per iteration (s): 0.08 | learning rate: 3.920E-05 | global batch size: 256 | lm loss: 4.511381E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.672 | TFLOPs: 11.89 | 7: iteration 137130/ 173500 | consumed samples: 35105280 | consumed tokens: 71895613440 | elapsed time per iteration (s): 0.13 | learning rate: 3.919E-05 | global batch size: 256 | lm loss: 4.509442E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2029.333 | TFLOPs: 7.55 | 7: iteration 137140/ 173500 | consumed samples: 35107840 | consumed tokens: 71900856320 | elapsed time per iteration (s): 0.10 | learning rate: 3.918E-05 | global batch size: 256 | lm loss: 4.505488E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2613.992 | TFLOPs: 9.72 | 7: iteration 137150/ 173500 | consumed samples: 35110400 | consumed tokens: 71906099200 | elapsed time per iteration (s): 0.12 | learning rate: 3.917E-05 | global batch size: 256 | lm loss: 4.497101E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2164.364 | TFLOPs: 8.05 | 7: iteration 137160/ 173500 | consumed samples: 35112960 | consumed tokens: 71911342080 | elapsed time per iteration (s): 0.08 | learning rate: 3.916E-05 | global batch size: 256 | lm loss: 4.514568E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 3218.617 | TFLOPs: 11.97 | 7: iteration 137170/ 173500 | consumed samples: 35115520 | consumed tokens: 71916584960 | elapsed time per iteration (s): 0.08 | learning rate: 3.915E-05 | global batch size: 256 | lm loss: 4.512263E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.993 | TFLOPs: 11.74 | 7: iteration 137180/ 173500 | consumed samples: 35118080 | consumed tokens: 71921827840 | elapsed time per iteration (s): 0.08 | learning rate: 3.914E-05 | global batch size: 256 | lm loss: 4.500528E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.714 | TFLOPs: 12.01 | 7: iteration 137190/ 173500 | consumed samples: 35120640 | consumed tokens: 71927070720 | elapsed time per iteration (s): 0.08 | learning rate: 3.913E-05 | global batch size: 256 | lm loss: 4.509702E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.943 | TFLOPs: 11.69 | 7: iteration 137200/ 173500 | consumed samples: 35123200 | consumed tokens: 71932313600 | elapsed time per iteration (s): 0.09 | learning rate: 3.912E-05 | global batch size: 256 | lm loss: 4.505743E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2807.806 | TFLOPs: 10.44 | 7: iteration 137210/ 173500 | consumed samples: 35125760 | consumed tokens: 71937556480 | elapsed time per iteration (s): 0.09 | learning rate: 3.911E-05 | global batch size: 256 | lm loss: 4.520640E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2994.617 | TFLOPs: 11.14 | 7: iteration 137220/ 173500 | consumed samples: 35128320 | consumed tokens: 71942799360 | elapsed time per iteration (s): 0.08 | learning rate: 3.910E-05 | global batch 
size: 256 | lm loss: 4.508972E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.529 | TFLOPs: 11.95 | 7: iteration 137230/ 173500 | consumed samples: 35130880 | consumed tokens: 71948042240 | elapsed time per iteration (s): 0.09 | learning rate: 3.909E-05 | global batch size: 256 | lm loss: 4.502644E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2920.529 | TFLOPs: 10.86 | 7: iteration 137240/ 173500 | consumed samples: 35133440 | consumed tokens: 71953285120 | elapsed time per iteration (s): 0.10 | learning rate: 3.908E-05 | global batch size: 256 | lm loss: 4.507680E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2637.783 | TFLOPs: 9.81 | 7: iteration 137250/ 173500 | consumed samples: 35136000 | consumed tokens: 71958528000 | elapsed time per iteration (s): 0.08 | learning rate: 3.907E-05 | global batch size: 256 | lm loss: 4.520020E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.141 | TFLOPs: 11.74 | 7: iteration 137260/ 173500 | consumed samples: 35138560 | consumed tokens: 71963770880 | elapsed time per iteration (s): 0.10 | learning rate: 3.906E-05 | global batch size: 256 | lm loss: 4.502431E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2522.275 | TFLOPs: 9.38 | 7: iteration 137270/ 173500 | consumed samples: 35141120 | consumed tokens: 71969013760 | elapsed time per iteration (s): 0.08 | learning rate: 3.905E-05 | global batch size: 256 | lm loss: 4.508874E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.742 | TFLOPs: 11.99 | 7: iteration 137280/ 173500 | consumed samples: 35143680 | 
consumed tokens: 71974256640 | elapsed time per iteration (s): 0.08 | learning rate: 3.904E-05 | global batch size: 256 | lm loss: 4.511048E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.444 | TFLOPs: 12.02 | 7: iteration 137290/ 173500 | consumed samples: 35146240 | consumed tokens: 71979499520 | elapsed time per iteration (s): 0.08 | learning rate: 3.903E-05 | global batch size: 256 | lm loss: 4.514865E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.174 | TFLOPs: 12.00 | 7: iteration 137300/ 173500 | consumed samples: 35148800 | consumed tokens: 71984742400 | elapsed time per iteration (s): 0.08 | learning rate: 3.902E-05 | global batch size: 256 | lm loss: 4.516965E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.632 | TFLOPs: 11.68 | 7: iteration 137310/ 173500 | consumed samples: 35151360 | consumed tokens: 71989985280 | elapsed time per iteration (s): 0.10 | learning rate: 3.901E-05 | global batch size: 256 | lm loss: 4.508256E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2598.925 | TFLOPs: 9.67 | 7: iteration 137320/ 173500 | consumed samples: 35153920 | consumed tokens: 71995228160 | elapsed time per iteration (s): 0.09 | learning rate: 3.900E-05 | global batch size: 256 | lm loss: 4.515008E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2873.281 | TFLOPs: 10.69 | 7: iteration 137330/ 173500 | consumed samples: 35156480 | consumed tokens: 72000471040 | elapsed time per iteration (s): 0.13 | learning rate: 3.899E-05 | global batch size: 256 | lm loss: 4.505861E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 1999.964 | TFLOPs: 7.44 | 7: iteration 137340/ 173500 | consumed samples: 35159040 | consumed tokens: 72005713920 | elapsed time per iteration (s): 0.12 | learning rate: 3.898E-05 | global batch size: 256 | lm loss: 4.506184E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2120.029 | TFLOPs: 7.89 | 7: iteration 137350/ 173500 | consumed samples: 35161600 | consumed tokens: 72010956800 | elapsed time per iteration (s): 0.09 | learning rate: 3.897E-05 | global batch size: 256 | lm loss: 4.518694E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2889.602 | TFLOPs: 10.75 | 7: iteration 137360/ 173500 | consumed samples: 35164160 | consumed tokens: 72016199680 | elapsed time per iteration (s): 0.10 | learning rate: 3.896E-05 | global batch size: 256 | lm loss: 4.517893E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2465.412 | TFLOPs: 9.17 | 7: iteration 137370/ 173500 | consumed samples: 35166720 | consumed tokens: 72021442560 | elapsed time per iteration (s): 0.11 | learning rate: 3.895E-05 | global batch size: 256 | lm loss: 4.507265E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2377.397 | TFLOPs: 8.84 | 7: iteration 137380/ 173500 | consumed samples: 35169280 | consumed tokens: 72026685440 | elapsed time per iteration (s): 0.08 | learning rate: 3.894E-05 | global batch size: 256 | lm loss: 4.514591E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.441 | TFLOPs: 11.98 | 7: iteration 137390/ 173500 | consumed samples: 35171840 | consumed tokens: 72031928320 | elapsed time per iteration (s): 0.08 | learning rate: 3.893E-05 | global batch size: 256 | 
lm loss: 4.513767E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.939 | TFLOPs: 11.82 | 7: iteration 137400/ 173500 | consumed samples: 35174400 | consumed tokens: 72037171200 | elapsed time per iteration (s): 0.08 | learning rate: 3.892E-05 | global batch size: 256 | lm loss: 4.513399E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.730 | TFLOPs: 11.53 | 7: iteration 137410/ 173500 | consumed samples: 35176960 | consumed tokens: 72042414080 | elapsed time per iteration (s): 0.09 | learning rate: 3.891E-05 | global batch size: 256 | lm loss: 4.510812E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.117 | TFLOPs: 10.96 | 7: iteration 137420/ 173500 | consumed samples: 35179520 | consumed tokens: 72047656960 | elapsed time per iteration (s): 0.08 | learning rate: 3.890E-05 | global batch size: 256 | lm loss: 4.509114E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.707 | TFLOPs: 11.95 | 7: iteration 137430/ 173500 | consumed samples: 35182080 | consumed tokens: 72052899840 | elapsed time per iteration (s): 0.08 | learning rate: 3.889E-05 | global batch size: 256 | lm loss: 4.501052E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.123 | TFLOPs: 11.80 | 7: iteration 137440/ 173500 | consumed samples: 35184640 | consumed tokens: 72058142720 | elapsed time per iteration (s): 0.08 | learning rate: 3.888E-05 | global batch size: 256 | lm loss: 4.512611E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.882 | TFLOPs: 11.88 | 7: iteration 137450/ 173500 | consumed samples: 35187200 | consumed 
tokens: 72063385600 | elapsed time per iteration (s): 0.10 | learning rate: 3.887E-05 | global batch size: 256 | lm loss: 4.496830E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2472.117 | TFLOPs: 9.20 | 7: iteration 137460/ 173500 | consumed samples: 35189760 | consumed tokens: 72068628480 | elapsed time per iteration (s): 0.08 | learning rate: 3.886E-05 | global batch size: 256 | lm loss: 4.512674E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.786 | TFLOPs: 11.84 | 7: iteration 137470/ 173500 | consumed samples: 35192320 | consumed tokens: 72073871360 | elapsed time per iteration (s): 0.08 | learning rate: 3.885E-05 | global batch size: 256 | lm loss: 4.517604E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3056.123 | TFLOPs: 11.37 | 7: iteration 137480/ 173500 | consumed samples: 35194880 | consumed tokens: 72079114240 | elapsed time per iteration (s): 0.08 | learning rate: 3.884E-05 | global batch size: 256 | lm loss: 4.525123E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.053 | TFLOPs: 11.87 | 7: iteration 137490/ 173500 | consumed samples: 35197440 | consumed tokens: 72084357120 | elapsed time per iteration (s): 0.09 | learning rate: 3.883E-05 | global batch size: 256 | lm loss: 4.510368E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2725.354 | TFLOPs: 10.14 | 7: iteration 137500/ 173500 | consumed samples: 35200000 | consumed tokens: 72089600000 | elapsed time per iteration (s): 0.09 | learning rate: 3.882E-05 | global batch size: 256 | lm loss: 4.519076E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2825.593 | TFLOPs: 10.51 | 7: iteration 137510/ 173500 | consumed samples: 35202560 | consumed tokens: 72094842880 | elapsed time per iteration (s): 0.08 | learning rate: 3.881E-05 | global batch size: 256 | lm loss: 4.505995E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.705 | TFLOPs: 11.92 | 7: iteration 137520/ 173500 | consumed samples: 35205120 | consumed tokens: 72100085760 | elapsed time per iteration (s): 0.08 | learning rate: 3.880E-05 | global batch size: 256 | lm loss: 4.518355E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.861 | TFLOPs: 11.57 | 7: iteration 137530/ 173500 | consumed samples: 35207680 | consumed tokens: 72105328640 | elapsed time per iteration (s): 0.12 | learning rate: 3.879E-05 | global batch size: 256 | lm loss: 4.519117E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2090.741 | TFLOPs: 7.78 | 7: iteration 137540/ 173500 | consumed samples: 35210240 | consumed tokens: 72110571520 | elapsed time per iteration (s): 0.13 | learning rate: 3.878E-05 | global batch size: 256 | lm loss: 4.507414E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.824 | TFLOPs: 7.61 | 7: iteration 137550/ 173500 | consumed samples: 35212800 | consumed tokens: 72115814400 | elapsed time per iteration (s): 0.13 | learning rate: 3.876E-05 | global batch size: 256 | lm loss: 4.515450E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1945.266 | TFLOPs: 7.24 | 7: iteration 137560/ 173500 | consumed samples: 35215360 | consumed tokens: 72121057280 | elapsed time per iteration (s): 0.13 | learning rate: 3.875E-05 | global batch size: 256 | lm loss: 
4.511153E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1901.370 | TFLOPs: 7.07 | 7: iteration 137570/ 173500 | consumed samples: 35217920 | consumed tokens: 72126300160 | elapsed time per iteration (s): 0.15 | learning rate: 3.874E-05 | global batch size: 256 | lm loss: 4.521344E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1674.213 | TFLOPs: 6.23 | 7: iteration 137580/ 173500 | consumed samples: 35220480 | consumed tokens: 72131543040 | elapsed time per iteration (s): 0.09 | learning rate: 3.873E-05 | global batch size: 256 | lm loss: 4.504475E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3011.434 | TFLOPs: 11.20 | 7: iteration 137590/ 173500 | consumed samples: 35223040 | consumed tokens: 72136785920 | elapsed time per iteration (s): 0.08 | learning rate: 3.872E-05 | global batch size: 256 | lm loss: 4.500974E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.360 | TFLOPs: 12.01 | 7: iteration 137600/ 173500 | consumed samples: 35225600 | consumed tokens: 72142028800 | elapsed time per iteration (s): 0.08 | learning rate: 3.871E-05 | global batch size: 256 | lm loss: 4.500772E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.937 | TFLOPs: 12.05 | 7: iteration 137610/ 173500 | consumed samples: 35228160 | consumed tokens: 72147271680 | elapsed time per iteration (s): 0.13 | learning rate: 3.870E-05 | global batch size: 256 | lm loss: 4.512549E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2040.603 | TFLOPs: 7.59 | 7: iteration 137620/ 173500 | consumed samples: 35230720 | consumed tokens: 
72152514560 | elapsed time per iteration (s): 0.10 | learning rate: 3.869E-05 | global batch size: 256 | lm loss: 4.515000E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2618.305 | TFLOPs: 9.74 | 7: iteration 137630/ 173500 | consumed samples: 35233280 | consumed tokens: 72157757440 | elapsed time per iteration (s): 0.09 | learning rate: 3.868E-05 | global batch size: 256 | lm loss: 4.506902E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2775.861 | TFLOPs: 10.32 | 7: iteration 137640/ 173500 | consumed samples: 35235840 | consumed tokens: 72163000320 | elapsed time per iteration (s): 0.09 | learning rate: 3.867E-05 | global batch size: 256 | lm loss: 4.516931E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2757.123 | TFLOPs: 10.26 | 7: iteration 137650/ 173500 | consumed samples: 35238400 | consumed tokens: 72168243200 | elapsed time per iteration (s): 0.09 | learning rate: 3.866E-05 | global batch size: 256 | lm loss: 4.513180E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.372 | TFLOPs: 10.13 | 7: iteration 137660/ 173500 | consumed samples: 35240960 | consumed tokens: 72173486080 | elapsed time per iteration (s): 0.08 | learning rate: 3.865E-05 | global batch size: 256 | lm loss: 4.498512E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.000 | TFLOPs: 11.65 | 7: iteration 137670/ 173500 | consumed samples: 35243520 | consumed tokens: 72178728960 | elapsed time per iteration (s): 0.08 | learning rate: 3.864E-05 | global batch size: 256 | lm loss: 4.515893E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3206.208 | TFLOPs: 11.93 | 7: iteration 137680/ 173500 | consumed samples: 35246080 | consumed tokens: 72183971840 | elapsed time per iteration (s): 0.08 | learning rate: 3.863E-05 | global batch size: 256 | lm loss: 4.515205E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3015.371 | TFLOPs: 11.22 | 7: iteration 137690/ 173500 | consumed samples: 35248640 | consumed tokens: 72189214720 | elapsed time per iteration (s): 0.08 | learning rate: 3.862E-05 | global batch size: 256 | lm loss: 4.509088E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3245.365 | TFLOPs: 12.07 | 7: iteration 137700/ 173500 | consumed samples: 35251200 | consumed tokens: 72194457600 | elapsed time per iteration (s): 0.08 | learning rate: 3.861E-05 | global batch size: 256 | lm loss: 4.511384E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.278 | TFLOPs: 12.01 | 7: iteration 137710/ 173500 | consumed samples: 35253760 | consumed tokens: 72199700480 | elapsed time per iteration (s): 0.08 | learning rate: 3.860E-05 | global batch size: 256 | lm loss: 4.515643E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.908 | TFLOPs: 11.97 | 7: iteration 137720/ 173500 | consumed samples: 35256320 | consumed tokens: 72204943360 | elapsed time per iteration (s): 0.10 | learning rate: 3.859E-05 | global batch size: 256 | lm loss: 4.503983E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2663.289 | TFLOPs: 9.91 | 7: iteration 137730/ 173500 | consumed samples: 35258880 | consumed tokens: 72210186240 | elapsed time per iteration (s): 0.08 | learning rate: 3.858E-05 | global batch size: 256 | lm loss: 4.516600E+00 | 
grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.426 | TFLOPs: 11.99 | 7: iteration 137740/ 173500 | consumed samples: 35261440 | consumed tokens: 72215429120 | elapsed time per iteration (s): 0.08 | learning rate: 3.857E-05 | global batch size: 256 | lm loss: 4.509341E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.904 | TFLOPs: 12.07 | 7: iteration 137750/ 173500 | consumed samples: 35264000 | consumed tokens: 72220672000 | elapsed time per iteration (s): 0.08 | learning rate: 3.856E-05 | global batch size: 256 | lm loss: 4.500687E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.976 | TFLOPs: 12.02 | 7: iteration 137760/ 173500 | consumed samples: 35266560 | consumed tokens: 72225914880 | elapsed time per iteration (s): 0.09 | learning rate: 3.855E-05 | global batch size: 256 | lm loss: 4.528766E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2842.325 | TFLOPs: 10.57 | 7: iteration 137770/ 173500 | consumed samples: 35269120 | consumed tokens: 72231157760 | elapsed time per iteration (s): 0.11 | learning rate: 3.854E-05 | global batch size: 256 | lm loss: 4.515206E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.896 | TFLOPs: 8.60 | 7: iteration 137780/ 173500 | consumed samples: 35271680 | consumed tokens: 72236400640 | elapsed time per iteration (s): 0.11 | learning rate: 3.853E-05 | global batch size: 256 | lm loss: 4.506974E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.822 | TFLOPs: 8.92 | 7: iteration 137790/ 173500 | consumed samples: 35274240 | consumed tokens: 72241643520 | 
elapsed time per iteration (s): 0.13 | learning rate: 3.852E-05 | global batch size: 256 | lm loss: 4.510878E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.258 | TFLOPs: 7.58 | 7: iteration 137800/ 173500 | consumed samples: 35276800 | consumed tokens: 72246886400 | elapsed time per iteration (s): 0.12 | learning rate: 3.851E-05 | global batch size: 256 | lm loss: 4.512217E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2154.799 | TFLOPs: 8.01 | 7: iteration 137810/ 173500 | consumed samples: 35279360 | consumed tokens: 72252129280 | elapsed time per iteration (s): 0.09 | learning rate: 3.850E-05 | global batch size: 256 | lm loss: 4.518908E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.487 | TFLOPs: 10.49 | 7: iteration 137820/ 173500 | consumed samples: 35281920 | consumed tokens: 72257372160 | elapsed time per iteration (s): 0.09 | learning rate: 3.849E-05 | global batch size: 256 | lm loss: 4.499688E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2957.859 | TFLOPs: 11.00 | 7: iteration 137830/ 173500 | consumed samples: 35284480 | consumed tokens: 72262615040 | elapsed time per iteration (s): 0.11 | learning rate: 3.848E-05 | global batch size: 256 | lm loss: 4.496046E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2351.995 | TFLOPs: 8.75 | 7: iteration 137840/ 173500 | consumed samples: 35287040 | consumed tokens: 72267857920 | elapsed time per iteration (s): 0.13 | learning rate: 3.847E-05 | global batch size: 256 | lm loss: 4.504941E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
2047.260 | TFLOPs: 7.61 | 7: iteration 137850/ 173500 | consumed samples: 35289600 | consumed tokens: 72273100800 | elapsed time per iteration (s): 0.11 | learning rate: 3.846E-05 | global batch size: 256 | lm loss: 4.502053E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.137 | TFLOPs: 8.66 | 7: iteration 137860/ 173500 | consumed samples: 35292160 | consumed tokens: 72278343680 | elapsed time per iteration (s): 0.11 | learning rate: 3.845E-05 | global batch size: 256 | lm loss: 4.512356E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2267.774 | TFLOPs: 8.44 | 7: iteration 137870/ 173500 | consumed samples: 35294720 | consumed tokens: 72283586560 | elapsed time per iteration (s): 0.11 | learning rate: 3.844E-05 | global batch size: 256 | lm loss: 4.513523E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2308.678 | TFLOPs: 8.59 | 7: iteration 137880/ 173500 | consumed samples: 35297280 | consumed tokens: 72288829440 | elapsed time per iteration (s): 0.11 | learning rate: 3.843E-05 | global batch size: 256 | lm loss: 4.507269E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.647 | TFLOPs: 8.66 | 7: iteration 137890/ 173500 | consumed samples: 35299840 | consumed tokens: 72294072320 | elapsed time per iteration (s): 0.11 | learning rate: 3.842E-05 | global batch size: 256 | lm loss: 4.519302E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.669 | TFLOPs: 8.53 | 7: iteration 137900/ 173500 | consumed samples: 35302400 | consumed tokens: 72299315200 | elapsed time per iteration (s): 0.11 | learning rate: 3.841E-05 | global batch size: 256 | lm loss: 4.516542E+00 | grad norm: 
0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.914 | TFLOPs: 8.60 | 7: iteration 137910/ 173500 | consumed samples: 35304960 | consumed tokens: 72304558080 | elapsed time per iteration (s): 0.12 | learning rate: 3.840E-05 | global batch size: 256 | lm loss: 4.501253E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2129.852 | TFLOPs: 7.92 | 7: iteration 137920/ 173500 | consumed samples: 35307520 | consumed tokens: 72309800960 | elapsed time per iteration (s): 0.08 | learning rate: 3.839E-05 | global batch size: 256 | lm loss: 4.506581E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.899 | TFLOPs: 11.99 | 7: iteration 137930/ 173500 | consumed samples: 35310080 | consumed tokens: 72315043840 | elapsed time per iteration (s): 0.09 | learning rate: 3.838E-05 | global batch size: 256 | lm loss: 4.519418E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2879.057 | TFLOPs: 10.71 | 7: iteration 137940/ 173500 | consumed samples: 35312640 | consumed tokens: 72320286720 | elapsed time per iteration (s): 0.08 | learning rate: 3.837E-05 | global batch size: 256 | lm loss: 4.513407E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.392 | TFLOPs: 11.98 | 7: iteration 137950/ 173500 | consumed samples: 35315200 | consumed tokens: 72325529600 | elapsed time per iteration (s): 0.08 | learning rate: 3.836E-05 | global batch size: 256 | lm loss: 4.505840E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.429 | TFLOPs: 11.97 | 7: iteration 137960/ 173500 | consumed samples: 35317760 | consumed tokens: 72330772480 | elapsed time 
per iteration (s): 0.08 | learning rate: 3.835E-05 | global batch size: 256 | lm loss: 4.509002E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.165 | TFLOPs: 11.81 | 7: iteration 137970/ 173500 | consumed samples: 35320320 | consumed tokens: 72336015360 | elapsed time per iteration (s): 0.08 | learning rate: 3.834E-05 | global batch size: 256 | lm loss: 4.502222E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.612 | TFLOPs: 11.81 | 7: iteration 137980/ 173500 | consumed samples: 35322880 | consumed tokens: 72341258240 | elapsed time per iteration (s): 0.08 | learning rate: 3.833E-05 | global batch size: 256 | lm loss: 4.500161E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.111 | TFLOPs: 11.85 | 7: iteration 137990/ 173500 | consumed samples: 35325440 | consumed tokens: 72346501120 | elapsed time per iteration (s): 0.08 | learning rate: 3.832E-05 | global batch size: 256 | lm loss: 4.503583E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.058 | TFLOPs: 11.81 | 0: [2023-03-17 03:38:16,874] [INFO] [logging.py:68:log_dist] [Rank 0] step=138000, skipped=0, lr=[3.831464022325417e-05, 3.831464022325417e-05, 3.831464022325417e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 138000/ 173500 | consumed samples: 35328000 | consumed tokens: 72351744000 | elapsed time per iteration (s): 0.09 | learning rate: 3.831E-05 | global batch size: 256 | lm loss: 4.506619E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2843.650 | TFLOPs: 10.58 | 0: steps: 138000 loss: 4.4856 iter time (s): 0.096 samples/sec: 2674.370 7: 
------------------------------------------------------------------------------------------------- 7: validation loss at iteration 138000 | lm loss value: 4.354590E+00 | lm loss PPL: 7.783494E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 138000 to checkpoints_14m91b100m 0: [2023-03-17 03:38:16,958] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step138000 is begin to save! 0: [2023-03-17 03:38:16,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:38:16,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:38:16,989] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:38:16,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:38:16,992] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:38:16,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:38:16,995] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:38:16,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:38:16,998] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 03:38:17,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:38:17,001] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:38:17,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:38:17,002] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step138000/mp_rank_00_model_states.pt 0: [2023-03-17 03:38:17,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:38:17,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:38:17,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:38:17,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:38:17,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 7: [2023-03-17 03:38:17,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:38:17,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 
5: [2023-03-17 03:38:17,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:38:17,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:38:17,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:38:17,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 6: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:38:17,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:38:17,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 
3: [2023-03-17 03:38:17,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 1: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:38:17,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 03:38:17,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 1: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 2: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 5: [2023-03-17 03:38:17,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:38:17,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 6: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:38:17,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 03:38:17,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 3: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:38:17,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 7: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:38:17,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 1: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:38:17,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 5: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:38:17,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 3: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:38:17,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 2: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 
7: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:38:17,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 1: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:38:17,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 5: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:38:17,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 
0: [2023-03-17 03:38:17,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 6: [2023-03-17 03:38:17,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 2: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:38:17,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 03:38:17,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 3: [2023-03-17 03:38:17,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 
7: [2023-03-17 03:38:17,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 1: [2023-03-17 03:38:17,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:38:17,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:38:17,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 2: [2023-03-17 03:38:17,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 6: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 5: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 4: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 3: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 4: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 3: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 7: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 
1: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 4: [2023-03-17 03:38:17,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:38:17,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:38:17,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 
5: [2023-03-17 03:38:17,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 6: [2023-03-17 03:38:17,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 2: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 3: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:38:17,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 7: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:38:17,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 1: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 4: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 6: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 0: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 2: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 5: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 5: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 
7: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 1: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:38:17,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 03:38:17,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 4: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 1: [2023-03-17 03:38:17,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:38:17,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 3: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:38:17,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 03:38:17,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 6: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 4: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:38:17,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 4: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:38:17,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:38:17,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now! 5: [2023-03-17 03:38:17,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:38:17,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step138000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt
5: [2023-03-17 03:38:17,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step138000 is ready now!
0: successfully saved checkpoint at iteration 138000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 81.51
7: iteration 138010/ 173500 | consumed samples: 35330560 | consumed tokens: 72356986880 | elapsed time per iteration (s): 0.10 | learning rate: 3.830E-05 | global batch size: 256 | lm loss: 4.500902E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2643.877 | TFLOPs: 9.83 |
7: iteration 138020/ 173500 | consumed samples: 35333120 | consumed tokens: 72362229760 | elapsed time per iteration (s): 0.08 | learning rate: 3.829E-05 | global batch size: 256 | lm loss: 4.509561E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.522 | TFLOPs: 11.84 |
7: iteration 138030/ 173500 | consumed samples: 35335680 | consumed tokens: 72367472640 | elapsed time per iteration (s): 0.09 | learning rate: 3.828E-05 | global batch size: 256 | lm loss: 4.513494E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2724.398 | TFLOPs: 10.13 |
7: iteration 138040/ 173500 | consumed samples: 35338240 | consumed tokens: 72372715520 | elapsed time per iteration (s): 0.08 | learning rate: 3.827E-05 | global batch size: 256 | lm loss: 4.499405E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.584 | TFLOPs: 11.89 |
7: iteration 138050/ 173500 | consumed samples: 35340800 | consumed tokens: 72377958400 | elapsed time per iteration (s): 0.09 | learning rate: 3.826E-05 | global batch size: 256 | lm loss: 4.514436E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2916.826 | TFLOPs: 10.85 |
7: iteration 138060/ 173500 | consumed samples: 35343360 | consumed tokens: 72383201280 | elapsed time per iteration (s): 0.10 | learning rate: 3.825E-05 | global batch size: 256 | lm loss: 4.499701E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.018 | TFLOPs: 9.99 |
7: iteration 138070/ 173500 | consumed samples: 35345920 | consumed tokens: 72388444160 | elapsed time per iteration (s): 0.08 | learning rate: 3.825E-05 | global batch size: 256 | lm loss: 4.521098E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.495 | TFLOPs: 11.32 |
7: iteration 138080/ 173500 | consumed samples: 35348480 | consumed tokens: 72393687040 | elapsed time per iteration (s): 0.08 | learning rate: 3.824E-05 | global batch size: 256 | lm loss: 4.512587E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.253 | TFLOPs: 11.76 |
7: iteration 138090/ 173500 | consumed samples: 35351040 | consumed tokens: 72398929920 | elapsed time per iteration (s): 0.09 | learning rate: 3.823E-05 | global batch size: 256 | lm loss: 4.504636E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2987.202 | TFLOPs: 11.11 |
7: iteration 138100/ 173500 | consumed samples: 35353600 | consumed tokens: 72404172800 | elapsed time per iteration (s): 0.08 | learning rate: 3.822E-05 | global batch size: 256 | lm loss: 4.515668E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.974 | TFLOPs: 11.84 |
7: iteration 138110/ 173500 | consumed samples: 35356160 | consumed tokens: 72409415680 | elapsed time per iteration (s): 0.09 | learning rate: 3.821E-05 | global batch size: 256 | lm loss: 4.513675E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2931.967 | TFLOPs: 10.91 |
7: iteration 138120/ 173500 | consumed samples: 35358720 | consumed tokens: 72414658560 | elapsed time per iteration (s): 0.09 | learning rate: 3.820E-05 | global batch size: 256 | lm loss: 4.516570E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2813.955 | TFLOPs: 10.47 |
7: iteration 138130/ 173500 | consumed samples: 35361280 | consumed tokens: 72419901440 | elapsed time per iteration (s): 0.10 | learning rate: 3.819E-05 | global batch size: 256 | lm loss: 4.511803E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2466.958 | TFLOPs: 9.18 |
7: iteration 138140/ 173500 | consumed samples: 35363840 | consumed tokens: 72425144320 | elapsed time per iteration (s): 0.09 | learning rate: 3.818E-05 | global batch size: 256 | lm loss: 4.500760E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2941.237 | TFLOPs: 10.94 |
7: iteration 138150/ 173500 | consumed samples: 35366400 | consumed tokens: 72430387200 | elapsed time per iteration (s): 0.08 | learning rate: 3.817E-05 | global batch size: 256 | lm loss: 4.497213E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.191 | TFLOPs: 11.87 |
7: iteration 138160/ 173500 | consumed samples: 35368960 | consumed tokens: 72435630080 | elapsed time per iteration (s): 0.09 | learning rate: 3.816E-05 | global batch size: 256 | lm loss: 4.497788E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2812.450 | TFLOPs: 10.46 |
7: iteration 138170/ 173500 | consumed samples: 35371520 | consumed tokens: 72440872960 | elapsed time per iteration (s): 0.09 | learning rate: 3.815E-05 | global batch size: 256 | lm loss: 4.511450E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2845.260 | TFLOPs: 10.58 |
7: iteration 138180/ 173500 | consumed samples: 35374080 | consumed tokens: 72446115840 | elapsed time per iteration (s): 0.11 | learning rate: 3.814E-05 | global batch size: 256 | lm loss: 4.489100E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2424.296 | TFLOPs: 9.02 |
7: iteration 138190/ 173500 | consumed samples: 35376640 | consumed tokens: 72451358720 | elapsed time per iteration (s): 0.09 | learning rate: 3.813E-05 | global batch size: 256 | lm loss: 4.506547E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2823.955 | TFLOPs: 10.50 |
7: iteration 138200/ 173500 | consumed samples: 35379200 | consumed tokens: 72456601600 | elapsed time per iteration (s): 0.08 | learning rate: 3.812E-05 | global batch size: 256 | lm loss: 4.511438E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.408 | TFLOPs: 11.92 |
7: iteration 138210/ 173500 | consumed samples: 35381760 | consumed tokens: 72461844480 | elapsed time per iteration (s): 0.08 | learning rate: 3.811E-05 | global batch size: 256 | lm loss: 4.515613E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.127 | TFLOPs: 11.87 |
7: iteration 138220/ 173500 | consumed samples: 35384320 | consumed tokens: 72467087360 | elapsed time per iteration (s): 0.09 | learning rate: 3.810E-05 | global batch size: 256 | lm loss: 4.514236E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2703.460 | TFLOPs: 10.06 |
7: iteration 138230/ 173500 | consumed samples: 35386880 | consumed tokens: 72472330240 | elapsed time per iteration (s): 0.09 | learning rate: 3.809E-05 | global batch size: 256 | lm loss: 4.511169E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3010.001 | TFLOPs: 11.20 |
7: iteration 138240/ 173500 | consumed samples: 35389440 | consumed tokens: 72477573120 | elapsed time per iteration (s): 0.08 | learning rate: 3.808E-05 | global batch size: 256 | lm loss: 4.497680E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.799 | TFLOPs: 11.92 |
7: iteration 138250/ 173500 | consumed samples: 35392000 | consumed tokens: 72482816000 | elapsed time per iteration (s): 0.08 | learning rate: 3.807E-05 | global batch size: 256 | lm loss: 4.505272E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.376 | TFLOPs: 11.89 |
7: iteration 138260/ 173500 | consumed samples: 35394560 | consumed tokens: 72488058880 | elapsed time per iteration (s): 0.08 | learning rate: 3.806E-05 | global batch size: 256 | lm loss: 4.514404E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.329 | TFLOPs: 11.86 |
7: iteration 138270/ 173500 | consumed samples: 35397120 | consumed tokens: 72493301760 | elapsed time per iteration (s): 0.10 | learning rate: 3.805E-05 | global batch size: 256 | lm loss: 4.496559E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2536.120 | TFLOPs: 9.43 |
7: iteration 138280/ 173500 | consumed samples: 35399680 | consumed tokens: 72498544640 | elapsed time per iteration (s): 0.13 | learning rate: 3.804E-05 | global batch size: 256 | lm loss: 4.522424E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2039.424 | TFLOPs: 7.59 |
7: iteration 138290/ 173500 | consumed samples: 35402240 | consumed tokens: 72503787520 | elapsed time per iteration (s): 0.10 | learning rate: 3.803E-05 | global batch size: 256 | lm loss: 4.506735E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.692 | TFLOPs: 9.41 |
7: iteration 138300/ 173500 | consumed samples: 35404800 | consumed tokens: 72509030400 | elapsed time per iteration (s): 0.09 | learning rate: 3.802E-05 | global batch size: 256 | lm loss: 4.507909E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.787 | TFLOPs: 10.50 |
7: iteration 138310/ 173500 | consumed samples: 35407360 | consumed tokens: 72514273280 | elapsed time per iteration (s): 0.08 | learning rate: 3.801E-05 | global batch size: 256 | lm loss: 4.531673E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.876 | TFLOPs: 11.59 |
7: iteration 138320/ 173500 | consumed samples: 35409920 | consumed tokens: 72519516160 | elapsed time per iteration (s): 0.08 | learning rate: 3.800E-05 | global batch size: 256 | lm loss: 4.506995E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.936 | TFLOPs: 11.84 |
7: iteration 138330/ 173500 | consumed samples: 35412480 | consumed tokens: 72524759040 | elapsed time per iteration (s): 0.08 | learning rate: 3.799E-05 | global batch size: 256 | lm loss: 4.508793E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.636 | TFLOPs: 11.84 |
7: iteration 138340/ 173500 | consumed samples: 35415040 | consumed tokens: 72530001920 | elapsed time per iteration (s): 0.08 | learning rate: 3.798E-05 | global batch size: 256 | lm loss: 4.499867E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.473 | TFLOPs: 11.87 |
7: iteration 138350/ 173500 | consumed samples: 35417600 | consumed tokens: 72535244800 | elapsed time per iteration (s): 0.09 | learning rate: 3.797E-05 | global batch size: 256 | lm loss: 4.504581E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2766.824 | TFLOPs: 10.29 |
7: iteration 138360/ 173500 | consumed samples: 35420160 | consumed tokens: 72540487680 | elapsed time per iteration (s): 0.13 | learning rate: 3.796E-05 | global batch size: 256 | lm loss: 4.494722E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.273 | TFLOPs: 7.28 |
7: iteration 138370/ 173500 | consumed samples: 35422720 | consumed tokens: 72545730560 | elapsed time per iteration (s): 0.11 | learning rate: 3.795E-05 | global batch size: 256 | lm loss: 4.522393E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2300.302 | TFLOPs: 8.56 |
7: iteration 138380/ 173500 | consumed samples: 35425280 | consumed tokens: 72550973440 | elapsed time per iteration (s): 0.08 | learning rate: 3.794E-05 | global batch size: 256 | lm loss: 4.511297E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.315 | TFLOPs: 11.44 |
7: iteration 138390/ 173500 | consumed samples: 35427840 | consumed tokens: 72556216320 | elapsed time per iteration (s): 0.11 | learning rate: 3.793E-05 | global batch size: 256 | lm loss: 4.512296E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2227.691 | TFLOPs: 8.29 |
7: iteration 138400/ 173500 | consumed samples: 35430400 | consumed tokens: 72561459200 | elapsed time per iteration (s): 0.12 | learning rate: 3.792E-05 | global batch size: 256 | lm loss: 4.511325E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2147.081 | TFLOPs: 7.99 |
7: iteration 138410/ 173500 | consumed samples: 35432960 | consumed tokens: 72566702080 | elapsed time per iteration (s): 0.09 | learning rate: 3.791E-05 | global batch size: 256 | lm loss: 4.503374E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.761 | TFLOPs: 11.15 |
7: iteration 138420/ 173500 | consumed samples: 35435520 | consumed tokens: 72571944960 | elapsed time per iteration (s): 0.08 | learning rate: 3.790E-05 | global batch size: 256 | lm loss: 4.495229E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.377 | TFLOPs: 11.77 |
7: iteration 138430/ 173500 | consumed samples: 35438080 | consumed tokens: 72577187840 | elapsed time per iteration (s): 0.08 | learning rate: 3.789E-05 | global batch size: 256 | lm loss: 4.508171E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.160 | TFLOPs: 11.80 |
7: iteration 138440/ 173500 | consumed samples: 35440640 | consumed tokens: 72582430720 | elapsed time per iteration (s): 0.08 | learning rate: 3.788E-05 | global batch size: 256 | lm loss: 4.509715E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.290 | TFLOPs: 11.86 |
7: iteration 138450/ 173500 | consumed samples: 35443200 | consumed 
tokens: 72587673600 | elapsed time per iteration (s): 0.08 | learning rate: 3.787E-05 | global batch size: 256 | lm loss: 4.513985E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.783 | TFLOPs: 11.30 | 7: iteration 138460/ 173500 | consumed samples: 35445760 | consumed tokens: 72592916480 | elapsed time per iteration (s): 0.08 | learning rate: 3.786E-05 | global batch size: 256 | lm loss: 4.509670E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.133 | TFLOPs: 11.85 | 7: iteration 138470/ 173500 | consumed samples: 35448320 | consumed tokens: 72598159360 | elapsed time per iteration (s): 0.08 | learning rate: 3.785E-05 | global batch size: 256 | lm loss: 4.526059E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.015 | TFLOPs: 11.51 | 7: iteration 138480/ 173500 | consumed samples: 35450880 | consumed tokens: 72603402240 | elapsed time per iteration (s): 0.13 | learning rate: 3.784E-05 | global batch size: 256 | lm loss: 4.485633E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1986.378 | TFLOPs: 7.39 | 7: iteration 138490/ 173500 | consumed samples: 35453440 | consumed tokens: 72608645120 | elapsed time per iteration (s): 0.13 | learning rate: 3.783E-05 | global batch size: 256 | lm loss: 4.498729E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2006.151 | TFLOPs: 7.46 | 7: iteration 138500/ 173500 | consumed samples: 35456000 | consumed tokens: 72613888000 | elapsed time per iteration (s): 0.12 | learning rate: 3.782E-05 | global batch size: 256 | lm loss: 4.513786E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 2071.297 | TFLOPs: 7.70 | 7: iteration 138510/ 173500 | consumed samples: 35458560 | consumed tokens: 72619130880 | elapsed time per iteration (s): 0.13 | learning rate: 3.781E-05 | global batch size: 256 | lm loss: 4.511100E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.282 | TFLOPs: 7.37 | 7: iteration 138520/ 173500 | consumed samples: 35461120 | consumed tokens: 72624373760 | elapsed time per iteration (s): 0.09 | learning rate: 3.780E-05 | global batch size: 256 | lm loss: 4.504929E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2835.629 | TFLOPs: 10.55 | 7: iteration 138530/ 173500 | consumed samples: 35463680 | consumed tokens: 72629616640 | elapsed time per iteration (s): 0.09 | learning rate: 3.779E-05 | global batch size: 256 | lm loss: 4.503656E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2882.082 | TFLOPs: 10.72 | 7: iteration 138540/ 173500 | consumed samples: 35466240 | consumed tokens: 72634859520 | elapsed time per iteration (s): 0.08 | learning rate: 3.778E-05 | global batch size: 256 | lm loss: 4.523696E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.421 | TFLOPs: 11.81 | 7: iteration 138550/ 173500 | consumed samples: 35468800 | consumed tokens: 72640102400 | elapsed time per iteration (s): 0.11 | learning rate: 3.777E-05 | global batch size: 256 | lm loss: 4.502898E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2304.399 | TFLOPs: 8.57 | 7: iteration 138560/ 173500 | consumed samples: 35471360 | consumed tokens: 72645345280 | elapsed time per iteration (s): 0.12 | learning rate: 3.776E-05 | global batch size: 256 | lm loss: 
4.508409E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2209.373 | TFLOPs: 8.22 | 7: iteration 138570/ 173500 | consumed samples: 35473920 | consumed tokens: 72650588160 | elapsed time per iteration (s): 0.13 | learning rate: 3.775E-05 | global batch size: 256 | lm loss: 4.497456E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1938.488 | TFLOPs: 7.21 | 7: iteration 138580/ 173500 | consumed samples: 35476480 | consumed tokens: 72655831040 | elapsed time per iteration (s): 0.09 | learning rate: 3.774E-05 | global batch size: 256 | lm loss: 4.492511E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.849 | TFLOPs: 10.50 | 7: iteration 138590/ 173500 | consumed samples: 35479040 | consumed tokens: 72661073920 | elapsed time per iteration (s): 0.11 | learning rate: 3.773E-05 | global batch size: 256 | lm loss: 4.495801E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2235.473 | TFLOPs: 8.31 | 7: iteration 138600/ 173500 | consumed samples: 35481600 | consumed tokens: 72666316800 | elapsed time per iteration (s): 0.09 | learning rate: 3.772E-05 | global batch size: 256 | lm loss: 4.498675E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.797 | TFLOPs: 11.19 | 7: iteration 138610/ 173500 | consumed samples: 35484160 | consumed tokens: 72671559680 | elapsed time per iteration (s): 0.08 | learning rate: 3.771E-05 | global batch size: 256 | lm loss: 4.501523E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.864 | TFLOPs: 11.93 | 7: iteration 138620/ 173500 | consumed samples: 35486720 | consumed tokens: 
72676802560 | elapsed time per iteration (s): 0.11 | learning rate: 3.770E-05 | global batch size: 256 | lm loss: 4.515079E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2304.946 | TFLOPs: 8.57 | 7: iteration 138630/ 173500 | consumed samples: 35489280 | consumed tokens: 72682045440 | elapsed time per iteration (s): 0.10 | learning rate: 3.769E-05 | global batch size: 256 | lm loss: 4.505386E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2493.154 | TFLOPs: 9.27 | 7: iteration 138640/ 173500 | consumed samples: 35491840 | consumed tokens: 72687288320 | elapsed time per iteration (s): 0.10 | learning rate: 3.768E-05 | global batch size: 256 | lm loss: 4.512757E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2672.497 | TFLOPs: 9.94 | 7: iteration 138650/ 173500 | consumed samples: 35494400 | consumed tokens: 72692531200 | elapsed time per iteration (s): 0.08 | learning rate: 3.767E-05 | global batch size: 256 | lm loss: 4.513622E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.777 | TFLOPs: 11.47 | 7: iteration 138660/ 173500 | consumed samples: 35496960 | consumed tokens: 72697774080 | elapsed time per iteration (s): 0.13 | learning rate: 3.766E-05 | global batch size: 256 | lm loss: 4.518013E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1956.664 | TFLOPs: 7.28 | 7: iteration 138670/ 173500 | consumed samples: 35499520 | consumed tokens: 72703016960 | elapsed time per iteration (s): 0.13 | learning rate: 3.765E-05 | global batch size: 256 | lm loss: 4.488511E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2000.003 | TFLOPs: 7.44 | 7: iteration 138680/ 173500 | consumed samples: 35502080 | consumed tokens: 72708259840 | elapsed time per iteration (s): 0.13 | learning rate: 3.764E-05 | global batch size: 256 | lm loss: 4.507184E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.588 | TFLOPs: 7.37 | 7: iteration 138690/ 173500 | consumed samples: 35504640 | consumed tokens: 72713502720 | elapsed time per iteration (s): 0.13 | learning rate: 3.763E-05 | global batch size: 256 | lm loss: 4.503022E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.187 | TFLOPs: 7.30 | 7: iteration 138700/ 173500 | consumed samples: 35507200 | consumed tokens: 72718745600 | elapsed time per iteration (s): 0.08 | learning rate: 3.762E-05 | global batch size: 256 | lm loss: 4.504292E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.038 | TFLOPs: 11.77 | 7: iteration 138710/ 173500 | consumed samples: 35509760 | consumed tokens: 72723988480 | elapsed time per iteration (s): 0.08 | learning rate: 3.761E-05 | global batch size: 256 | lm loss: 4.507658E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.044 | TFLOPs: 11.82 | 7: iteration 138720/ 173500 | consumed samples: 35512320 | consumed tokens: 72729231360 | elapsed time per iteration (s): 0.08 | learning rate: 3.760E-05 | global batch size: 256 | lm loss: 4.517351E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.793 | TFLOPs: 11.86 | 7: iteration 138730/ 173500 | consumed samples: 35514880 | consumed tokens: 72734474240 | elapsed time per iteration (s): 0.10 | learning rate: 3.759E-05 | global batch size: 256 | lm loss: 4.514102E+00 | 
grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2611.152 | TFLOPs: 9.71 | 7: iteration 138740/ 173500 | consumed samples: 35517440 | consumed tokens: 72739717120 | elapsed time per iteration (s): 0.13 | learning rate: 3.758E-05 | global batch size: 256 | lm loss: 4.515204E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.123 | TFLOPs: 7.37 | 7: iteration 138750/ 173500 | consumed samples: 35520000 | consumed tokens: 72744960000 | elapsed time per iteration (s): 0.13 | learning rate: 3.757E-05 | global batch size: 256 | lm loss: 4.511430E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1943.794 | TFLOPs: 7.23 | 7: iteration 138760/ 173500 | consumed samples: 35522560 | consumed tokens: 72750202880 | elapsed time per iteration (s): 0.15 | learning rate: 3.757E-05 | global batch size: 256 | lm loss: 4.505936E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1726.342 | TFLOPs: 6.42 | 7: iteration 138770/ 173500 | consumed samples: 35525120 | consumed tokens: 72755445760 | elapsed time per iteration (s): 0.15 | learning rate: 3.756E-05 | global batch size: 256 | lm loss: 4.495204E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1741.920 | TFLOPs: 6.48 | 7: iteration 138780/ 173500 | consumed samples: 35527680 | consumed tokens: 72760688640 | elapsed time per iteration (s): 0.11 | learning rate: 3.755E-05 | global batch size: 256 | lm loss: 4.520869E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2322.544 | TFLOPs: 8.64 | 7: iteration 138790/ 173500 | consumed samples: 35530240 | consumed tokens: 72765931520 | elapsed 
time per iteration (s): 0.09 | learning rate: 3.754E-05 | global batch size: 256 | lm loss: 4.511048E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2728.641 | TFLOPs: 10.15 | 7: iteration 138800/ 173500 | consumed samples: 35532800 | consumed tokens: 72771174400 | elapsed time per iteration (s): 0.08 | learning rate: 3.753E-05 | global batch size: 256 | lm loss: 4.513536E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.194 | TFLOPs: 11.85 | 7: iteration 138810/ 173500 | consumed samples: 35535360 | consumed tokens: 72776417280 | elapsed time per iteration (s): 0.11 | learning rate: 3.752E-05 | global batch size: 256 | lm loss: 4.507402E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2304.964 | TFLOPs: 8.57 | 7: iteration 138820/ 173500 | consumed samples: 35537920 | consumed tokens: 72781660160 | elapsed time per iteration (s): 0.08 | learning rate: 3.751E-05 | global batch size: 256 | lm loss: 4.500261E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.675 | TFLOPs: 11.80 | 7: iteration 138830/ 173500 | consumed samples: 35540480 | consumed tokens: 72786903040 | elapsed time per iteration (s): 0.08 | learning rate: 3.750E-05 | global batch size: 256 | lm loss: 4.509692E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.514 | TFLOPs: 11.45 | 7: iteration 138840/ 173500 | consumed samples: 35543040 | consumed tokens: 72792145920 | elapsed time per iteration (s): 0.13 | learning rate: 3.749E-05 | global batch size: 256 | lm loss: 4.509651E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2020.480 | 
TFLOPs: 7.52 | 7: iteration 138850/ 173500 | consumed samples: 35545600 | consumed tokens: 72797388800 | elapsed time per iteration (s): 0.12 | learning rate: 3.748E-05 | global batch size: 256 | lm loss: 4.499311E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2096.459 | TFLOPs: 7.80 | 7: iteration 138860/ 173500 | consumed samples: 35548160 | consumed tokens: 72802631680 | elapsed time per iteration (s): 0.13 | learning rate: 3.747E-05 | global batch size: 256 | lm loss: 4.517299E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1907.373 | TFLOPs: 7.09 | 7: iteration 138870/ 173500 | consumed samples: 35550720 | consumed tokens: 72807874560 | elapsed time per iteration (s): 0.08 | learning rate: 3.746E-05 | global batch size: 256 | lm loss: 4.512489E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.482 | TFLOPs: 11.25 | 7: iteration 138880/ 173500 | consumed samples: 35553280 | consumed tokens: 72813117440 | elapsed time per iteration (s): 0.08 | learning rate: 3.745E-05 | global batch size: 256 | lm loss: 4.507844E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.604 | TFLOPs: 11.95 | 7: iteration 138890/ 173500 | consumed samples: 35555840 | consumed tokens: 72818360320 | elapsed time per iteration (s): 0.08 | learning rate: 3.744E-05 | global batch size: 256 | lm loss: 4.508379E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.268 | TFLOPs: 11.94 | 7: iteration 138900/ 173500 | consumed samples: 35558400 | consumed tokens: 72823603200 | elapsed time per iteration (s): 0.08 | learning rate: 3.743E-05 | global batch size: 256 | lm loss: 4.508058E+00 | grad norm: 0.349 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.298 | TFLOPs: 11.55 | 7: iteration 138910/ 173500 | consumed samples: 35560960 | consumed tokens: 72828846080 | elapsed time per iteration (s): 0.08 | learning rate: 3.742E-05 | global batch size: 256 | lm loss: 4.509959E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3017.362 | TFLOPs: 11.22 | 7: iteration 138920/ 173500 | consumed samples: 35563520 | consumed tokens: 72834088960 | elapsed time per iteration (s): 0.08 | learning rate: 3.741E-05 | global batch size: 256 | lm loss: 4.497974E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.347 | TFLOPs: 11.88 | 7: iteration 138930/ 173500 | consumed samples: 35566080 | consumed tokens: 72839331840 | elapsed time per iteration (s): 0.08 | learning rate: 3.740E-05 | global batch size: 256 | lm loss: 4.504017E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.703 | TFLOPs: 11.99 | 7: iteration 138940/ 173500 | consumed samples: 35568640 | consumed tokens: 72844574720 | elapsed time per iteration (s): 0.08 | learning rate: 3.739E-05 | global batch size: 256 | lm loss: 4.511515E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.149 | TFLOPs: 12.06 | 7: iteration 138950/ 173500 | consumed samples: 35571200 | consumed tokens: 72849817600 | elapsed time per iteration (s): 0.12 | learning rate: 3.738E-05 | global batch size: 256 | lm loss: 4.504466E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2187.062 | TFLOPs: 8.13 | 7: iteration 138960/ 173500 | consumed samples: 35573760 | consumed tokens: 72855060480 | elapsed time per 
iteration (s): 0.13 | learning rate: 3.737E-05 | global batch size: 256 | lm loss: 4.498346E+00 | grad norm: 0.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.088 | TFLOPs: 7.58 |
7: iteration 138970/ 173500 | consumed samples: 35576320 | consumed tokens: 72860303360 | elapsed time per iteration (s): 0.13 | learning rate: 3.736E-05 | global batch size: 256 | lm loss: 4.512112E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2019.168 | TFLOPs: 7.51 |
7: iteration 138980/ 173500 | consumed samples: 35578880 | consumed tokens: 72865546240 | elapsed time per iteration (s): 0.13 | learning rate: 3.735E-05 | global batch size: 256 | lm loss: 4.510711E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1963.633 | TFLOPs: 7.30 |
7: iteration 138990/ 173500 | consumed samples: 35581440 | consumed tokens: 72870789120 | elapsed time per iteration (s): 0.15 | learning rate: 3.734E-05 | global batch size: 256 | lm loss: 4.505671E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1757.652 | TFLOPs: 6.54 |
7: iteration 139000/ 173500 | consumed samples: 35584000 | consumed tokens: 72876032000 | elapsed time per iteration (s): 0.13 | learning rate: 3.733E-05 | global batch size: 256 | lm loss: 4.507901E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1988.146 | TFLOPs: 7.40 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 139000 | lm loss value: 4.405880E+00 | lm loss PPL: 8.193121E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 139000 to
checkpoints_14m91b100m 0: [2023-03-17 03:39:56,061] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step139000 is begin to save! 0: [2023-03-17 03:39:56,065] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:39:56,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:39:56,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:39:56,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:39:56,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:39:56,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:39:56,098] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:39:56,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:39:56,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:39:56,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:39:56,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:39:56,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:39:56,105] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step139000/mp_rank_00_model_states.pt 0: [2023-03-17 03:39:56,105] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:39:56,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:39:56,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:39:56,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:39:56,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 5: [2023-03-17 03:39:56,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:39:56,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:39:56,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 03:39:56,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 5: [2023-03-17 03:39:56,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 
1: [2023-03-17 03:39:56,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 2: [2023-03-17 03:39:56,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:39:56,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 0: [2023-03-17 03:39:56,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:39:56,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 0: [2023-03-17 03:39:56,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:39:56,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:39:56,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:39:56,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:39:56,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 
7: [2023-03-17 03:39:56,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 1: [2023-03-17 03:39:56,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 6: [2023-03-17 03:39:56,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 2: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:39:56,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 
3: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:39:56,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 0: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:39:56,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:39:56,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 03:39:56,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 5: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:39:56,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 03:39:56,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 7: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 1: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:39:56,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 6: [2023-03-17 03:39:56,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 
6: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 3: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:39:56,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 2: [2023-03-17 03:39:56,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:39:56,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 5: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:39:56,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 0: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:39:56,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 1: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:39:56,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:39:56,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 7: [2023-03-17 03:39:56,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:39:56,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 2: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:39:56,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 3: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:39:56,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 5: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:39:56,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 0: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:39:56,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 1: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:39:56,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:39:56,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 6: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:39:56,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 7: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:39:56,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 3: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:39:56,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 5: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:39:56,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:39:56,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 2: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:39:56,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 0: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:39:56,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 7: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:39:56,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 6: [2023-03-17 03:39:56,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:39:56,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 2: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:39:56,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 3: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:39:56,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 0: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:39:56,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 5: [2023-03-17 03:39:56,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 1: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:39:56,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:39:56,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 6: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 7: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 5: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 7: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 3: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 7: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 3: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 0: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 3: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:39:56,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 03:39:56,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 03:39:56,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:39:56,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:39:56,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 5: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 3: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 4: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 1: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:39:56,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 2: [2023-03-17 03:39:56,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 6: [2023-03-17 03:39:56,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step139000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 6: [2023-03-17 03:39:56,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step139000 is ready now! 
0: successfully saved checkpoint at iteration 139000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.20 7: iteration 139010/ 173500 | consumed samples: 35586560 | consumed tokens: 72881274880 | elapsed time per iteration (s): 0.14 | learning rate: 3.732E-05 | global batch size: 256 | lm loss: 4.514814E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1767.969 | TFLOPs: 6.58 | 7: iteration 139020/ 173500 | consumed samples: 35589120 | consumed tokens: 72886517760 | elapsed time per iteration (s): 0.13 | learning rate: 3.731E-05 | global batch size: 256 | lm loss: 4.492788E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.625 | TFLOPs: 7.61 | 7: iteration 139030/ 173500 | consumed samples: 35591680 | consumed tokens: 72891760640 | elapsed time per iteration (s): 0.13 | learning rate: 3.730E-05 | global batch size: 256 | lm loss: 4.505986E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.216 | TFLOPs: 7.28 | 7: iteration 139040/ 173500 | consumed samples: 35594240 | consumed tokens: 72897003520 | elapsed time per iteration (s): 0.12 | learning rate: 3.729E-05 | global batch size: 256 | lm loss: 4.514611E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2114.104 | TFLOPs: 7.86 | 7: iteration 139050/ 173500 | consumed samples: 35596800 | consumed tokens: 72902246400 | elapsed time per iteration (s): 0.08 | learning rate: 3.728E-05 | global batch size: 256 | lm loss: 4.510372E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.889 | TFLOPs: 11.95 | 7: iteration 139060/ 173500 | consumed samples: 35599360 | consumed tokens: 72907489280 | elapsed time per iteration (s): 0.08 
| learning rate: 3.727E-05 | global batch size: 256 | lm loss: 4.516681E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.109 | TFLOPs: 11.79 | 7: iteration 139070/ 173500 | consumed samples: 35601920 | consumed tokens: 72912732160 | elapsed time per iteration (s): 0.08 | learning rate: 3.726E-05 | global batch size: 256 | lm loss: 4.501389E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.904 | TFLOPs: 11.96 | 7: iteration 139080/ 173500 | consumed samples: 35604480 | consumed tokens: 72917975040 | elapsed time per iteration (s): 0.08 | learning rate: 3.725E-05 | global batch size: 256 | lm loss: 4.519302E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.128 | TFLOPs: 11.94 | 7: iteration 139090/ 173500 | consumed samples: 35607040 | consumed tokens: 72923217920 | elapsed time per iteration (s): 0.13 | learning rate: 3.724E-05 | global batch size: 256 | lm loss: 4.510453E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2001.476 | TFLOPs: 7.44 | 7: iteration 139100/ 173500 | consumed samples: 35609600 | consumed tokens: 72928460800 | elapsed time per iteration (s): 0.09 | learning rate: 3.723E-05 | global batch size: 256 | lm loss: 4.500310E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2726.448 | TFLOPs: 10.14 | 7: iteration 139110/ 173500 | consumed samples: 35612160 | consumed tokens: 72933703680 | elapsed time per iteration (s): 0.08 | learning rate: 3.722E-05 | global batch size: 256 | lm loss: 4.508310E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.305 | TFLOPs: 11.91 | 7: iteration 
139120/ 173500 | consumed samples: 35614720 | consumed tokens: 72938946560 | elapsed time per iteration (s): 0.10 | learning rate: 3.722E-05 | global batch size: 256 | lm loss: 4.512717E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2474.005 | TFLOPs: 9.20 | 7: iteration 139130/ 173500 | consumed samples: 35617280 | consumed tokens: 72944189440 | elapsed time per iteration (s): 0.10 | learning rate: 3.721E-05 | global batch size: 256 | lm loss: 4.510424E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2571.150 | TFLOPs: 9.56 | 7: iteration 139140/ 173500 | consumed samples: 35619840 | consumed tokens: 72949432320 | elapsed time per iteration (s): 0.09 | learning rate: 3.720E-05 | global batch size: 256 | lm loss: 4.514291E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.725 | TFLOPs: 11.13 | 7: iteration 139150/ 173500 | consumed samples: 35622400 | consumed tokens: 72954675200 | elapsed time per iteration (s): 0.11 | learning rate: 3.719E-05 | global batch size: 256 | lm loss: 4.511475E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2244.850 | TFLOPs: 8.35 | 7: iteration 139160/ 173500 | consumed samples: 35624960 | consumed tokens: 72959918080 | elapsed time per iteration (s): 0.08 | learning rate: 3.718E-05 | global batch size: 256 | lm loss: 4.515855E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.244 | TFLOPs: 11.66 | 7: iteration 139170/ 173500 | consumed samples: 35627520 | consumed tokens: 72965160960 | elapsed time per iteration (s): 0.12 | learning rate: 3.717E-05 | global batch size: 256 | lm loss: 4.506686E+00 | grad norm: 0.429 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2186.553 | TFLOPs: 8.13 | 7: iteration 139180/ 173500 | consumed samples: 35630080 | consumed tokens: 72970403840 | elapsed time per iteration (s): 0.13 | learning rate: 3.716E-05 | global batch size: 256 | lm loss: 4.512849E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1934.070 | TFLOPs: 7.19 | 7: iteration 139190/ 173500 | consumed samples: 35632640 | consumed tokens: 72975646720 | elapsed time per iteration (s): 0.13 | learning rate: 3.715E-05 | global batch size: 256 | lm loss: 4.513451E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.511 | TFLOPs: 7.35 | 7: iteration 139200/ 173500 | consumed samples: 35635200 | consumed tokens: 72980889600 | elapsed time per iteration (s): 0.11 | learning rate: 3.714E-05 | global batch size: 256 | lm loss: 4.510058E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2258.436 | TFLOPs: 8.40 | 7: iteration 139210/ 173500 | consumed samples: 35637760 | consumed tokens: 72986132480 | elapsed time per iteration (s): 0.11 | learning rate: 3.713E-05 | global batch size: 256 | lm loss: 4.512874E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2319.057 | TFLOPs: 8.63 | 7: iteration 139220/ 173500 | consumed samples: 35640320 | consumed tokens: 72991375360 | elapsed time per iteration (s): 0.10 | learning rate: 3.712E-05 | global batch size: 256 | lm loss: 4.498097E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2457.841 | TFLOPs: 9.14 | 7: iteration 139230/ 173500 | consumed samples: 35642880 | consumed tokens: 72996618240 | elapsed time per iteration (s): 0.09 | learning rate: 
3.711E-05 | global batch size: 256 | lm loss: 4.502890E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.031 | TFLOPs: 10.41 | 7: iteration 139240/ 173500 | consumed samples: 35645440 | consumed tokens: 73001861120 | elapsed time per iteration (s): 0.13 | learning rate: 3.710E-05 | global batch size: 256 | lm loss: 4.519967E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2015.141 | TFLOPs: 7.50 | 7: iteration 139250/ 173500 | consumed samples: 35648000 | consumed tokens: 73007104000 | elapsed time per iteration (s): 0.13 | learning rate: 3.709E-05 | global batch size: 256 | lm loss: 4.484771E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2023.879 | TFLOPs: 7.53 | 7: iteration 139260/ 173500 | consumed samples: 35650560 | consumed tokens: 73012346880 | elapsed time per iteration (s): 0.13 | learning rate: 3.708E-05 | global batch size: 256 | lm loss: 4.513418E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1970.821 | TFLOPs: 7.33 | 7: iteration 139270/ 173500 | consumed samples: 35653120 | consumed tokens: 73017589760 | elapsed time per iteration (s): 0.13 | learning rate: 3.707E-05 | global batch size: 256 | lm loss: 4.522728E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1944.543 | TFLOPs: 7.23 | 7: iteration 139280/ 173500 | consumed samples: 35655680 | consumed tokens: 73022832640 | elapsed time per iteration (s): 0.14 | learning rate: 3.706E-05 | global batch size: 256 | lm loss: 4.514653E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1806.170 | TFLOPs: 6.72 | 7: iteration 139290/ 173500 | 
consumed samples: 35658240 | consumed tokens: 73028075520 | elapsed time per iteration (s): 0.14 | learning rate: 3.705E-05 | global batch size: 256 | lm loss: 4.521996E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1827.570 | TFLOPs: 6.80 | 7: iteration 139300/ 173500 | consumed samples: 35660800 | consumed tokens: 73033318400 | elapsed time per iteration (s): 0.12 | learning rate: 3.704E-05 | global batch size: 256 | lm loss: 4.509552E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2087.971 | TFLOPs: 7.77 | 7: iteration 139310/ 173500 | consumed samples: 35663360 | consumed tokens: 73038561280 | elapsed time per iteration (s): 0.11 | learning rate: 3.703E-05 | global batch size: 256 | lm loss: 4.501369E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.305 | TFLOPs: 8.91 | 7: iteration 139320/ 173500 | consumed samples: 35665920 | consumed tokens: 73043804160 | elapsed time per iteration (s): 0.11 | learning rate: 3.702E-05 | global batch size: 256 | lm loss: 4.522595E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2335.898 | TFLOPs: 8.69 | 7: iteration 139330/ 173500 | consumed samples: 35668480 | consumed tokens: 73049047040 | elapsed time per iteration (s): 0.11 | learning rate: 3.701E-05 | global batch size: 256 | lm loss: 4.500235E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.708 | TFLOPs: 8.50 | 7: iteration 139340/ 173500 | consumed samples: 35671040 | consumed tokens: 73054289920 | elapsed time per iteration (s): 0.10 | learning rate: 3.700E-05 | global batch size: 256 | lm loss: 4.519734E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2692.207 | TFLOPs: 10.01 | 7: iteration 139350/ 173500 | consumed samples: 35673600 | consumed tokens: 73059532800 | elapsed time per iteration (s): 0.09 | learning rate: 3.699E-05 | global batch size: 256 | lm loss: 4.484948E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.315 | TFLOPs: 10.07 | 7: iteration 139360/ 173500 | consumed samples: 35676160 | consumed tokens: 73064775680 | elapsed time per iteration (s): 0.11 | learning rate: 3.698E-05 | global batch size: 256 | lm loss: 4.499960E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2261.992 | TFLOPs: 8.41 | 7: iteration 139370/ 173500 | consumed samples: 35678720 | consumed tokens: 73070018560 | elapsed time per iteration (s): 0.11 | learning rate: 3.697E-05 | global batch size: 256 | lm loss: 4.499141E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2359.219 | TFLOPs: 8.78 | 7: iteration 139380/ 173500 | consumed samples: 35681280 | consumed tokens: 73075261440 | elapsed time per iteration (s): 0.14 | learning rate: 3.696E-05 | global batch size: 256 | lm loss: 4.496498E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1851.714 | TFLOPs: 6.89 | 7: iteration 139390/ 173500 | consumed samples: 35683840 | consumed tokens: 73080504320 | elapsed time per iteration (s): 0.13 | learning rate: 3.695E-05 | global batch size: 256 | lm loss: 4.510811E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1907.299 | TFLOPs: 7.09 | 7: iteration 139400/ 173500 | consumed samples: 35686400 | consumed tokens: 73085747200 | elapsed time per iteration (s): 0.10 | learning rate: 3.694E-05 | global 
batch size: 256 | lm loss: 4.502944E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2492.242 | TFLOPs: 9.27 | 7: iteration 139410/ 173500 | consumed samples: 35688960 | consumed tokens: 73090990080 | elapsed time per iteration (s): 0.08 | learning rate: 3.694E-05 | global batch size: 256 | lm loss: 4.506513E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3022.765 | TFLOPs: 11.24 | 7: iteration 139420/ 173500 | consumed samples: 35691520 | consumed tokens: 73096232960 | elapsed time per iteration (s): 0.08 | learning rate: 3.693E-05 | global batch size: 256 | lm loss: 4.493039E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.867 | TFLOPs: 11.85 | 7: iteration 139430/ 173500 | consumed samples: 35694080 | consumed tokens: 73101475840 | elapsed time per iteration (s): 0.08 | learning rate: 3.692E-05 | global batch size: 256 | lm loss: 4.512617E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.822 | TFLOPs: 11.89 | 7: iteration 139440/ 173500 | consumed samples: 35696640 | consumed tokens: 73106718720 | elapsed time per iteration (s): 0.08 | learning rate: 3.691E-05 | global batch size: 256 | lm loss: 4.503201E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.769 | TFLOPs: 11.98 | 7: iteration 139450/ 173500 | consumed samples: 35699200 | consumed tokens: 73111961600 | elapsed time per iteration (s): 0.08 | learning rate: 3.690E-05 | global batch size: 256 | lm loss: 4.508946E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.985 | TFLOPs: 11.97 | 7: iteration 139460/ 173500 | consumed samples: 
35701760 | consumed tokens: 73117204480 | elapsed time per iteration (s): 0.08 | learning rate: 3.689E-05 | global batch size: 256 | lm loss: 4.516069E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.748 | TFLOPs: 12.01 | 7: iteration 139470/ 173500 | consumed samples: 35704320 | consumed tokens: 73122447360 | elapsed time per iteration (s): 0.09 | learning rate: 3.688E-05 | global batch size: 256 | lm loss: 4.494296E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2704.027 | TFLOPs: 10.06 | 7: iteration 139480/ 173500 | consumed samples: 35706880 | consumed tokens: 73127690240 | elapsed time per iteration (s): 0.08 | learning rate: 3.687E-05 | global batch size: 256 | lm loss: 4.511472E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.595 | TFLOPs: 11.99 | 7: iteration 139490/ 173500 | consumed samples: 35709440 | consumed tokens: 73132933120 | elapsed time per iteration (s): 0.08 | learning rate: 3.686E-05 | global batch size: 256 | lm loss: 4.516748E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.817 | TFLOPs: 11.59 | 7: iteration 139500/ 173500 | consumed samples: 35712000 | consumed tokens: 73138176000 | elapsed time per iteration (s): 0.08 | learning rate: 3.685E-05 | global batch size: 256 | lm loss: 4.505270E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.146 | TFLOPs: 12.04 | 7: iteration 139510/ 173500 | consumed samples: 35714560 | consumed tokens: 73143418880 | elapsed time per iteration (s): 0.09 | learning rate: 3.684E-05 | global batch size: 256 | lm loss: 4.515115E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 2954.450 | TFLOPs: 10.99 | 7: iteration 139520/ 173500 | consumed samples: 35717120 | consumed tokens: 73148661760 | elapsed time per iteration (s): 0.09 | learning rate: 3.683E-05 | global batch size: 256 | lm loss: 4.517769E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2887.103 | TFLOPs: 10.74 | 7: iteration 139530/ 173500 | consumed samples: 35719680 | consumed tokens: 73153904640 | elapsed time per iteration (s): 0.08 | learning rate: 3.682E-05 | global batch size: 256 | lm loss: 4.514954E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.615 | TFLOPs: 12.02 | 7: iteration 139540/ 173500 | consumed samples: 35722240 | consumed tokens: 73159147520 | elapsed time per iteration (s): 0.08 | learning rate: 3.681E-05 | global batch size: 256 | lm loss: 4.498380E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.528 | TFLOPs: 12.02 | 7: iteration 139550/ 173500 | consumed samples: 35724800 | consumed tokens: 73164390400 | elapsed time per iteration (s): 0.11 | learning rate: 3.680E-05 | global batch size: 256 | lm loss: 4.498466E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.564 | TFLOPs: 8.91 | 7: iteration 139560/ 173500 | consumed samples: 35727360 | consumed tokens: 73169633280 | elapsed time per iteration (s): 0.11 | learning rate: 3.679E-05 | global batch size: 256 | lm loss: 4.517046E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2317.626 | TFLOPs: 8.62 | 7: iteration 139570/ 173500 | consumed samples: 35729920 | consumed tokens: 73174876160 | elapsed time per iteration (s): 0.11 | learning rate: 3.678E-05 | global batch size: 
256 | lm loss: 4.512839E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.362 | TFLOPs: 8.82 | 7: iteration 139580/ 173500 | consumed samples: 35732480 | consumed tokens: 73180119040 | elapsed time per iteration (s): 0.11 | learning rate: 3.677E-05 | global batch size: 256 | lm loss: 4.497728E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.693 | TFLOPs: 8.78 | 7: iteration 139590/ 173500 | consumed samples: 35735040 | consumed tokens: 73185361920 | elapsed time per iteration (s): 0.10 | learning rate: 3.676E-05 | global batch size: 256 | lm loss: 4.516035E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2443.897 | TFLOPs: 9.09 | 7: iteration 139600/ 173500 | consumed samples: 35737600 | consumed tokens: 73190604800 | elapsed time per iteration (s): 0.08 | learning rate: 3.675E-05 | global batch size: 256 | lm loss: 4.519351E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3022.009 | TFLOPs: 11.24 | 7: iteration 139610/ 173500 | consumed samples: 35740160 | consumed tokens: 73195847680 | elapsed time per iteration (s): 0.08 | learning rate: 3.674E-05 | global batch size: 256 | lm loss: 4.512027E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3046.742 | TFLOPs: 11.33 | 7: iteration 139620/ 173500 | consumed samples: 35742720 | consumed tokens: 73201090560 | elapsed time per iteration (s): 0.08 | learning rate: 3.673E-05 | global batch size: 256 | lm loss: 4.520401E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.737 | TFLOPs: 11.61 | 7: iteration 139630/ 173500 | consumed samples: 35745280 | 
consumed tokens: 73206333440 | elapsed time per iteration (s): 0.10 | learning rate: 3.672E-05 | global batch size: 256 | lm loss: 4.496740E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2687.582 | TFLOPs: 10.00 | 7: iteration 139640/ 173500 | consumed samples: 35747840 | consumed tokens: 73211576320 | elapsed time per iteration (s): 0.08 | learning rate: 3.671E-05 | global batch size: 256 | lm loss: 4.499562E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.038 | TFLOPs: 11.83 | 7: iteration 139650/ 173500 | consumed samples: 35750400 | consumed tokens: 73216819200 | elapsed time per iteration (s): 0.08 | learning rate: 3.671E-05 | global batch size: 256 | lm loss: 4.500835E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.258 | TFLOPs: 11.84 | 7: iteration 139660/ 173500 | consumed samples: 35752960 | consumed tokens: 73222062080 | elapsed time per iteration (s): 0.11 | learning rate: 3.670E-05 | global batch size: 256 | lm loss: 4.499538E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2341.165 | TFLOPs: 8.71 | 7: iteration 139670/ 173500 | consumed samples: 35755520 | consumed tokens: 73227304960 | elapsed time per iteration (s): 0.11 | learning rate: 3.669E-05 | global batch size: 256 | lm loss: 4.501945E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.413 | TFLOPs: 8.66 | 7: iteration 139680/ 173500 | consumed samples: 35758080 | consumed tokens: 73232547840 | elapsed time per iteration (s): 0.10 | learning rate: 3.668E-05 | global batch size: 256 | lm loss: 4.502376E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 2448.534 | TFLOPs: 9.11 | 7: iteration 139690/ 173500 | consumed samples: 35760640 | consumed tokens: 73237790720 | elapsed time per iteration (s): 0.08 | learning rate: 3.667E-05 | global batch size: 256 | lm loss: 4.506094E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.398 | TFLOPs: 11.36 | 7: iteration 139700/ 173500 | consumed samples: 35763200 | consumed tokens: 73243033600 | elapsed time per iteration (s): 0.08 | learning rate: 3.666E-05 | global batch size: 256 | lm loss: 4.499593E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.377 | TFLOPs: 11.75 | 7: iteration 139710/ 173500 | consumed samples: 35765760 | consumed tokens: 73248276480 | elapsed time per iteration (s): 0.10 | learning rate: 3.665E-05 | global batch size: 256 | lm loss: 4.514952E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2476.280 | TFLOPs: 9.21 | 7: iteration 139720/ 173500 | consumed samples: 35768320 | consumed tokens: 73253519360 | elapsed time per iteration (s): 0.14 | learning rate: 3.664E-05 | global batch size: 256 | lm loss: 4.498234E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1811.417 | TFLOPs: 6.74 | 7: iteration 139730/ 173500 | consumed samples: 35770880 | consumed tokens: 73258762240 | elapsed time per iteration (s): 0.13 | learning rate: 3.663E-05 | global batch size: 256 | lm loss: 4.512977E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1931.240 | TFLOPs: 7.18 | 7: iteration 139740/ 173500 | consumed samples: 35773440 | consumed tokens: 73264005120 | elapsed time per iteration (s): 0.13 | learning rate: 3.662E-05 | global batch size: 256 | lm loss: 
4.509539E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1918.301 | TFLOPs: 7.14 | 7: iteration 139750/ 173500 | consumed samples: 35776000 | consumed tokens: 73269248000 | elapsed time per iteration (s): 0.15 | learning rate: 3.661E-05 | global batch size: 256 | lm loss: 4.497065E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1760.024 | TFLOPs: 6.55 | 7: iteration 139760/ 173500 | consumed samples: 35778560 | consumed tokens: 73274490880 | elapsed time per iteration (s): 0.13 | learning rate: 3.660E-05 | global batch size: 256 | lm loss: 4.501221E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1956.710 | TFLOPs: 7.28 | 7: iteration 139770/ 173500 | consumed samples: 35781120 | consumed tokens: 73279733760 | elapsed time per iteration (s): 0.11 | learning rate: 3.659E-05 | global batch size: 256 | lm loss: 4.498558E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2319.601 | TFLOPs: 8.63 | 7: iteration 139780/ 173500 | consumed samples: 35783680 | consumed tokens: 73284976640 | elapsed time per iteration (s): 0.08 | learning rate: 3.658E-05 | global batch size: 256 | lm loss: 4.505256E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.051 | TFLOPs: 11.84 | 7: iteration 139790/ 173500 | consumed samples: 35786240 | consumed tokens: 73290219520 | elapsed time per iteration (s): 0.10 | learning rate: 3.657E-05 | global batch size: 256 | lm loss: 4.503021E+00 | grad norm: 0.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2582.390 | TFLOPs: 9.61 | 7: iteration 139800/ 173500 | consumed samples: 35788800 | consumed tokens: 
73295462400 | elapsed time per iteration (s): 0.10 | learning rate: 3.656E-05 | global batch size: 256 | lm loss: 4.511020E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2517.910 | TFLOPs: 9.37 | 7: iteration 139810/ 173500 | consumed samples: 35791360 | consumed tokens: 73300705280 | elapsed time per iteration (s): 0.08 | learning rate: 3.655E-05 | global batch size: 256 | lm loss: 4.502896E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.077 | TFLOPs: 11.66 | 7: iteration 139820/ 173500 | consumed samples: 35793920 | consumed tokens: 73305948160 | elapsed time per iteration (s): 0.11 | learning rate: 3.654E-05 | global batch size: 256 | lm loss: 4.505208E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.488 | TFLOPs: 8.88 | 7: iteration 139830/ 173500 | consumed samples: 35796480 | consumed tokens: 73311191040 | elapsed time per iteration (s): 0.11 | learning rate: 3.653E-05 | global batch size: 256 | lm loss: 4.493665E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.282 | TFLOPs: 8.86 | 7: iteration 139840/ 173500 | consumed samples: 35799040 | consumed tokens: 73316433920 | elapsed time per iteration (s): 0.10 | learning rate: 3.652E-05 | global batch size: 256 | lm loss: 4.495071E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2543.507 | TFLOPs: 9.46 | 7: iteration 139850/ 173500 | consumed samples: 35801600 | consumed tokens: 73321676800 | elapsed time per iteration (s): 0.11 | learning rate: 3.651E-05 | global batch size: 256 | lm loss: 4.509319E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2425.778 | TFLOPs: 9.02 | 7: iteration 139860/ 173500 | consumed samples: 35804160 | consumed tokens: 73326919680 | elapsed time per iteration (s): 0.11 | learning rate: 3.651E-05 | global batch size: 256 | lm loss: 4.504738E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.955 | TFLOPs: 8.72 | 7: iteration 139870/ 173500 | consumed samples: 35806720 | consumed tokens: 73332162560 | elapsed time per iteration (s): 0.11 | learning rate: 3.650E-05 | global batch size: 256 | lm loss: 4.506551E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2358.267 | TFLOPs: 8.77 | 7: iteration 139880/ 173500 | consumed samples: 35809280 | consumed tokens: 73337405440 | elapsed time per iteration (s): 0.13 | learning rate: 3.649E-05 | global batch size: 256 | lm loss: 4.506372E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1954.995 | TFLOPs: 7.27 | 7: iteration 139890/ 173500 | consumed samples: 35811840 | consumed tokens: 73342648320 | elapsed time per iteration (s): 0.11 | learning rate: 3.648E-05 | global batch size: 256 | lm loss: 4.501483E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2316.956 | TFLOPs: 8.62 | 7: iteration 139900/ 173500 | consumed samples: 35814400 | consumed tokens: 73347891200 | elapsed time per iteration (s): 0.11 | learning rate: 3.647E-05 | global batch size: 256 | lm loss: 4.489493E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.919 | TFLOPs: 8.88 | 7: iteration 139910/ 173500 | consumed samples: 35816960 | consumed tokens: 73353134080 | elapsed time per iteration (s): 0.10 | learning rate: 3.646E-05 | global batch size: 256 | lm loss: 4.512869E+00 | grad 
norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2632.297 | TFLOPs: 9.79 | 7: iteration 139920/ 173500 | consumed samples: 35819520 | consumed tokens: 73358376960 | elapsed time per iteration (s): 0.09 | learning rate: 3.645E-05 | global batch size: 256 | lm loss: 4.520803E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2783.538 | TFLOPs: 10.35 | 7: iteration 139930/ 173500 | consumed samples: 35822080 | consumed tokens: 73363619840 | elapsed time per iteration (s): 0.08 | learning rate: 3.644E-05 | global batch size: 256 | lm loss: 4.515935E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.037 | TFLOPs: 11.84 | 7: iteration 139940/ 173500 | consumed samples: 35824640 | consumed tokens: 73368862720 | elapsed time per iteration (s): 0.09 | learning rate: 3.643E-05 | global batch size: 256 | lm loss: 4.514313E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.413 | TFLOPs: 11.10 | 7: iteration 139950/ 173500 | consumed samples: 35827200 | consumed tokens: 73374105600 | elapsed time per iteration (s): 0.11 | learning rate: 3.642E-05 | global batch size: 256 | lm loss: 4.500016E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2257.761 | TFLOPs: 8.40 | 7: iteration 139960/ 173500 | consumed samples: 35829760 | consumed tokens: 73379348480 | elapsed time per iteration (s): 0.09 | learning rate: 3.641E-05 | global batch size: 256 | lm loss: 4.516012E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2874.151 | TFLOPs: 10.69 | 7: iteration 139970/ 173500 | consumed samples: 35832320 | consumed tokens: 73384591360 | elapsed 
time per iteration (s): 0.08 | learning rate: 3.640E-05 | global batch size: 256 | lm loss: 4.515576E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3014.447 | TFLOPs: 11.21 | 7: iteration 139980/ 173500 | consumed samples: 35834880 | consumed tokens: 73389834240 | elapsed time per iteration (s): 0.12 | learning rate: 3.639E-05 | global batch size: 256 | lm loss: 4.509887E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2057.220 | TFLOPs: 7.65 | 7: iteration 139990/ 173500 | consumed samples: 35837440 | consumed tokens: 73395077120 | elapsed time per iteration (s): 0.11 | learning rate: 3.638E-05 | global batch size: 256 | lm loss: 4.501993E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2375.892 | TFLOPs: 8.84 | 0: [2023-03-17 03:41:39,784] [INFO] [logging.py:68:log_dist] [Rank 0] step=140000, skipped=0, lr=[3.63724657135183e-05, 3.63724657135183e-05, 3.63724657135183e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 140000/ 173500 | consumed samples: 35840000 | consumed tokens: 73400320000 | elapsed time per iteration (s): 0.11 | learning rate: 3.637E-05 | global batch size: 256 | lm loss: 4.501400E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2389.053 | TFLOPs: 8.89 | 0: steps: 140000 loss: 4.4953 iter time (s): 0.101 samples/sec: 2542.411 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 140000 | lm loss value: 4.417996E+00 | lm loss PPL: 8.292996E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 140000 to checkpoints_14m91b100m 0: [2023-03-17 03:41:39,854] [INFO] 
[logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step140000 is begin to save! 0: [2023-03-17 03:41:39,858] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:41:39,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:41:39,883] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:41:39,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:41:39,889] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:41:39,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:41:39,892] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:41:39,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:41:39,895] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:41:39,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:41:39,898] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:41:39,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:41:39,899] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step140000/mp_rank_00_model_states.pt 0: [2023-03-17 03:41:39,899] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:41:39,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:41:39,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:41:39,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:41:39,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:41:39,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:41:39,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 1: [2023-03-17 03:41:39,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:41:39,922] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,922] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 4: [2023-03-17 03:41:39,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:41:39,922] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:41:39,922] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 1: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 7: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 
2: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 6: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 4: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 2: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 2: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 5: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 03:41:39,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 7: [2023-03-17 03:41:39,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 5: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:41:39,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 1: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:41:39,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 6: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:41:39,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:41:39,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 4: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:41:39,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 3: [2023-03-17 03:41:39,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:41:39,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 7: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:41:39,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 1: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:41:39,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 6: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:41:39,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 4: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:41:39,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 3: [2023-03-17 03:41:39,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:41:39,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 5: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:41:39,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 2: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:41:39,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:41:39,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 7: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:41:39,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:41:39,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 2: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:41:39,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 03:41:39,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 6: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 3: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:41:39,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 1: [2023-03-17 03:41:39,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 4: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:41:39,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 3: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:41:39,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 5: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:41:39,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: [2023-03-17 03:41:39,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:41:39,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2023-03-17 03:41:39,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:41:39,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 
7: [2023-03-17 03:41:39,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:41:39,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 6: [2023-03-17 03:41:39,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:41:39,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:41:39,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 1: [2023-03-17 03:41:39,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:41:39,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 4: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:41:39,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:41:39,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 2: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:41:39,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 5: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:41:39,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 6: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:41:39,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 6: [2023-03-17 03:41:39,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 6: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 1: [2023-03-17 03:41:39,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 4: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 2: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 3: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 2: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 2: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 7: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 7: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 4: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 7: [2023-03-17 03:41:39,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 03:41:39,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 5: [2023-03-17 03:41:39,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:41:39,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 
3: [2023-03-17 03:41:39,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:41:39,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 3: [2023-03-17 03:41:39,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 5: [2023-03-17 03:41:39,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: [2023-03-17 03:41:39,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:41:39,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:41:39,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 6: [2023-03-17 03:41:39,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:41:39,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:41:39,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 
1: [2023-03-17 03:41:39,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:41:39,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step140000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:41:39,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step140000 is ready now! 0: successfully saved checkpoint at iteration 140000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.86 7: iteration 140010/ 173500 | consumed samples: 35842560 | consumed tokens: 73405562880 | elapsed time per iteration (s): 0.14 | learning rate: 3.636E-05 | global batch size: 256 | lm loss: 4.509840E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1892.804 | TFLOPs: 7.04 | 7: iteration 140020/ 173500 | consumed samples: 35845120 | consumed tokens: 73410805760 | elapsed time per iteration (s): 0.12 | learning rate: 3.635E-05 | global batch size: 256 | lm loss: 4.516020E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2060.674 | TFLOPs: 7.66 | 7: iteration 140030/ 173500 | consumed samples: 35847680 | consumed tokens: 73416048640 | elapsed time per iteration (s): 0.13 | learning rate: 3.634E-05 | global batch size: 256 | lm loss: 4.504761E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1906.259 | TFLOPs: 7.09 | 7: iteration 140040/ 173500 | consumed samples: 35850240 | consumed tokens: 73421291520 | elapsed time per iteration (s): 0.13 | learning rate: 3.633E-05 | global batch size: 256 | lm loss: 4.512816E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1995.762 | 
TFLOPs: 7.42 | 7: iteration 140050/ 173500 | consumed samples: 35852800 | consumed tokens: 73426534400 | elapsed time per iteration (s): 0.11 | learning rate: 3.633E-05 | global batch size: 256 | lm loss: 4.502154E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2284.170 | TFLOPs: 8.50 | 7: iteration 140060/ 173500 | consumed samples: 35855360 | consumed tokens: 73431777280 | elapsed time per iteration (s): 0.11 | learning rate: 3.632E-05 | global batch size: 256 | lm loss: 4.513641E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2229.810 | TFLOPs: 8.29 | 7: iteration 140070/ 173500 | consumed samples: 35857920 | consumed tokens: 73437020160 | elapsed time per iteration (s): 0.13 | learning rate: 3.631E-05 | global batch size: 256 | lm loss: 4.502306E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2025.685 | TFLOPs: 7.53 | 7: iteration 140080/ 173500 | consumed samples: 35860480 | consumed tokens: 73442263040 | elapsed time per iteration (s): 0.13 | learning rate: 3.630E-05 | global batch size: 256 | lm loss: 4.501840E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.818 | TFLOPs: 7.58 | 7: iteration 140090/ 173500 | consumed samples: 35863040 | consumed tokens: 73447505920 | elapsed time per iteration (s): 0.11 | learning rate: 3.629E-05 | global batch size: 256 | lm loss: 4.495745E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2294.826 | TFLOPs: 8.54 | 7: iteration 140100/ 173500 | consumed samples: 35865600 | consumed tokens: 73452748800 | elapsed time per iteration (s): 0.11 | learning rate: 3.628E-05 | global batch size: 256 | lm loss: 4.508408E+00 | grad norm: 0.385 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2366.727 | TFLOPs: 8.80 | 7: iteration 140110/ 173500 | consumed samples: 35868160 | consumed tokens: 73457991680 | elapsed time per iteration (s): 0.11 | learning rate: 3.627E-05 | global batch size: 256 | lm loss: 4.506688E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2350.181 | TFLOPs: 8.74 | 7: iteration 140120/ 173500 | consumed samples: 35870720 | consumed tokens: 73463234560 | elapsed time per iteration (s): 0.11 | learning rate: 3.626E-05 | global batch size: 256 | lm loss: 4.504181E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2291.409 | TFLOPs: 8.52 | 7: iteration 140130/ 173500 | consumed samples: 35873280 | consumed tokens: 73468477440 | elapsed time per iteration (s): 0.12 | learning rate: 3.625E-05 | global batch size: 256 | lm loss: 4.493429E+00 | grad norm: 0.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2126.970 | TFLOPs: 7.91 | 7: iteration 140140/ 173500 | consumed samples: 35875840 | consumed tokens: 73473720320 | elapsed time per iteration (s): 0.13 | learning rate: 3.624E-05 | global batch size: 256 | lm loss: 4.505540E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2018.519 | TFLOPs: 7.51 | 7: iteration 140150/ 173500 | consumed samples: 35878400 | consumed tokens: 73478963200 | elapsed time per iteration (s): 0.11 | learning rate: 3.623E-05 | global batch size: 256 | lm loss: 4.493003E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2291.737 | TFLOPs: 8.52 | 7: iteration 140160/ 173500 | consumed samples: 35880960 | consumed tokens: 73484206080 | elapsed time per iteration (s): 
0.17 | learning rate: 3.622E-05 | global batch size: 256 | lm loss: 4.503965E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1518.153 | TFLOPs: 5.65 | 7: iteration 140170/ 173500 | consumed samples: 35883520 | consumed tokens: 73489448960 | elapsed time per iteration (s): 0.16 | learning rate: 3.621E-05 | global batch size: 256 | lm loss: 4.520765E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1572.620 | TFLOPs: 5.85 | 7: iteration 140180/ 173500 | consumed samples: 35886080 | consumed tokens: 73494691840 | elapsed time per iteration (s): 0.13 | learning rate: 3.620E-05 | global batch size: 256 | lm loss: 4.526532E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1935.859 | TFLOPs: 7.20 | 7: iteration 140190/ 173500 | consumed samples: 35888640 | consumed tokens: 73499934720 | elapsed time per iteration (s): 0.15 | learning rate: 3.619E-05 | global batch size: 256 | lm loss: 4.514155E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1660.795 | TFLOPs: 6.18 | 7: iteration 140200/ 173500 | consumed samples: 35891200 | consumed tokens: 73505177600 | elapsed time per iteration (s): 0.11 | learning rate: 3.618E-05 | global batch size: 256 | lm loss: 4.503693E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2272.478 | TFLOPs: 8.45 | 7: iteration 140210/ 173500 | consumed samples: 35893760 | consumed tokens: 73510420480 | elapsed time per iteration (s): 0.08 | learning rate: 3.617E-05 | global batch size: 256 | lm loss: 4.493041E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.779 | TFLOPs: 11.97 | 7: iteration 
140220/ 173500 | consumed samples: 35896320 | consumed tokens: 73515663360 | elapsed time per iteration (s): 0.08 | learning rate: 3.616E-05 | global batch size: 256 | lm loss: 4.498656E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.143 | TFLOPs: 11.93 | 7: iteration 140230/ 173500 | consumed samples: 35898880 | consumed tokens: 73520906240 | elapsed time per iteration (s): 0.09 | learning rate: 3.616E-05 | global batch size: 256 | lm loss: 4.506039E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2907.215 | TFLOPs: 10.81 | 7: iteration 140240/ 173500 | consumed samples: 35901440 | consumed tokens: 73526149120 | elapsed time per iteration (s): 0.08 | learning rate: 3.615E-05 | global batch size: 256 | lm loss: 4.493513E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3102.089 | TFLOPs: 11.54 | 7: iteration 140250/ 173500 | consumed samples: 35904000 | consumed tokens: 73531392000 | elapsed time per iteration (s): 0.10 | learning rate: 3.614E-05 | global batch size: 256 | lm loss: 4.506491E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2459.650 | TFLOPs: 9.15 | 7: iteration 140260/ 173500 | consumed samples: 35906560 | consumed tokens: 73536634880 | elapsed time per iteration (s): 0.08 | learning rate: 3.613E-05 | global batch size: 256 | lm loss: 4.513020E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.616 | TFLOPs: 11.80 | 7: iteration 140270/ 173500 | consumed samples: 35909120 | consumed tokens: 73541877760 | elapsed time per iteration (s): 0.09 | learning rate: 3.612E-05 | global batch size: 256 | lm loss: 4.509536E+00 | grad norm: 0.390 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2918.961 | TFLOPs: 10.86 | 7: iteration 140280/ 173500 | consumed samples: 35911680 | consumed tokens: 73547120640 | elapsed time per iteration (s): 0.08 | learning rate: 3.611E-05 | global batch size: 256 | lm loss: 4.506790E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.679 | TFLOPs: 11.97 | 7: iteration 140290/ 173500 | consumed samples: 35914240 | consumed tokens: 73552363520 | elapsed time per iteration (s): 0.08 | learning rate: 3.610E-05 | global batch size: 256 | lm loss: 4.515790E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.255 | TFLOPs: 11.91 | 7: iteration 140300/ 173500 | consumed samples: 35916800 | consumed tokens: 73557606400 | elapsed time per iteration (s): 0.08 | learning rate: 3.609E-05 | global batch size: 256 | lm loss: 4.513818E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.091 | TFLOPs: 11.75 | 7: iteration 140310/ 173500 | consumed samples: 35919360 | consumed tokens: 73562849280 | elapsed time per iteration (s): 0.08 | learning rate: 3.608E-05 | global batch size: 256 | lm loss: 4.507504E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.382 | TFLOPs: 11.77 | 7: iteration 140320/ 173500 | consumed samples: 35921920 | consumed tokens: 73568092160 | elapsed time per iteration (s): 0.09 | learning rate: 3.607E-05 | global batch size: 256 | lm loss: 4.509467E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.034 | TFLOPs: 10.44 | 7: iteration 140330/ 173500 | consumed samples: 35924480 | consumed tokens: 73573335040 | elapsed time per iteration (s): 0.09 | learning 
rate: 3.606E-05 | global batch size: 256 | lm loss: 4.511381E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.056 | TFLOPs: 11.14 | 7: iteration 140340/ 173500 | consumed samples: 35927040 | consumed tokens: 73578577920 | elapsed time per iteration (s): 0.10 | learning rate: 3.605E-05 | global batch size: 256 | lm loss: 4.502885E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2579.490 | TFLOPs: 9.59 | 7: iteration 140350/ 173500 | consumed samples: 35929600 | consumed tokens: 73583820800 | elapsed time per iteration (s): 0.13 | learning rate: 3.604E-05 | global batch size: 256 | lm loss: 4.512795E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2047.665 | TFLOPs: 7.62 | 7: iteration 140360/ 173500 | consumed samples: 35932160 | consumed tokens: 73589063680 | elapsed time per iteration (s): 0.11 | learning rate: 3.603E-05 | global batch size: 256 | lm loss: 4.498068E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.029 | TFLOPs: 8.98 | 7: iteration 140370/ 173500 | consumed samples: 35934720 | consumed tokens: 73594306560 | elapsed time per iteration (s): 0.08 | learning rate: 3.602E-05 | global batch size: 256 | lm loss: 4.500499E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.355 | TFLOPs: 11.65 | 7: iteration 140380/ 173500 | consumed samples: 35937280 | consumed tokens: 73599549440 | elapsed time per iteration (s): 0.11 | learning rate: 3.601E-05 | global batch size: 256 | lm loss: 4.504576E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2430.326 | TFLOPs: 9.04 | 7: iteration 140390/ 173500 | 
consumed samples: 35939840 | consumed tokens: 73604792320 | elapsed time per iteration (s): 0.11 | learning rate: 3.601E-05 | global batch size: 256 | lm loss: 4.506020E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2399.254 | TFLOPs: 8.92 | 7: iteration 140400/ 173500 | consumed samples: 35942400 | consumed tokens: 73610035200 | elapsed time per iteration (s): 0.09 | learning rate: 3.600E-05 | global batch size: 256 | lm loss: 4.506226E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.483 | TFLOPs: 11.16 | 7: iteration 140410/ 173500 | consumed samples: 35944960 | consumed tokens: 73615278080 | elapsed time per iteration (s): 0.08 | learning rate: 3.599E-05 | global batch size: 256 | lm loss: 4.502368E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.226 | TFLOPs: 11.37 | 7: iteration 140420/ 173500 | consumed samples: 35947520 | consumed tokens: 73620520960 | elapsed time per iteration (s): 0.08 | learning rate: 3.598E-05 | global batch size: 256 | lm loss: 4.507978E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.518 | TFLOPs: 11.67 | 7: iteration 140430/ 173500 | consumed samples: 35950080 | consumed tokens: 73625763840 | elapsed time per iteration (s): 0.08 | learning rate: 3.597E-05 | global batch size: 256 | lm loss: 4.507465E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.434 | TFLOPs: 11.62 | 7: iteration 140440/ 173500 | consumed samples: 35952640 | consumed tokens: 73631006720 | elapsed time per iteration (s): 0.08 | learning rate: 3.596E-05 | global batch size: 256 | lm loss: 4.518399E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3195.249 | TFLOPs: 11.88 | 7: iteration 140450/ 173500 | consumed samples: 35955200 | consumed tokens: 73636249600 | elapsed time per iteration (s): 0.08 | learning rate: 3.595E-05 | global batch size: 256 | lm loss: 4.512687E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.919 | TFLOPs: 11.62 | 7: iteration 140460/ 173500 | consumed samples: 35957760 | consumed tokens: 73641492480 | elapsed time per iteration (s): 0.08 | learning rate: 3.594E-05 | global batch size: 256 | lm loss: 4.504893E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.308 | TFLOPs: 11.79 | 7: iteration 140470/ 173500 | consumed samples: 35960320 | consumed tokens: 73646735360 | elapsed time per iteration (s): 0.08 | learning rate: 3.593E-05 | global batch size: 256 | lm loss: 4.519683E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.343 | TFLOPs: 11.96 | 7: iteration 140480/ 173500 | consumed samples: 35962880 | consumed tokens: 73651978240 | elapsed time per iteration (s): 0.08 | learning rate: 3.592E-05 | global batch size: 256 | lm loss: 4.519168E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.601 | TFLOPs: 11.72 | 7: iteration 140490/ 173500 | consumed samples: 35965440 | consumed tokens: 73657221120 | elapsed time per iteration (s): 0.08 | learning rate: 3.591E-05 | global batch size: 256 | lm loss: 4.489775E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.961 | TFLOPs: 11.89 | 7: iteration 140500/ 173500 | consumed samples: 35968000 | consumed tokens: 73662464000 | elapsed time per iteration (s): 0.13 | learning rate: 3.590E-05 | 
global batch size: 256 | lm loss: 4.519849E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2023.059 | TFLOPs: 7.52 | 7: iteration 140510/ 173500 | consumed samples: 35970560 | consumed tokens: 73667706880 | elapsed time per iteration (s): 0.10 | learning rate: 3.589E-05 | global batch size: 256 | lm loss: 4.515738E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.432 | TFLOPs: 9.12 | 7: iteration 140520/ 173500 | consumed samples: 35973120 | consumed tokens: 73672949760 | elapsed time per iteration (s): 0.12 | learning rate: 3.588E-05 | global batch size: 256 | lm loss: 4.500378E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2106.498 | TFLOPs: 7.84 | 7: iteration 140530/ 173500 | consumed samples: 35975680 | consumed tokens: 73678192640 | elapsed time per iteration (s): 0.09 | learning rate: 3.587E-05 | global batch size: 256 | lm loss: 4.507087E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2698.446 | TFLOPs: 10.04 | 7: iteration 140540/ 173500 | consumed samples: 35978240 | consumed tokens: 73683435520 | elapsed time per iteration (s): 0.11 | learning rate: 3.586E-05 | global batch size: 256 | lm loss: 4.518313E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2372.485 | TFLOPs: 8.82 | 7: iteration 140550/ 173500 | consumed samples: 35980800 | consumed tokens: 73688678400 | elapsed time per iteration (s): 0.09 | learning rate: 3.586E-05 | global batch size: 256 | lm loss: 4.513616E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2953.943 | TFLOPs: 10.99 | 7: iteration 140560/ 173500 | consumed samples: 
35983360 | consumed tokens: 73693921280 | elapsed time per iteration (s): 0.11 | learning rate: 3.585E-05 | global batch size: 256 | lm loss: 4.495654E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2393.972 | TFLOPs: 8.90 | 7: iteration 140570/ 173500 | consumed samples: 35985920 | consumed tokens: 73699164160 | elapsed time per iteration (s): 0.11 | learning rate: 3.584E-05 | global batch size: 256 | lm loss: 4.499607E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2395.945 | TFLOPs: 8.91 | 7: iteration 140580/ 173500 | consumed samples: 35988480 | consumed tokens: 73704407040 | elapsed time per iteration (s): 0.11 | learning rate: 3.583E-05 | global batch size: 256 | lm loss: 4.509715E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2387.107 | TFLOPs: 8.88 | 7: iteration 140590/ 173500 | consumed samples: 35991040 | consumed tokens: 73709649920 | elapsed time per iteration (s): 0.09 | learning rate: 3.582E-05 | global batch size: 256 | lm loss: 4.515773E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.318 | TFLOPs: 11.08 | 7: iteration 140600/ 173500 | consumed samples: 35993600 | consumed tokens: 73714892800 | elapsed time per iteration (s): 0.09 | learning rate: 3.581E-05 | global batch size: 256 | lm loss: 4.515982E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2848.626 | TFLOPs: 10.60 | 7: iteration 140610/ 173500 | consumed samples: 35996160 | consumed tokens: 73720135680 | elapsed time per iteration (s): 0.09 | learning rate: 3.580E-05 | global batch size: 256 | lm loss: 4.497309E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2936.180 | TFLOPs: 10.92 | 7: iteration 140620/ 173500 | consumed samples: 35998720 | consumed tokens: 73725378560 | elapsed time per iteration (s): 0.08 | learning rate: 3.579E-05 | global batch size: 256 | lm loss: 4.505897E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.050 | TFLOPs: 11.66 | 7: iteration 140630/ 173500 | consumed samples: 36001280 | consumed tokens: 73730621440 | elapsed time per iteration (s): 0.08 | learning rate: 3.578E-05 | global batch size: 256 | lm loss: 4.511569E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.699 | TFLOPs: 11.98 | 7: iteration 140640/ 173500 | consumed samples: 36003840 | consumed tokens: 73735864320 | elapsed time per iteration (s): 0.09 | learning rate: 3.577E-05 | global batch size: 256 | lm loss: 4.517903E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.192 | TFLOPs: 10.92 | 7: iteration 140650/ 173500 | consumed samples: 36006400 | consumed tokens: 73741107200 | elapsed time per iteration (s): 0.08 | learning rate: 3.576E-05 | global batch size: 256 | lm loss: 4.510807E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.150 | TFLOPs: 11.25 | 7: iteration 140660/ 173500 | consumed samples: 36008960 | consumed tokens: 73746350080 | elapsed time per iteration (s): 0.09 | learning rate: 3.575E-05 | global batch size: 256 | lm loss: 4.511431E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.380 | TFLOPs: 10.54 | 7: iteration 140670/ 173500 | consumed samples: 36011520 | consumed tokens: 73751592960 | elapsed time per iteration (s): 0.09 | learning rate: 3.574E-05 | global batch size: 
256 | lm loss: 4.521794E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2802.214 | TFLOPs: 10.42 | 7: iteration 140680/ 173500 | consumed samples: 36014080 | consumed tokens: 73756835840 | elapsed time per iteration (s): 0.08 | learning rate: 3.573E-05 | global batch size: 256 | lm loss: 4.514796E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.560 | TFLOPs: 12.02 | 7: iteration 140690/ 173500 | consumed samples: 36016640 | consumed tokens: 73762078720 | elapsed time per iteration (s): 0.09 | learning rate: 3.573E-05 | global batch size: 256 | lm loss: 4.511148E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.677 | TFLOPs: 10.05 | 7: iteration 140700/ 173500 | consumed samples: 36019200 | consumed tokens: 73767321600 | elapsed time per iteration (s): 0.08 | learning rate: 3.572E-05 | global batch size: 256 | lm loss: 4.512408E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.133 | TFLOPs: 11.43 | 7: iteration 140710/ 173500 | consumed samples: 36021760 | consumed tokens: 73772564480 | elapsed time per iteration (s): 0.09 | learning rate: 3.571E-05 | global batch size: 256 | lm loss: 4.503731E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2947.372 | TFLOPs: 10.96 | 7: iteration 140720/ 173500 | consumed samples: 36024320 | consumed tokens: 73777807360 | elapsed time per iteration (s): 0.08 | learning rate: 3.570E-05 | global batch size: 256 | lm loss: 4.500209E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.167 | TFLOPs: 11.70 | 7: iteration 140730/ 173500 | consumed samples: 36026880 | 
consumed tokens: 73783050240 | elapsed time per iteration (s): 0.08 | learning rate: 3.569E-05 | global batch size: 256 | lm loss: 4.510561E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.312 | TFLOPs: 11.57 | 7: iteration 140740/ 173500 | consumed samples: 36029440 | consumed tokens: 73788293120 | elapsed time per iteration (s): 0.09 | learning rate: 3.568E-05 | global batch size: 256 | lm loss: 4.504686E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.823 | TFLOPs: 10.60 | 7: iteration 140750/ 173500 | consumed samples: 36032000 | consumed tokens: 73793536000 | elapsed time per iteration (s): 0.11 | learning rate: 3.567E-05 | global batch size: 256 | lm loss: 4.512666E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2269.979 | TFLOPs: 8.44 | 7: iteration 140760/ 173500 | consumed samples: 36034560 | consumed tokens: 73798778880 | elapsed time per iteration (s): 0.08 | learning rate: 3.566E-05 | global batch size: 256 | lm loss: 4.514278E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.954 | TFLOPs: 11.97 | 7: iteration 140770/ 173500 | consumed samples: 36037120 | consumed tokens: 73804021760 | elapsed time per iteration (s): 0.08 | learning rate: 3.565E-05 | global batch size: 256 | lm loss: 4.507952E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.724 | TFLOPs: 12.02 | 7: iteration 140780/ 173500 | consumed samples: 36039680 | consumed tokens: 73809264640 | elapsed time per iteration (s): 0.08 | learning rate: 3.564E-05 | global batch size: 256 | lm loss: 4.491959E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3237.205 | TFLOPs: 12.04 | 7: iteration 140790/ 173500 | consumed samples: 36042240 | consumed tokens: 73814507520 | elapsed time per iteration (s): 0.09 | learning rate: 3.563E-05 | global batch size: 256 | lm loss: 4.510661E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2914.299 | TFLOPs: 10.84 | 7: iteration 140800/ 173500 | consumed samples: 36044800 | consumed tokens: 73819750400 | elapsed time per iteration (s): 0.08 | learning rate: 3.562E-05 | global batch size: 256 | lm loss: 4.512539E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.935 | TFLOPs: 11.68 | 7: iteration 140810/ 173500 | consumed samples: 36047360 | consumed tokens: 73824993280 | elapsed time per iteration (s): 0.10 | learning rate: 3.561E-05 | global batch size: 256 | lm loss: 4.509106E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.110 | TFLOPs: 9.33 | 7: iteration 140820/ 173500 | consumed samples: 36049920 | consumed tokens: 73830236160 | elapsed time per iteration (s): 0.09 | learning rate: 3.560E-05 | global batch size: 256 | lm loss: 4.501435E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2758.738 | TFLOPs: 10.26 | 7: iteration 140830/ 173500 | consumed samples: 36052480 | consumed tokens: 73835479040 | elapsed time per iteration (s): 0.10 | learning rate: 3.560E-05 | global batch size: 256 | lm loss: 4.514152E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2604.713 | TFLOPs: 9.69 | 7: iteration 140840/ 173500 | consumed samples: 36055040 | consumed tokens: 73840721920 | elapsed time per iteration (s): 0.10 | learning rate: 3.559E-05 | global batch size: 256 
| lm loss: 4.494497E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2559.999 | TFLOPs: 9.52 | 7: iteration 140850/ 173500 | consumed samples: 36057600 | consumed tokens: 73845964800 | elapsed time per iteration (s): 0.09 | learning rate: 3.558E-05 | global batch size: 256 | lm loss: 4.507025E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.580 | TFLOPs: 10.40 | 7: iteration 140860/ 173500 | consumed samples: 36060160 | consumed tokens: 73851207680 | elapsed time per iteration (s): 0.08 | learning rate: 3.557E-05 | global batch size: 256 | lm loss: 4.511810E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.016 | TFLOPs: 11.93 | 7: iteration 140870/ 173500 | consumed samples: 36062720 | consumed tokens: 73856450560 | elapsed time per iteration (s): 0.08 | learning rate: 3.556E-05 | global batch size: 256 | lm loss: 4.511089E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.630 | TFLOPs: 11.92 | 7: iteration 140880/ 173500 | consumed samples: 36065280 | consumed tokens: 73861693440 | elapsed time per iteration (s): 0.13 | learning rate: 3.555E-05 | global batch size: 256 | lm loss: 4.501530E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2036.005 | TFLOPs: 7.57 | 7: iteration 140890/ 173500 | consumed samples: 36067840 | consumed tokens: 73866936320 | elapsed time per iteration (s): 0.14 | learning rate: 3.554E-05 | global batch size: 256 | lm loss: 4.527228E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1803.308 | TFLOPs: 6.71 | 7: iteration 140900/ 173500 | consumed samples: 36070400 | consumed 
tokens: 73872179200 | elapsed time per iteration (s): 0.13 | learning rate: 3.553E-05 | global batch size: 256 | lm loss: 4.508564E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1944.297 | TFLOPs: 7.23 | 7: iteration 140910/ 173500 | consumed samples: 36072960 | consumed tokens: 73877422080 | elapsed time per iteration (s): 0.12 | learning rate: 3.552E-05 | global batch size: 256 | lm loss: 4.512115E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2106.564 | TFLOPs: 7.84 | 7: iteration 140920/ 173500 | consumed samples: 36075520 | consumed tokens: 73882664960 | elapsed time per iteration (s): 0.08 | learning rate: 3.551E-05 | global batch size: 256 | lm loss: 4.496150E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.851 | TFLOPs: 11.83 | 7: iteration 140930/ 173500 | consumed samples: 36078080 | consumed tokens: 73887907840 | elapsed time per iteration (s): 0.08 | learning rate: 3.550E-05 | global batch size: 256 | lm loss: 4.512355E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.308 | TFLOPs: 12.03 | 7: iteration 140940/ 173500 | consumed samples: 36080640 | consumed tokens: 73893150720 | elapsed time per iteration (s): 0.08 | learning rate: 3.549E-05 | global batch size: 256 | lm loss: 4.505011E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.540 | TFLOPs: 12.04 | 7: iteration 140950/ 173500 | consumed samples: 36083200 | consumed tokens: 73898393600 | elapsed time per iteration (s): 0.08 | learning rate: 3.548E-05 | global batch size: 256 | lm loss: 4.503627E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3132.648 | TFLOPs: 11.65 | 7: iteration 140960/ 173500 | consumed samples: 36085760 | consumed tokens: 73903636480 | elapsed time per iteration (s): 0.09 | learning rate: 3.548E-05 | global batch size: 256 | lm loss: 4.515714E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2810.258 | TFLOPs: 10.45 | 7: iteration 140970/ 173500 | consumed samples: 36088320 | consumed tokens: 73908879360 | elapsed time per iteration (s): 0.08 | learning rate: 3.547E-05 | global batch size: 256 | lm loss: 4.513441E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.133 | TFLOPs: 11.87 | 7: iteration 140980/ 173500 | consumed samples: 36090880 | consumed tokens: 73914122240 | elapsed time per iteration (s): 0.08 | learning rate: 3.546E-05 | global batch size: 256 | lm loss: 4.513683E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.970 | TFLOPs: 11.61 | 7: iteration 140990/ 173500 | consumed samples: 36093440 | consumed tokens: 73919365120 | elapsed time per iteration (s): 0.08 | learning rate: 3.545E-05 | global batch size: 256 | lm loss: 4.507444E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.730 | TFLOPs: 11.27 | 7: iteration 141000/ 173500 | consumed samples: 36096000 | consumed tokens: 73924608000 | elapsed time per iteration (s): 0.08 | learning rate: 3.544E-05 | global batch size: 256 | lm loss: 4.514621E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.249 | TFLOPs: 11.98 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 141000 | lm loss value: 4.379153E+00 | lm loss PPL: 
7.977042E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 141000 to checkpoints_14m91b100m 0: [2023-03-17 03:43:18,163] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step141000 is begin to save! 0: [2023-03-17 03:43:18,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:43:18,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:43:18,194] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:43:18,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:43:18,197] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:43:18,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:43:18,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:43:18,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:43:18,208] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:43:18,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 03:43:18,214] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:43:18,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:43:18,215] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step141000/mp_rank_00_model_states.pt 0: [2023-03-17 03:43:18,215] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:43:18,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:43:18,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:43:18,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:43:18,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:43:18,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 0: [2023-03-17 03:43:18,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 2: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:43:18,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 0: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:43:18,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:43:18,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 03:43:18,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 4: [2023-03-17 03:43:18,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:43:18,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 6: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 5: [2023-03-17 03:43:18,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 2: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:43:18,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 
7: [2023-03-17 03:43:18,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:43:18,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 0: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:43:18,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 4: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:43:18,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:43:18,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 
1: [2023-03-17 03:43:18,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 5: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:43:18,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 6: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 7: [2023-03-17 03:43:18,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 2: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:43:18,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 0: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:43:18,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:43:18,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 4: [2023-03-17 03:43:18,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:43:18,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 1: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:43:18,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 6: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 03:43:18,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 6: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 0: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:43:18,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 
5: [2023-03-17 03:43:18,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 2: [2023-03-17 03:43:18,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:43:18,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:43:18,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:43:18,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:43:18,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 1: [2023-03-17 03:43:18,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:43:18,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 6: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:43:18,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 6: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 7: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 5: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 2: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:43:18,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 0: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:43:18,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:43:18,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 1: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:43:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 03:43:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 6: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:43:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 4: [2023-03-17 03:43:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 5: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:43:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 7: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:43:18,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 0: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:43:18,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 2: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:43:18,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:43:18,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 5: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:43:18,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 4: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 7: [2023-03-17 03:43:18,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 6: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 
5: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 1: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 7: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 7: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 1: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 1: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 
2: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 2: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 2: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 3: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 5: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 
4: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 0: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 4: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 6: [2023-03-17 03:43:18,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step141000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 4: [2023-03-17 03:43:18,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step141000 is ready now! 
0: successfully saved checkpoint at iteration 141000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 90.00 7: iteration 141010/ 173500 | consumed samples: 36098560 | consumed tokens: 73929850880 | elapsed time per iteration (s): 0.09 | learning rate: 3.543E-05 | global batch size: 256 | lm loss: 4.508822E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.557 | TFLOPs: 10.21 | 7: iteration 141020/ 173500 | consumed samples: 36101120 | consumed tokens: 73935093760 | elapsed time per iteration (s): 0.08 | learning rate: 3.542E-05 | global batch size: 256 | lm loss: 4.513878E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.307 | TFLOPs: 11.89 | 7: iteration 141030/ 173500 | consumed samples: 36103680 | consumed tokens: 73940336640 | elapsed time per iteration (s): 0.08 | learning rate: 3.541E-05 | global batch size: 256 | lm loss: 4.501606E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.944 | TFLOPs: 11.88 | 7: iteration 141040/ 173500 | consumed samples: 36106240 | consumed tokens: 73945579520 | elapsed time per iteration (s): 0.08 | learning rate: 3.540E-05 | global batch size: 256 | lm loss: 4.518340E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.366 | TFLOPs: 12.01 | 7: iteration 141050/ 173500 | consumed samples: 36108800 | consumed tokens: 73950822400 | elapsed time per iteration (s): 0.09 | learning rate: 3.539E-05 | global batch size: 256 | lm loss: 4.492739E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.378 | TFLOPs: 11.10 | 7: iteration 141060/ 173500 | consumed samples: 36111360 | consumed tokens: 73956065280 | elapsed time per iteration (s): 
0.09 | learning rate: 3.538E-05 | global batch size: 256 | lm loss: 4.503799E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.309 | TFLOPs: 10.92 | 7: iteration 141070/ 173500 | consumed samples: 36113920 | consumed tokens: 73961308160 | elapsed time per iteration (s): 0.08 | learning rate: 3.537E-05 | global batch size: 256 | lm loss: 4.496542E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.872 | TFLOPs: 11.98 | 7: iteration 141080/ 173500 | consumed samples: 36116480 | consumed tokens: 73966551040 | elapsed time per iteration (s): 0.08 | learning rate: 3.536E-05 | global batch size: 256 | lm loss: 4.510323E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.680 | TFLOPs: 11.71 | 7: iteration 141090/ 173500 | consumed samples: 36119040 | consumed tokens: 73971793920 | elapsed time per iteration (s): 0.08 | learning rate: 3.536E-05 | global batch size: 256 | lm loss: 4.506538E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.533 | TFLOPs: 12.02 | 7: iteration 141100/ 173500 | consumed samples: 36121600 | consumed tokens: 73977036800 | elapsed time per iteration (s): 0.08 | learning rate: 3.535E-05 | global batch size: 256 | lm loss: 4.504704E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.788 | TFLOPs: 11.70 | 7: iteration 141110/ 173500 | consumed samples: 36124160 | consumed tokens: 73982279680 | elapsed time per iteration (s): 0.10 | learning rate: 3.534E-05 | global batch size: 256 | lm loss: 4.500590E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.667 | TFLOPs: 9.70 | 7: 
iteration 141120/ 173500 | consumed samples: 36126720 | consumed tokens: 73987522560 | elapsed time per iteration (s): 0.09 | learning rate: 3.533E-05 | global batch size: 256 | lm loss: 4.504794E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.733 | TFLOPs: 11.16 | 7: iteration 141130/ 173500 | consumed samples: 36129280 | consumed tokens: 73992765440 | elapsed time per iteration (s): 0.08 | learning rate: 3.532E-05 | global batch size: 256 | lm loss: 4.501918E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.889 | TFLOPs: 11.49 | 7: iteration 141140/ 173500 | consumed samples: 36131840 | consumed tokens: 73998008320 | elapsed time per iteration (s): 0.08 | learning rate: 3.531E-05 | global batch size: 256 | lm loss: 4.515752E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.241 | TFLOPs: 11.94 | 7: iteration 141150/ 173500 | consumed samples: 36134400 | consumed tokens: 74003251200 | elapsed time per iteration (s): 0.10 | learning rate: 3.530E-05 | global batch size: 256 | lm loss: 4.524325E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2534.909 | TFLOPs: 9.43 | 7: iteration 141160/ 173500 | consumed samples: 36136960 | consumed tokens: 74008494080 | elapsed time per iteration (s): 0.12 | learning rate: 3.529E-05 | global batch size: 256 | lm loss: 4.501125E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2050.229 | TFLOPs: 7.63 | 7: iteration 141170/ 173500 | consumed samples: 36139520 | consumed tokens: 74013736960 | elapsed time per iteration (s): 0.13 | learning rate: 3.528E-05 | global batch size: 256 | lm loss: 4.505705E+00 | grad norm: 0.349 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1943.776 | TFLOPs: 7.23 | 7: iteration 141180/ 173500 | consumed samples: 36142080 | consumed tokens: 74018979840 | elapsed time per iteration (s): 0.12 | learning rate: 3.527E-05 | global batch size: 256 | lm loss: 4.508544E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2224.201 | TFLOPs: 8.27 | 7: iteration 141190/ 173500 | consumed samples: 36144640 | consumed tokens: 74024222720 | elapsed time per iteration (s): 0.09 | learning rate: 3.526E-05 | global batch size: 256 | lm loss: 4.520586E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2750.694 | TFLOPs: 10.23 | 7: iteration 141200/ 173500 | consumed samples: 36147200 | consumed tokens: 74029465600 | elapsed time per iteration (s): 0.13 | learning rate: 3.525E-05 | global batch size: 256 | lm loss: 4.518093E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1991.090 | TFLOPs: 7.41 | 7: iteration 141210/ 173500 | consumed samples: 36149760 | consumed tokens: 74034708480 | elapsed time per iteration (s): 0.13 | learning rate: 3.525E-05 | global batch size: 256 | lm loss: 4.504346E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1917.506 | TFLOPs: 7.13 | 7: iteration 141220/ 173500 | consumed samples: 36152320 | consumed tokens: 74039951360 | elapsed time per iteration (s): 0.10 | learning rate: 3.524E-05 | global batch size: 256 | lm loss: 4.506107E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2496.576 | TFLOPs: 9.29 | 7: iteration 141230/ 173500 | consumed samples: 36154880 | consumed tokens: 74045194240 | elapsed time per iteration (s): 0.09 | 
learning rate: 3.523E-05 | global batch size: 256 | lm loss: 4.505843E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2768.977 | TFLOPs: 10.30 | 7: iteration 141240/ 173500 | consumed samples: 36157440 | consumed tokens: 74050437120 | elapsed time per iteration (s): 0.09 | learning rate: 3.522E-05 | global batch size: 256 | lm loss: 4.512117E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.400 | TFLOPs: 10.03 | 7: iteration 141250/ 173500 | consumed samples: 36160000 | consumed tokens: 74055680000 | elapsed time per iteration (s): 0.12 | learning rate: 3.521E-05 | global batch size: 256 | lm loss: 4.521096E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2110.905 | TFLOPs: 7.85 | 7: iteration 141260/ 173500 | consumed samples: 36162560 | consumed tokens: 74060922880 | elapsed time per iteration (s): 0.13 | learning rate: 3.520E-05 | global batch size: 256 | lm loss: 4.488115E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2022.518 | TFLOPs: 7.52 | 7: iteration 141270/ 173500 | consumed samples: 36165120 | consumed tokens: 74066165760 | elapsed time per iteration (s): 0.12 | learning rate: 3.519E-05 | global batch size: 256 | lm loss: 4.502889E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2060.657 | TFLOPs: 7.66 | 7: iteration 141280/ 173500 | consumed samples: 36167680 | consumed tokens: 74071408640 | elapsed time per iteration (s): 0.08 | learning rate: 3.518E-05 | global batch size: 256 | lm loss: 4.521832E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.220 | TFLOPs: 11.78 | 7: iteration 
141290/ 173500 | consumed samples: 36170240 | consumed tokens: 74076651520 | elapsed time per iteration (s): 0.08 | learning rate: 3.517E-05 | global batch size: 256 | lm loss: 4.511866E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.682 | TFLOPs: 11.88 | 7: iteration 141300/ 173500 | consumed samples: 36172800 | consumed tokens: 74081894400 | elapsed time per iteration (s): 0.08 | learning rate: 3.516E-05 | global batch size: 256 | lm loss: 4.503889E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.516 | TFLOPs: 11.72 | 7: iteration 141310/ 173500 | consumed samples: 36175360 | consumed tokens: 74087137280 | elapsed time per iteration (s): 0.09 | learning rate: 3.515E-05 | global batch size: 256 | lm loss: 4.518216E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2965.713 | TFLOPs: 11.03 | 7: iteration 141320/ 173500 | consumed samples: 36177920 | consumed tokens: 74092380160 | elapsed time per iteration (s): 0.12 | learning rate: 3.514E-05 | global batch size: 256 | lm loss: 4.501557E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2088.779 | TFLOPs: 7.77 | 7: iteration 141330/ 173500 | consumed samples: 36180480 | consumed tokens: 74097623040 | elapsed time per iteration (s): 0.11 | learning rate: 3.514E-05 | global batch size: 256 | lm loss: 4.508274E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2352.976 | TFLOPs: 8.75 | 7: iteration 141340/ 173500 | consumed samples: 36183040 | consumed tokens: 74102865920 | elapsed time per iteration (s): 0.10 | learning rate: 3.513E-05 | global batch size: 256 | lm loss: 4.496325E+00 | grad norm: 0.381 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2441.366 | TFLOPs: 9.08 | 7: iteration 141350/ 173500 | consumed samples: 36185600 | consumed tokens: 74108108800 | elapsed time per iteration (s): 0.08 | learning rate: 3.512E-05 | global batch size: 256 | lm loss: 4.509970E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.763 | TFLOPs: 11.82 | 7: iteration 141360/ 173500 | consumed samples: 36188160 | consumed tokens: 74113351680 | elapsed time per iteration (s): 0.12 | learning rate: 3.511E-05 | global batch size: 256 | lm loss: 4.509047E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2208.433 | TFLOPs: 8.21 | 7: iteration 141370/ 173500 | consumed samples: 36190720 | consumed tokens: 74118594560 | elapsed time per iteration (s): 0.12 | learning rate: 3.510E-05 | global batch size: 256 | lm loss: 4.503870E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2194.985 | TFLOPs: 8.16 | 7: iteration 141380/ 173500 | consumed samples: 36193280 | consumed tokens: 74123837440 | elapsed time per iteration (s): 0.10 | learning rate: 3.509E-05 | global batch size: 256 | lm loss: 4.514828E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2571.348 | TFLOPs: 9.56 | 7: iteration 141390/ 173500 | consumed samples: 36195840 | consumed tokens: 74129080320 | elapsed time per iteration (s): 0.08 | learning rate: 3.508E-05 | global batch size: 256 | lm loss: 4.506103E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.480 | TFLOPs: 11.93 | 7: iteration 141400/ 173500 | consumed samples: 36198400 | consumed tokens: 74134323200 | elapsed time per iteration (s): 0.08 | learning 
rate: 3.507E-05 | global batch size: 256 | lm loss: 4.507381E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.019 | TFLOPs: 11.83 | 7: iteration 141410/ 173500 | consumed samples: 36200960 | consumed tokens: 74139566080 | elapsed time per iteration (s): 0.08 | learning rate: 3.506E-05 | global batch size: 256 | lm loss: 4.498767E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.430 | TFLOPs: 12.05 | 7: iteration 141420/ 173500 | consumed samples: 36203520 | consumed tokens: 74144808960 | elapsed time per iteration (s): 0.08 | learning rate: 3.505E-05 | global batch size: 256 | lm loss: 4.497324E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.620 | TFLOPs: 12.02 | 7: iteration 141430/ 173500 | consumed samples: 36206080 | consumed tokens: 74150051840 | elapsed time per iteration (s): 0.08 | learning rate: 3.504E-05 | global batch size: 256 | lm loss: 4.507748E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.998 | TFLOPs: 12.02 | 7: iteration 141440/ 173500 | consumed samples: 36208640 | consumed tokens: 74155294720 | elapsed time per iteration (s): 0.08 | learning rate: 3.503E-05 | global batch size: 256 | lm loss: 4.503838E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3030.113 | TFLOPs: 11.27 | 7: iteration 141450/ 173500 | consumed samples: 36211200 | consumed tokens: 74160537600 | elapsed time per iteration (s): 0.08 | learning rate: 3.503E-05 | global batch size: 256 | lm loss: 4.508560E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.969 | TFLOPs: 11.93 | 7: iteration 141460/ 
173500 | consumed samples: 36213760 | consumed tokens: 74165780480 | elapsed time per iteration (s): 0.08 | learning rate: 3.502E-05 | global batch size: 256 | lm loss: 4.528476E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3051.748 | TFLOPs: 11.35 | 7: iteration 141470/ 173500 | consumed samples: 36216320 | consumed tokens: 74171023360 | elapsed time per iteration (s): 0.08 | learning rate: 3.501E-05 | global batch size: 256 | lm loss: 4.508221E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.345 | TFLOPs: 11.93 | 7: iteration 141480/ 173500 | consumed samples: 36218880 | consumed tokens: 74176266240 | elapsed time per iteration (s): 0.08 | learning rate: 3.500E-05 | global batch size: 256 | lm loss: 4.517504E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.054 | TFLOPs: 11.79 | 7: iteration 141490/ 173500 | consumed samples: 36221440 | consumed tokens: 74181509120 | elapsed time per iteration (s): 0.09 | learning rate: 3.499E-05 | global batch size: 256 | lm loss: 4.520545E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2754.381 | TFLOPs: 10.25 | 7: iteration 141500/ 173500 | consumed samples: 36224000 | consumed tokens: 74186752000 | elapsed time per iteration (s): 0.09 | learning rate: 3.498E-05 | global batch size: 256 | lm loss: 4.512324E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2799.116 | TFLOPs: 10.41 | 7: iteration 141510/ 173500 | consumed samples: 36226560 | consumed tokens: 74191994880 | elapsed time per iteration (s): 0.08 | learning rate: 3.497E-05 | global batch size: 256 | lm loss: 4.516510E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3217.599 | TFLOPs: 11.97 | 7: iteration 141520/ 173500 | consumed samples: 36229120 | consumed tokens: 74197237760 | elapsed time per iteration (s): 0.08 | learning rate: 3.496E-05 | global batch size: 256 | lm loss: 4.510514E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.793 | TFLOPs: 11.98 | 7: iteration 141530/ 173500 | consumed samples: 36231680 | consumed tokens: 74202480640 | elapsed time per iteration (s): 0.08 | learning rate: 3.495E-05 | global batch size: 256 | lm loss: 4.512809E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.821 | TFLOPs: 11.93 | 7: iteration 141540/ 173500 | consumed samples: 36234240 | consumed tokens: 74207723520 | elapsed time per iteration (s): 0.08 | learning rate: 3.494E-05 | global batch size: 256 | lm loss: 4.508886E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.710 | TFLOPs: 11.99 | 7: iteration 141550/ 173500 | consumed samples: 36236800 | consumed tokens: 74212966400 | elapsed time per iteration (s): 0.08 | learning rate: 3.493E-05 | global batch size: 256 | lm loss: 4.491752E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.603 | TFLOPs: 11.97 | 7: iteration 141560/ 173500 | consumed samples: 36239360 | consumed tokens: 74218209280 | elapsed time per iteration (s): 0.08 | learning rate: 3.493E-05 | global batch size: 256 | lm loss: 4.496133E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.020 | TFLOPs: 12.00 | 7: iteration 141570/ 173500 | consumed samples: 36241920 | consumed tokens: 74223452160 | elapsed time per iteration (s): 0.08 | learning rate: 
3.492E-05 | global batch size: 256 | lm loss: 4.499535E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.223 | TFLOPs: 12.04 | 7: iteration 141580/ 173500 | consumed samples: 36244480 | consumed tokens: 74228695040 | elapsed time per iteration (s): 0.08 | learning rate: 3.491E-05 | global batch size: 256 | lm loss: 4.509837E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.268 | TFLOPs: 12.03 | 7: iteration 141590/ 173500 | consumed samples: 36247040 | consumed tokens: 74233937920 | elapsed time per iteration (s): 0.12 | learning rate: 3.490E-05 | global batch size: 256 | lm loss: 4.492963E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2058.481 | TFLOPs: 7.66 | 7: iteration 141600/ 173500 | consumed samples: 36249600 | consumed tokens: 74239180800 | elapsed time per iteration (s): 0.09 | learning rate: 3.489E-05 | global batch size: 256 | lm loss: 4.504037E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.010 | TFLOPs: 10.92 | 7: iteration 141610/ 173500 | consumed samples: 36252160 | consumed tokens: 74244423680 | elapsed time per iteration (s): 0.10 | learning rate: 3.488E-05 | global batch size: 256 | lm loss: 4.514790E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2543.823 | TFLOPs: 9.46 | 7: iteration 141620/ 173500 | consumed samples: 36254720 | consumed tokens: 74249666560 | elapsed time per iteration (s): 0.08 | learning rate: 3.487E-05 | global batch size: 256 | lm loss: 4.506346E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.084 | TFLOPs: 11.91 | 7: iteration 141630/ 173500 | 
consumed samples: 36257280 | consumed tokens: 74254909440 | elapsed time per iteration (s): 0.09 | learning rate: 3.486E-05 | global batch size: 256 | lm loss: 4.512008E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2944.929 | TFLOPs: 10.95 | 7: iteration 141640/ 173500 | consumed samples: 36259840 | consumed tokens: 74260152320 | elapsed time per iteration (s): 0.09 | learning rate: 3.485E-05 | global batch size: 256 | lm loss: 4.507788E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.198 | TFLOPs: 11.16 | 7: iteration 141650/ 173500 | consumed samples: 36262400 | consumed tokens: 74265395200 | elapsed time per iteration (s): 0.09 | learning rate: 3.484E-05 | global batch size: 256 | lm loss: 4.514668E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3009.937 | TFLOPs: 11.20 | 7: iteration 141660/ 173500 | consumed samples: 36264960 | consumed tokens: 74270638080 | elapsed time per iteration (s): 0.08 | learning rate: 3.484E-05 | global batch size: 256 | lm loss: 4.509161E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.303 | TFLOPs: 11.69 | 7: iteration 141670/ 173500 | consumed samples: 36267520 | consumed tokens: 74275880960 | elapsed time per iteration (s): 0.08 | learning rate: 3.483E-05 | global batch size: 256 | lm loss: 4.508223E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.675 | TFLOPs: 11.91 | 7: iteration 141680/ 173500 | consumed samples: 36270080 | consumed tokens: 74281123840 | elapsed time per iteration (s): 0.08 | learning rate: 3.482E-05 | global batch size: 256 | lm loss: 4.502231E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3229.776 | TFLOPs: 12.01 | 7: iteration 141690/ 173500 | consumed samples: 36272640 | consumed tokens: 74286366720 | elapsed time per iteration (s): 0.08 | learning rate: 3.481E-05 | global batch size: 256 | lm loss: 4.507775E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.311 | TFLOPs: 12.00 | 7: iteration 141700/ 173500 | consumed samples: 36275200 | consumed tokens: 74291609600 | elapsed time per iteration (s): 0.08 | learning rate: 3.480E-05 | global batch size: 256 | lm loss: 4.509961E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.225 | TFLOPs: 12.02 | 7: iteration 141710/ 173500 | consumed samples: 36277760 | consumed tokens: 74296852480 | elapsed time per iteration (s): 0.08 | learning rate: 3.479E-05 | global batch size: 256 | lm loss: 4.509996E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.413 | TFLOPs: 12.02 | 7: iteration 141720/ 173500 | consumed samples: 36280320 | consumed tokens: 74302095360 | elapsed time per iteration (s): 0.08 | learning rate: 3.478E-05 | global batch size: 256 | lm loss: 4.504112E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.123 | TFLOPs: 12.00 | 7: iteration 141730/ 173500 | consumed samples: 36282880 | consumed tokens: 74307338240 | elapsed time per iteration (s): 0.08 | learning rate: 3.477E-05 | global batch size: 256 | lm loss: 4.488023E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.598 | TFLOPs: 11.97 | 7: iteration 141740/ 173500 | consumed samples: 36285440 | consumed tokens: 74312581120 | elapsed time per iteration (s): 0.08 | learning rate: 
3.476E-05 | global batch size: 256 | lm loss: 4.505232E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.039 | TFLOPs: 11.91 | 7: iteration 141750/ 173500 | consumed samples: 36288000 | consumed tokens: 74317824000 | elapsed time per iteration (s): 0.08 | learning rate: 3.475E-05 | global batch size: 256 | lm loss: 4.495241E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.974 | TFLOPs: 11.27 | 7: iteration 141760/ 173500 | consumed samples: 36290560 | consumed tokens: 74323066880 | elapsed time per iteration (s): 0.09 | learning rate: 3.474E-05 | global batch size: 256 | lm loss: 4.514141E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2917.204 | TFLOPs: 10.85 | 7: iteration 141770/ 173500 | consumed samples: 36293120 | consumed tokens: 74328309760 | elapsed time per iteration (s): 0.12 | learning rate: 3.474E-05 | global batch size: 256 | lm loss: 4.502462E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.884 | TFLOPs: 8.04 | 7: iteration 141780/ 173500 | consumed samples: 36295680 | consumed tokens: 74333552640 | elapsed time per iteration (s): 0.08 | learning rate: 3.473E-05 | global batch size: 256 | lm loss: 4.487611E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.126 | TFLOPs: 11.98 | 7: iteration 141790/ 173500 | consumed samples: 36298240 | consumed tokens: 74338795520 | elapsed time per iteration (s): 0.24 | learning rate: 3.472E-05 | global batch size: 256 | lm loss: 4.513091E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1077.335 | TFLOPs: 4.01 | 7: iteration 141800/ 173500 | 
consumed samples: 36300800 | consumed tokens: 74344038400 | elapsed time per iteration (s): 0.11 | learning rate: 3.471E-05 | global batch size: 256 | lm loss: 4.506056E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2376.250 | TFLOPs: 8.84 | 7: iteration 141810/ 173500 | consumed samples: 36303360 | consumed tokens: 74349281280 | elapsed time per iteration (s): 0.11 | learning rate: 3.470E-05 | global batch size: 256 | lm loss: 4.507571E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2327.752 | TFLOPs: 8.66 | 7: iteration 141820/ 173500 | consumed samples: 36305920 | consumed tokens: 74354524160 | elapsed time per iteration (s): 0.12 | learning rate: 3.469E-05 | global batch size: 256 | lm loss: 4.499136E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2098.311 | TFLOPs: 7.80 | 7: iteration 141830/ 173500 | consumed samples: 36308480 | consumed tokens: 74359767040 | elapsed time per iteration (s): 0.12 | learning rate: 3.468E-05 | global batch size: 256 | lm loss: 4.510126E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2140.748 | TFLOPs: 7.96 | 7: iteration 141840/ 173500 | consumed samples: 36311040 | consumed tokens: 74365009920 | elapsed time per iteration (s): 0.11 | learning rate: 3.467E-05 | global batch size: 256 | lm loss: 4.508340E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2229.894 | TFLOPs: 8.29 | 7: iteration 141850/ 173500 | consumed samples: 36313600 | consumed tokens: 74370252800 | elapsed time per iteration (s): 0.10 | learning rate: 3.466E-05 | global batch size: 256 | lm loss: 4.510292E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 2559.733 | TFLOPs: 9.52 | 7: iteration 141860/ 173500 | consumed samples: 36316160 | consumed tokens: 74375495680 | elapsed time per iteration (s): 0.10 | learning rate: 3.465E-05 | global batch size: 256 | lm loss: 4.508597E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2595.818 | TFLOPs: 9.66 | 7: iteration 141870/ 173500 | consumed samples: 36318720 | consumed tokens: 74380738560 | elapsed time per iteration (s): 0.08 | learning rate: 3.465E-05 | global batch size: 256 | lm loss: 4.503963E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.475 | TFLOPs: 11.63 | 7: iteration 141880/ 173500 | consumed samples: 36321280 | consumed tokens: 74385981440 | elapsed time per iteration (s): 0.08 | learning rate: 3.464E-05 | global batch size: 256 | lm loss: 4.498550E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.528 | TFLOPs: 12.00 | 7: iteration 141890/ 173500 | consumed samples: 36323840 | consumed tokens: 74391224320 | elapsed time per iteration (s): 0.08 | learning rate: 3.463E-05 | global batch size: 256 | lm loss: 4.515432E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.649 | TFLOPs: 11.96 | 7: iteration 141900/ 173500 | consumed samples: 36326400 | consumed tokens: 74396467200 | elapsed time per iteration (s): 0.13 | learning rate: 3.462E-05 | global batch size: 256 | lm loss: 4.510711E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2024.455 | TFLOPs: 7.53 | 7: iteration 141910/ 173500 | consumed samples: 36328960 | consumed tokens: 74401710080 | elapsed time per iteration (s): 0.13 | learning rate: 3.461E-05 | global 
batch size: 256 | lm loss: 4.501295E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1905.881 | TFLOPs: 7.09 | 7: iteration 141920/ 173500 | consumed samples: 36331520 | consumed tokens: 74406952960 | elapsed time per iteration (s): 0.12 | learning rate: 3.460E-05 | global batch size: 256 | lm loss: 4.505643E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2163.686 | TFLOPs: 8.05 | 7: iteration 141930/ 173500 | consumed samples: 36334080 | consumed tokens: 74412195840 | elapsed time per iteration (s): 0.10 | learning rate: 3.459E-05 | global batch size: 256 | lm loss: 4.500803E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2499.856 | TFLOPs: 9.30 | 7: iteration 141940/ 173500 | consumed samples: 36336640 | consumed tokens: 74417438720 | elapsed time per iteration (s): 0.10 | learning rate: 3.458E-05 | global batch size: 256 | lm loss: 4.516403E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2452.017 | TFLOPs: 9.12 | 7: iteration 141950/ 173500 | consumed samples: 36339200 | consumed tokens: 74422681600 | elapsed time per iteration (s): 0.10 | learning rate: 3.457E-05 | global batch size: 256 | lm loss: 4.511620E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.587 | TFLOPs: 9.70 | 7: iteration 141960/ 173500 | consumed samples: 36341760 | consumed tokens: 74427924480 | elapsed time per iteration (s): 0.08 | learning rate: 3.456E-05 | global batch size: 256 | lm loss: 4.493199E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.573 | TFLOPs: 12.04 | 7: iteration 141970/ 173500 | consumed samples: 
36344320 | consumed tokens: 74433167360 | elapsed time per iteration (s): 0.08 | learning rate: 3.456E-05 | global batch size: 256 | lm loss: 4.506582E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.909 | TFLOPs: 12.01 | 7: iteration 141980/ 173500 | consumed samples: 36346880 | consumed tokens: 74438410240 | elapsed time per iteration (s): 0.08 | learning rate: 3.455E-05 | global batch size: 256 | lm loss: 4.495259E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.333 | TFLOPs: 11.48 | 7: iteration 141990/ 173500 | consumed samples: 36349440 | consumed tokens: 74443653120 | elapsed time per iteration (s): 0.08 | learning rate: 3.454E-05 | global batch size: 256 | lm loss: 4.511226E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.480 | TFLOPs: 12.02 | 0: [2023-03-17 03:44:52,701] [INFO] [logging.py:68:log_dist] [Rank 0] step=142000, skipped=0, lr=[3.452880099827123e-05, 3.452880099827123e-05, 3.452880099827123e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 142000/ 173500 | consumed samples: 36352000 | consumed tokens: 74448896000 | elapsed time per iteration (s): 0.10 | learning rate: 3.453E-05 | global batch size: 256 | lm loss: 4.501114E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2610.004 | TFLOPs: 9.71 | 0: steps: 142000 loss: 4.4572 iter time (s): 0.096 samples/sec: 2674.823 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 142000 | lm loss value: 4.368969E+00 | lm loss PPL: 7.896218E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 142000 to 
checkpoints_14m91b100m 0: [2023-03-17 03:44:52,759] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step142000 is begin to save! 0: [2023-03-17 03:44:52,763] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:44:52,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:44:52,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:44:52,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:44:52,793] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:44:52,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:44:52,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:44:52,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:44:52,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:44:52,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:44:52,802] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:44:52,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:44:52,803] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step142000/mp_rank_00_model_states.pt 0: [2023-03-17 03:44:52,803] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:44:52,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:44:52,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:44:52,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:44:52,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:44:52,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:44:52,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 2: [2023-03-17 03:44:52,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:44:52,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:44:52,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 3: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:44:52,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 1: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 6: [2023-03-17 03:44:52,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 5: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:44:52,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 
0: [2023-03-17 03:44:52,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 2: [2023-03-17 03:44:52,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 6: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 0: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 4: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 1: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 3: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 3: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 
2: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 2: [2023-03-17 03:44:52,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 5: [2023-03-17 03:44:52,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 4: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:44:52,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 0: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:44:52,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 
3: [2023-03-17 03:44:52,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 1: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,830] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,830] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 6: [2023-03-17 03:44:52,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 5: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:44:52,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 2: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:44:52,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 4: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:44:52,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 0: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:44:52,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 1: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,831] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:44:52,831] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 5: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 7: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 7: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 7: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 
6: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 7: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 0: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 0: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 4: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 3: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:44:52,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 6: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:44:52,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 5: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:44:52,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 1: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:44:52,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 2: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:44:52,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 7: [2023-03-17 03:44:52,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 4: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:44:52,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 0: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:44:52,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:44:52,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 3: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:44:52,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 1: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 2: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:44:52,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 6: [2023-03-17 03:44:52,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 03:44:52,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:44:52,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 5: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 7: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 7: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:44:52,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 
0: [2023-03-17 03:44:52,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 4: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 3: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 6: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 5: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 4: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 5: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 
5: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 4: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 3: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 7: [2023-03-17 03:44:52,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:44:52,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:44:52,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 2: [2023-03-17 03:44:52,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:44:52,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:44:52,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 1: [2023-03-17 03:44:52,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:44:52,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 1: [2023-03-17 03:44:52,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:44:52,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step142000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:44:52,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step142000 is ready now! 
0: successfully saved checkpoint at iteration 142000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.16 7: iteration 142010/ 173500 | consumed samples: 36354560 | consumed tokens: 74454138880 | elapsed time per iteration (s): 0.09 | learning rate: 3.452E-05 | global batch size: 256 | lm loss: 4.513758E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2746.980 | TFLOPs: 10.22 | 7: iteration 142020/ 173500 | consumed samples: 36357120 | consumed tokens: 74459381760 | elapsed time per iteration (s): 0.09 | learning rate: 3.451E-05 | global batch size: 256 | lm loss: 4.515476E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2923.859 | TFLOPs: 10.88 | 7: iteration 142030/ 173500 | consumed samples: 36359680 | consumed tokens: 74464624640 | elapsed time per iteration (s): 0.08 | learning rate: 3.450E-05 | global batch size: 256 | lm loss: 4.504650E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.168 | TFLOPs: 11.88 | 7: iteration 142040/ 173500 | consumed samples: 36362240 | consumed tokens: 74469867520 | elapsed time per iteration (s): 0.08 | learning rate: 3.449E-05 | global batch size: 256 | lm loss: 4.520392E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.785 | TFLOPs: 11.26 | 7: iteration 142050/ 173500 | consumed samples: 36364800 | consumed tokens: 74475110400 | elapsed time per iteration (s): 0.08 | learning rate: 3.448E-05 | global batch size: 256 | lm loss: 4.500964E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.692 | TFLOPs: 11.96 | 7: iteration 142060/ 173500 | consumed samples: 36367360 | consumed tokens: 74480353280 | elapsed time per iteration (s): 
0.08 | learning rate: 3.448E-05 | global batch size: 256 | lm loss: 4.503721E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.738 | TFLOPs: 11.80 | 7: iteration 142070/ 173500 | consumed samples: 36369920 | consumed tokens: 74485596160 | elapsed time per iteration (s): 0.08 | learning rate: 3.447E-05 | global batch size: 256 | lm loss: 4.507287E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.521 | TFLOPs: 11.98 | 7: iteration 142080/ 173500 | consumed samples: 36372480 | consumed tokens: 74490839040 | elapsed time per iteration (s): 0.08 | learning rate: 3.446E-05 | global batch size: 256 | lm loss: 4.505310E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.741 | TFLOPs: 11.91 | 7: iteration 142090/ 173500 | consumed samples: 36375040 | consumed tokens: 74496081920 | elapsed time per iteration (s): 0.09 | learning rate: 3.445E-05 | global batch size: 256 | lm loss: 4.503855E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2910.450 | TFLOPs: 10.83 | 7: iteration 142100/ 173500 | consumed samples: 36377600 | consumed tokens: 74501324800 | elapsed time per iteration (s): 0.08 | learning rate: 3.444E-05 | global batch size: 256 | lm loss: 4.502994E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.574 | TFLOPs: 11.85 | 7: iteration 142110/ 173500 | consumed samples: 36380160 | consumed tokens: 74506567680 | elapsed time per iteration (s): 0.08 | learning rate: 3.443E-05 | global batch size: 256 | lm loss: 4.498565E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.320 | TFLOPs: 11.53 | 7: 
iteration 142120/ 173500 | consumed samples: 36382720 | consumed tokens: 74511810560 | elapsed time per iteration (s): 0.08 | learning rate: 3.442E-05 | global batch size: 256 | lm loss: 4.513703E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.758 | TFLOPs: 11.84 | 7: iteration 142130/ 173500 | consumed samples: 36385280 | consumed tokens: 74517053440 | elapsed time per iteration (s): 0.08 | learning rate: 3.441E-05 | global batch size: 256 | lm loss: 4.508905E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.879 | TFLOPs: 11.86 | 7: iteration 142140/ 173500 | consumed samples: 36387840 | consumed tokens: 74522296320 | elapsed time per iteration (s): 0.10 | learning rate: 3.440E-05 | global batch size: 256 | lm loss: 4.508514E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.958 | TFLOPs: 9.34 | 7: iteration 142150/ 173500 | consumed samples: 36390400 | consumed tokens: 74527539200 | elapsed time per iteration (s): 0.08 | learning rate: 3.439E-05 | global batch size: 256 | lm loss: 4.502398E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.810 | TFLOPs: 11.48 | 7: iteration 142160/ 173500 | consumed samples: 36392960 | consumed tokens: 74532782080 | elapsed time per iteration (s): 0.08 | learning rate: 3.439E-05 | global batch size: 256 | lm loss: 4.515723E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.795 | TFLOPs: 11.89 | 7: iteration 142170/ 173500 | consumed samples: 36395520 | consumed tokens: 74538024960 | elapsed time per iteration (s): 0.08 | learning rate: 3.438E-05 | global batch size: 256 | lm loss: 4.500413E+00 | grad norm: 0.396 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.298 | TFLOPs: 11.82 | 7: iteration 142180/ 173500 | consumed samples: 36398080 | consumed tokens: 74543267840 | elapsed time per iteration (s): 0.08 | learning rate: 3.437E-05 | global batch size: 256 | lm loss: 4.503367E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.019 | TFLOPs: 11.86 | 7: iteration 142190/ 173500 | consumed samples: 36400640 | consumed tokens: 74548510720 | elapsed time per iteration (s): 0.10 | learning rate: 3.436E-05 | global batch size: 256 | lm loss: 4.507671E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2491.132 | TFLOPs: 9.27 | 7: iteration 142200/ 173500 | consumed samples: 36403200 | consumed tokens: 74553753600 | elapsed time per iteration (s): 0.11 | learning rate: 3.435E-05 | global batch size: 256 | lm loss: 4.511961E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.519 | TFLOPs: 8.62 | 7: iteration 142210/ 173500 | consumed samples: 36405760 | consumed tokens: 74558996480 | elapsed time per iteration (s): 0.12 | learning rate: 3.434E-05 | global batch size: 256 | lm loss: 4.497798E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2176.957 | TFLOPs: 8.10 | 7: iteration 142220/ 173500 | consumed samples: 36408320 | consumed tokens: 74564239360 | elapsed time per iteration (s): 0.10 | learning rate: 3.433E-05 | global batch size: 256 | lm loss: 4.514071E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.317 | TFLOPs: 9.19 | 7: iteration 142230/ 173500 | consumed samples: 36410880 | consumed tokens: 74569482240 | elapsed time per iteration (s): 0.11 | 
learning rate: 3.432E-05 | global batch size: 256 | lm loss: 4.516275E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2379.150 | TFLOPs: 8.85 | 7: iteration 142240/ 173500 | consumed samples: 36413440 | consumed tokens: 74574725120 | elapsed time per iteration (s): 0.12 | learning rate: 3.431E-05 | global batch size: 256 | lm loss: 4.505371E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2194.234 | TFLOPs: 8.16 | 7: iteration 142250/ 173500 | consumed samples: 36416000 | consumed tokens: 74579968000 | elapsed time per iteration (s): 0.12 | learning rate: 3.431E-05 | global batch size: 256 | lm loss: 4.494036E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2058.081 | TFLOPs: 7.66 | 7: iteration 142260/ 173500 | consumed samples: 36418560 | consumed tokens: 74585210880 | elapsed time per iteration (s): 0.13 | learning rate: 3.430E-05 | global batch size: 256 | lm loss: 4.505341E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1921.675 | TFLOPs: 7.15 | 7: iteration 142270/ 173500 | consumed samples: 36421120 | consumed tokens: 74590453760 | elapsed time per iteration (s): 0.11 | learning rate: 3.429E-05 | global batch size: 256 | lm loss: 4.505986E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2299.881 | TFLOPs: 8.55 | 7: iteration 142280/ 173500 | consumed samples: 36423680 | consumed tokens: 74595696640 | elapsed time per iteration (s): 0.12 | learning rate: 3.428E-05 | global batch size: 256 | lm loss: 4.513677E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2133.203 | TFLOPs: 7.93 | 7: iteration 142290/ 
173500 | consumed samples: 36426240 | consumed tokens: 74600939520 | elapsed time per iteration (s): 0.10 | learning rate: 3.427E-05 | global batch size: 256 | lm loss: 4.503042E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2577.618 | TFLOPs: 9.59 | 7: iteration 142300/ 173500 | consumed samples: 36428800 | consumed tokens: 74606182400 | elapsed time per iteration (s): 0.08 | learning rate: 3.426E-05 | global batch size: 256 | lm loss: 4.504275E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.194 | TFLOPs: 11.73 | 7: iteration 142310/ 173500 | consumed samples: 36431360 | consumed tokens: 74611425280 | elapsed time per iteration (s): 0.09 | learning rate: 3.425E-05 | global batch size: 256 | lm loss: 4.491334E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.703 | TFLOPs: 10.60 | 7: iteration 142320/ 173500 | consumed samples: 36433920 | consumed tokens: 74616668160 | elapsed time per iteration (s): 0.09 | learning rate: 3.424E-05 | global batch size: 256 | lm loss: 4.504284E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2829.004 | TFLOPs: 10.52 | 7: iteration 142330/ 173500 | consumed samples: 36436480 | consumed tokens: 74621911040 | elapsed time per iteration (s): 0.08 | learning rate: 3.423E-05 | global batch size: 256 | lm loss: 4.507201E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.437 | TFLOPs: 11.95 | 7: iteration 142340/ 173500 | consumed samples: 36439040 | consumed tokens: 74627153920 | elapsed time per iteration (s): 0.08 | learning rate: 3.423E-05 | global batch size: 256 | lm loss: 4.497730E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3227.475 | TFLOPs: 12.00 | 7: iteration 142350/ 173500 | consumed samples: 36441600 | consumed tokens: 74632396800 | elapsed time per iteration (s): 0.08 | learning rate: 3.422E-05 | global batch size: 256 | lm loss: 4.499362E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.925 | TFLOPs: 11.97 | 7: iteration 142360/ 173500 | consumed samples: 36444160 | consumed tokens: 74637639680 | elapsed time per iteration (s): 0.08 | learning rate: 3.421E-05 | global batch size: 256 | lm loss: 4.516273E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.698 | TFLOPs: 11.91 | 7: iteration 142370/ 173500 | consumed samples: 36446720 | consumed tokens: 74642882560 | elapsed time per iteration (s): 0.09 | learning rate: 3.420E-05 | global batch size: 256 | lm loss: 4.507630E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2697.388 | TFLOPs: 10.03 | 7: iteration 142380/ 173500 | consumed samples: 36449280 | consumed tokens: 74648125440 | elapsed time per iteration (s): 0.09 | learning rate: 3.419E-05 | global batch size: 256 | lm loss: 4.491679E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.202 | TFLOPs: 11.06 | 7: iteration 142390/ 173500 | consumed samples: 36451840 | consumed tokens: 74653368320 | elapsed time per iteration (s): 0.08 | learning rate: 3.418E-05 | global batch size: 256 | lm loss: 4.513098E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.516 | TFLOPs: 11.60 | 7: iteration 142400/ 173500 | consumed samples: 36454400 | consumed tokens: 74658611200 | elapsed time per iteration (s): 0.09 | learning rate: 
3.417E-05 | global batch size: 256 | lm loss: 4.498354E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2940.061 | TFLOPs: 10.94 | 7: iteration 142410/ 173500 | consumed samples: 36456960 | consumed tokens: 74663854080 | elapsed time per iteration (s): 0.08 | learning rate: 3.416E-05 | global batch size: 256 | lm loss: 4.506693E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.160 | TFLOPs: 11.92 | 7: iteration 142420/ 173500 | consumed samples: 36459520 | consumed tokens: 74669096960 | elapsed time per iteration (s): 0.09 | learning rate: 3.415E-05 | global batch size: 256 | lm loss: 4.507362E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2796.353 | TFLOPs: 10.40 | 7: iteration 142430/ 173500 | consumed samples: 36462080 | consumed tokens: 74674339840 | elapsed time per iteration (s): 0.10 | learning rate: 3.415E-05 | global batch size: 256 | lm loss: 4.508587E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.752 | TFLOPs: 9.16 | 7: iteration 142440/ 173500 | consumed samples: 36464640 | consumed tokens: 74679582720 | elapsed time per iteration (s): 0.10 | learning rate: 3.414E-05 | global batch size: 256 | lm loss: 4.508220E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.399 | TFLOPs: 9.56 | 7: iteration 142450/ 173500 | consumed samples: 36467200 | consumed tokens: 74684825600 | elapsed time per iteration (s): 0.10 | learning rate: 3.413E-05 | global batch size: 256 | lm loss: 4.512316E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.771 | TFLOPs: 9.37 | 7: iteration 142460/ 173500 | 
consumed samples: 36469760 | consumed tokens: 74690068480 | elapsed time per iteration (s): 0.11 | learning rate: 3.412E-05 | global batch size: 256 | lm loss: 4.501810E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2372.880 | TFLOPs: 8.83 | 7: iteration 142470/ 173500 | consumed samples: 36472320 | consumed tokens: 74695311360 | elapsed time per iteration (s): 0.08 | learning rate: 3.411E-05 | global batch size: 256 | lm loss: 4.520135E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.363 | TFLOPs: 11.92 | 7: iteration 142480/ 173500 | consumed samples: 36474880 | consumed tokens: 74700554240 | elapsed time per iteration (s): 0.08 | learning rate: 3.410E-05 | global batch size: 256 | lm loss: 4.497066E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.342 | TFLOPs: 11.89 | 7: iteration 142490/ 173500 | consumed samples: 36477440 | consumed tokens: 74705797120 | elapsed time per iteration (s): 0.08 | learning rate: 3.409E-05 | global batch size: 256 | lm loss: 4.524142E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.017 | TFLOPs: 11.87 | 7: iteration 142500/ 173500 | consumed samples: 36480000 | consumed tokens: 74711040000 | elapsed time per iteration (s): 0.08 | learning rate: 3.408E-05 | global batch size: 256 | lm loss: 4.502807E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.377 | TFLOPs: 11.81 | 7: iteration 142510/ 173500 | consumed samples: 36482560 | consumed tokens: 74716282880 | elapsed time per iteration (s): 0.08 | learning rate: 3.407E-05 | global batch size: 256 | lm loss: 4.509063E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3175.906 | TFLOPs: 11.81 | 7: iteration 142520/ 173500 | consumed samples: 36485120 | consumed tokens: 74721525760 | elapsed time per iteration (s): 0.09 | learning rate: 3.407E-05 | global batch size: 256 | lm loss: 4.514428E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.144 | TFLOPs: 11.06 | 7: iteration 142530/ 173500 | consumed samples: 36487680 | consumed tokens: 74726768640 | elapsed time per iteration (s): 0.09 | learning rate: 3.406E-05 | global batch size: 256 | lm loss: 4.518515E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.792 | TFLOPs: 11.09 | 7: iteration 142540/ 173500 | consumed samples: 36490240 | consumed tokens: 74732011520 | elapsed time per iteration (s): 0.08 | learning rate: 3.405E-05 | global batch size: 256 | lm loss: 4.512601E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.363 | TFLOPs: 11.32 | 7: iteration 142550/ 173500 | consumed samples: 36492800 | consumed tokens: 74737254400 | elapsed time per iteration (s): 0.09 | learning rate: 3.404E-05 | global batch size: 256 | lm loss: 4.499640E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.110 | TFLOPs: 10.88 | 7: iteration 142560/ 173500 | consumed samples: 36495360 | consumed tokens: 74742497280 | elapsed time per iteration (s): 0.09 | learning rate: 3.403E-05 | global batch size: 256 | lm loss: 4.508867E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2959.957 | TFLOPs: 11.01 | 7: iteration 142570/ 173500 | consumed samples: 36497920 | consumed tokens: 74747740160 | elapsed time per iteration (s): 0.09 | learning rate: 3.402E-05 | 
global batch size: 256 | lm loss: 4.512700E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.402 | TFLOPs: 10.84 | 7: iteration 142580/ 173500 | consumed samples: 36500480 | consumed tokens: 74752983040 | elapsed time per iteration (s): 0.11 | learning rate: 3.401E-05 | global batch size: 256 | lm loss: 4.512751E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2368.411 | TFLOPs: 8.81 | 7: iteration 142590/ 173500 | consumed samples: 36503040 | consumed tokens: 74758225920 | elapsed time per iteration (s): 0.09 | learning rate: 3.400E-05 | global batch size: 256 | lm loss: 4.513588E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2887.370 | TFLOPs: 10.74 | 7: iteration 142600/ 173500 | consumed samples: 36505600 | consumed tokens: 74763468800 | elapsed time per iteration (s): 0.08 | learning rate: 3.400E-05 | global batch size: 256 | lm loss: 4.503795E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.712 | TFLOPs: 11.62 | 7: iteration 142610/ 173500 | consumed samples: 36508160 | consumed tokens: 74768711680 | elapsed time per iteration (s): 0.08 | learning rate: 3.399E-05 | global batch size: 256 | lm loss: 4.505354E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.586 | TFLOPs: 11.63 | 7: iteration 142620/ 173500 | consumed samples: 36510720 | consumed tokens: 74773954560 | elapsed time per iteration (s): 0.08 | learning rate: 3.398E-05 | global batch size: 256 | lm loss: 4.514011E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3020.135 | TFLOPs: 11.23 | 7: iteration 142630/ 173500 | consumed 
samples: 36513280 | consumed tokens: 74779197440 | elapsed time per iteration (s): 0.08 | learning rate: 3.397E-05 | global batch size: 256 | lm loss: 4.512433E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.357 | TFLOPs: 11.66 | 7: iteration 142640/ 173500 | consumed samples: 36515840 | consumed tokens: 74784440320 | elapsed time per iteration (s): 0.08 | learning rate: 3.396E-05 | global batch size: 256 | lm loss: 4.522999E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.255 | TFLOPs: 11.98 | 7: iteration 142650/ 173500 | consumed samples: 36518400 | consumed tokens: 74789683200 | elapsed time per iteration (s): 0.08 | learning rate: 3.395E-05 | global batch size: 256 | lm loss: 4.507903E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.967 | TFLOPs: 11.90 | 7: iteration 142660/ 173500 | consumed samples: 36520960 | consumed tokens: 74794926080 | elapsed time per iteration (s): 0.08 | learning rate: 3.394E-05 | global batch size: 256 | lm loss: 4.491065E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.005 | TFLOPs: 11.99 | 7: iteration 142670/ 173500 | consumed samples: 36523520 | consumed tokens: 74800168960 | elapsed time per iteration (s): 0.08 | learning rate: 3.393E-05 | global batch size: 256 | lm loss: 4.515760E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.039 | TFLOPs: 11.94 | 7: iteration 142680/ 173500 | consumed samples: 36526080 | consumed tokens: 74805411840 | elapsed time per iteration (s): 0.08 | learning rate: 3.392E-05 | global batch size: 256 | lm loss: 4.505410E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3215.850 | TFLOPs: 11.96 | 7: iteration 142690/ 173500 | consumed samples: 36528640 | consumed tokens: 74810654720 | elapsed time per iteration (s): 0.08 | learning rate: 3.392E-05 | global batch size: 256 | lm loss: 4.511477E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.957 | TFLOPs: 11.92 | 7: iteration 142700/ 173500 | consumed samples: 36531200 | consumed tokens: 74815897600 | elapsed time per iteration (s): 0.08 | learning rate: 3.391E-05 | global batch size: 256 | lm loss: 4.526816E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.793 | TFLOPs: 11.95 | 7: iteration 142710/ 173500 | consumed samples: 36533760 | consumed tokens: 74821140480 | elapsed time per iteration (s): 0.08 | learning rate: 3.390E-05 | global batch size: 256 | lm loss: 4.506247E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.169 | TFLOPs: 11.91 | 7: iteration 142720/ 173500 | consumed samples: 36536320 | consumed tokens: 74826383360 | elapsed time per iteration (s): 0.08 | learning rate: 3.389E-05 | global batch size: 256 | lm loss: 4.508664E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3079.061 | TFLOPs: 11.45 | 7: iteration 142730/ 173500 | consumed samples: 36538880 | consumed tokens: 74831626240 | elapsed time per iteration (s): 0.08 | learning rate: 3.388E-05 | global batch size: 256 | lm loss: 4.505336E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.856 | TFLOPs: 11.54 | 7: iteration 142740/ 173500 | consumed samples: 36541440 | consumed tokens: 74836869120 | elapsed time per iteration (s): 0.12 | learning rate: 3.387E-05 | global 
batch size: 256 | lm loss: 4.510992E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2197.864 | TFLOPs: 8.18 | 7: iteration 142750/ 173500 | consumed samples: 36544000 | consumed tokens: 74842112000 | elapsed time per iteration (s): 0.13 | learning rate: 3.386E-05 | global batch size: 256 | lm loss: 4.504445E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1906.853 | TFLOPs: 7.09 | 7: iteration 142760/ 173500 | consumed samples: 36546560 | consumed tokens: 74847354880 | elapsed time per iteration (s): 0.13 | learning rate: 3.385E-05 | global batch size: 256 | lm loss: 4.517235E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2011.908 | TFLOPs: 7.48 | 7: iteration 142770/ 173500 | consumed samples: 36549120 | consumed tokens: 74852597760 | elapsed time per iteration (s): 0.13 | learning rate: 3.385E-05 | global batch size: 256 | lm loss: 4.510665E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2029.562 | TFLOPs: 7.55 | 7: iteration 142780/ 173500 | consumed samples: 36551680 | consumed tokens: 74857840640 | elapsed time per iteration (s): 0.08 | learning rate: 3.384E-05 | global batch size: 256 | lm loss: 4.495797E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.689 | TFLOPs: 11.66 | 7: iteration 142790/ 173500 | consumed samples: 36554240 | consumed tokens: 74863083520 | elapsed time per iteration (s): 0.08 | learning rate: 3.383E-05 | global batch size: 256 | lm loss: 4.498796E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.704 | TFLOPs: 11.93 | 7: iteration 142800/ 173500 | consumed samples: 
36556800 | consumed tokens: 74868326400 | elapsed time per iteration (s): 0.08 | learning rate: 3.382E-05 | global batch size: 256 | lm loss: 4.514486E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.074 | TFLOPs: 11.56 | 7: iteration 142810/ 173500 | consumed samples: 36559360 | consumed tokens: 74873569280 | elapsed time per iteration (s): 0.11 | learning rate: 3.381E-05 | global batch size: 256 | lm loss: 4.490668E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2233.985 | TFLOPs: 8.31 | 7: iteration 142820/ 173500 | consumed samples: 36561920 | consumed tokens: 74878812160 | elapsed time per iteration (s): 0.08 | learning rate: 3.380E-05 | global batch size: 256 | lm loss: 4.515496E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.230 | TFLOPs: 11.91 | 7: iteration 142830/ 173500 | consumed samples: 36564480 | consumed tokens: 74884055040 | elapsed time per iteration (s): 0.08 | learning rate: 3.379E-05 | global batch size: 256 | lm loss: 4.504016E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.033 | TFLOPs: 11.96 | 7: iteration 142840/ 173500 | consumed samples: 36567040 | consumed tokens: 74889297920 | elapsed time per iteration (s): 0.08 | learning rate: 3.378E-05 | global batch size: 256 | lm loss: 4.508267E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.338 | TFLOPs: 11.97 | 7: iteration 142850/ 173500 | consumed samples: 36569600 | consumed tokens: 74894540800 | elapsed time per iteration (s): 0.09 | learning rate: 3.378E-05 | global batch size: 256 | lm loss: 4.507965E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2847.712 | TFLOPs: 10.59 | 7: iteration 142860/ 173500 | consumed samples: 36572160 | consumed tokens: 74899783680 | elapsed time per iteration (s): 0.09 | learning rate: 3.377E-05 | global batch size: 256 | lm loss: 4.505557E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.616 | TFLOPs: 10.05 | 7: iteration 142870/ 173500 | consumed samples: 36574720 | consumed tokens: 74905026560 | elapsed time per iteration (s): 0.08 | learning rate: 3.376E-05 | global batch size: 256 | lm loss: 4.507965E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.842 | TFLOPs: 11.59 | 7: iteration 142880/ 173500 | consumed samples: 36577280 | consumed tokens: 74910269440 | elapsed time per iteration (s): 0.08 | learning rate: 3.375E-05 | global batch size: 256 | lm loss: 4.517145E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.508 | TFLOPs: 11.68 | 7: iteration 142890/ 173500 | consumed samples: 36579840 | consumed tokens: 74915512320 | elapsed time per iteration (s): 0.08 | learning rate: 3.374E-05 | global batch size: 256 | lm loss: 4.517680E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.513 | TFLOPs: 11.96 | 7: iteration 142900/ 173500 | consumed samples: 36582400 | consumed tokens: 74920755200 | elapsed time per iteration (s): 0.08 | learning rate: 3.373E-05 | global batch size: 256 | lm loss: 4.508524E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.107 | TFLOPs: 11.93 | 7: iteration 142910/ 173500 | consumed samples: 36584960 | consumed tokens: 74925998080 | elapsed time per iteration (s): 0.08 | learning rate: 3.372E-05 | global batch size: 
256 | lm loss: 4.486200E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.288 | TFLOPs: 11.96 | 7: iteration 142920/ 173500 | consumed samples: 36587520 | consumed tokens: 74931240960 | elapsed time per iteration (s): 0.08 | learning rate: 3.371E-05 | global batch size: 256 | lm loss: 4.510952E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.855 | TFLOPs: 11.97 | 7: iteration 142930/ 173500 | consumed samples: 36590080 | consumed tokens: 74936483840 | elapsed time per iteration (s): 0.08 | learning rate: 3.371E-05 | global batch size: 256 | lm loss: 4.503601E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.328 | TFLOPs: 11.96 | 7: iteration 142940/ 173500 | consumed samples: 36592640 | consumed tokens: 74941726720 | elapsed time per iteration (s): 0.08 | learning rate: 3.370E-05 | global batch size: 256 | lm loss: 4.518427E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.037 | TFLOPs: 12.01 | 7: iteration 142950/ 173500 | consumed samples: 36595200 | consumed tokens: 74946969600 | elapsed time per iteration (s): 0.08 | learning rate: 3.369E-05 | global batch size: 256 | lm loss: 4.495325E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.875 | TFLOPs: 11.97 | 7: iteration 142960/ 173500 | consumed samples: 36597760 | consumed tokens: 74952212480 | elapsed time per iteration (s): 0.08 | learning rate: 3.368E-05 | global batch size: 256 | lm loss: 4.502173E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.931 | TFLOPs: 11.98 | 7: iteration 142970/ 173500 | consumed samples: 36600320 | 
consumed tokens: 74957455360 | elapsed time per iteration (s): 0.09 | learning rate: 3.367E-05 | global batch size: 256 | lm loss: 4.510753E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2754.247 | TFLOPs: 10.24 |
7: iteration 142980/ 173500 | consumed samples: 36602880 | consumed tokens: 74962698240 | elapsed time per iteration (s): 0.11 | learning rate: 3.366E-05 | global batch size: 256 | lm loss: 4.510757E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2416.761 | TFLOPs: 8.99 |
7: iteration 142990/ 173500 | consumed samples: 36605440 | consumed tokens: 74967941120 | elapsed time per iteration (s): 0.10 | learning rate: 3.365E-05 | global batch size: 256 | lm loss: 4.506264E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2538.670 | TFLOPs: 9.44 |
7: iteration 143000/ 173500 | consumed samples: 36608000 | consumed tokens: 74973184000 | elapsed time per iteration (s): 0.10 | learning rate: 3.364E-05 | global batch size: 256 | lm loss: 4.513549E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2460.553 | TFLOPs: 9.15 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 143000 | lm loss value: 4.387990E+00 | lm loss PPL: 8.047849E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 143000 to checkpoints_14m91b100m
0: [2023-03-17 03:46:22,782] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step143000 is begin to save!
0: [2023-03-17 03:46:22,785] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/layer_01-model_00-model_states.pt...
0: [2023-03-17 03:46:22,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/layer_01-model_00-model_states.pt.
0: [2023-03-17 03:46:22,813] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/layer_03-model_00-model_states.pt...
0: [2023-03-17 03:46:22,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/layer_03-model_00-model_states.pt.
0: [2023-03-17 03:46:22,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/layer_04-model_00-model_states.pt...
0: [2023-03-17 03:46:22,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/layer_04-model_00-model_states.pt.
0: [2023-03-17 03:46:22,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/layer_05-model_00-model_states.pt...
0: [2023-03-17 03:46:22,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/layer_05-model_00-model_states.pt.
0: [2023-03-17 03:46:22,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/layer_06-model_00-model_states.pt...
0: [2023-03-17 03:46:22,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/layer_06-model_00-model_states.pt.
0: [2023-03-17 03:46:22,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/layer_08-model_00-model_states.pt...
0: [2023-03-17 03:46:22,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/layer_08-model_00-model_states.pt.
0: [2023-03-17 03:46:22,829] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step143000/mp_rank_00_model_states.pt 0: [2023-03-17 03:46:22,829] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:46:22,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:46:22,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:46:22,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:46:22,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 3: [2023-03-17 03:46:22,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:46:22,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 2: [2023-03-17 03:46:22,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:46:22,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
7: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 7: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 0: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 0: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 1: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 6: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 3: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
2: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:46:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 6: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 5: [2023-03-17 03:46:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 1: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
0: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:46:22,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 7: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:46:22,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 3: [2023-03-17 03:46:22,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 3: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 4: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:46:22,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
5: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:46:22,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 2: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:46:22,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2023-03-17 03:46:22,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 3: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 3: [2023-03-17 03:46:22,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
6: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:46:22,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 0: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:46:22,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 7: [2023-03-17 03:46:22,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:46:22,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 4: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:46:22,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
5: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:46:22,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 2: [2023-03-17 03:46:22,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 0: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:46:22,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:46:22,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 1: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
6: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:46:22,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 03:46:22,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 3: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 7: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:46:22,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 7: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 4: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
5: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:46:22,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 2: [2023-03-17 03:46:22,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:46:22,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 1: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 7: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:46:22,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 3: [2023-03-17 03:46:22,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 0: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:46:22,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 03:46:22,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 0: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 5: [2023-03-17 03:46:22,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:46:22,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 2: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:46:22,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 4: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 1: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 0: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:46:22,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 3: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:46:22,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 6: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 4: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:46:22,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 7: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:46:22,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 2: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:46:22,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 5: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:46:22,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 5: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:46:22,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 0: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 4: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 7: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 5: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:46:22,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:46:22,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
7: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 3: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 5: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 6: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 6: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 6: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 4: [2023-03-17 03:46:22,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 1: [2023-03-17 03:46:22,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:46:22,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step143000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:46:22,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step143000 is ready now! 
0: successfully saved checkpoint at iteration 143000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.32 7: iteration 143010/ 173500 | consumed samples: 36610560 | consumed tokens: 74978426880 | elapsed time per iteration (s): 0.11 | learning rate: 3.364E-05 | global batch size: 256 | lm loss: 4.514924E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2432.435 | TFLOPs: 9.05 | 7: iteration 143020/ 173500 | consumed samples: 36613120 | consumed tokens: 74983669760 | elapsed time per iteration (s): 0.08 | learning rate: 3.363E-05 | global batch size: 256 | lm loss: 4.503372E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.934 | TFLOPs: 12.00 | 7: iteration 143030/ 173500 | consumed samples: 36615680 | consumed tokens: 74988912640 | elapsed time per iteration (s): 0.08 | learning rate: 3.362E-05 | global batch size: 256 | lm loss: 4.507413E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.428 | TFLOPs: 11.93 | 7: iteration 143040/ 173500 | consumed samples: 36618240 | consumed tokens: 74994155520 | elapsed time per iteration (s): 0.08 | learning rate: 3.361E-05 | global batch size: 256 | lm loss: 4.504589E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.789 | TFLOPs: 12.00 | 7: iteration 143050/ 173500 | consumed samples: 36620800 | consumed tokens: 74999398400 | elapsed time per iteration (s): 0.08 | learning rate: 3.360E-05 | global batch size: 256 | lm loss: 4.489552E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.870 | TFLOPs: 12.00 | 7: iteration 143060/ 173500 | consumed samples: 36623360 | consumed tokens: 75004641280 | elapsed time per iteration (s): 
0.08 | learning rate: 3.359E-05 | global batch size: 256 | lm loss: 4.523212E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3067.921 | TFLOPs: 11.41 | 7: iteration 143070/ 173500 | consumed samples: 36625920 | consumed tokens: 75009884160 | elapsed time per iteration (s): 0.08 | learning rate: 3.358E-05 | global batch size: 256 | lm loss: 4.506881E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.758 | TFLOPs: 11.85 | 7: iteration 143080/ 173500 | consumed samples: 36628480 | consumed tokens: 75015127040 | elapsed time per iteration (s): 0.08 | learning rate: 3.358E-05 | global batch size: 256 | lm loss: 4.507236E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.864 | TFLOPs: 11.95 | 7: iteration 143090/ 173500 | consumed samples: 36631040 | consumed tokens: 75020369920 | elapsed time per iteration (s): 0.08 | learning rate: 3.357E-05 | global batch size: 256 | lm loss: 4.501241E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.800 | TFLOPs: 11.79 | 7: iteration 143100/ 173500 | consumed samples: 36633600 | consumed tokens: 75025612800 | elapsed time per iteration (s): 0.08 | learning rate: 3.356E-05 | global batch size: 256 | lm loss: 4.510924E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.420 | TFLOPs: 11.85 | 7: iteration 143110/ 173500 | consumed samples: 36636160 | consumed tokens: 75030855680 | elapsed time per iteration (s): 0.08 | learning rate: 3.355E-05 | global batch size: 256 | lm loss: 4.503094E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.564 | TFLOPs: 11.89 | 7: 
iteration 143120/ 173500 | consumed samples: 36638720 | consumed tokens: 75036098560 | elapsed time per iteration (s): 0.08 | learning rate: 3.354E-05 | global batch size: 256 | lm loss: 4.502435E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.551 | TFLOPs: 11.92 | 7: iteration 143130/ 173500 | consumed samples: 36641280 | consumed tokens: 75041341440 | elapsed time per iteration (s): 0.08 | learning rate: 3.353E-05 | global batch size: 256 | lm loss: 4.508498E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.299 | TFLOPs: 11.89 | 7: iteration 143140/ 173500 | consumed samples: 36643840 | consumed tokens: 75046584320 | elapsed time per iteration (s): 0.08 | learning rate: 3.352E-05 | global batch size: 256 | lm loss: 4.507663E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.438 | TFLOPs: 11.93 | 7: iteration 143150/ 173500 | consumed samples: 36646400 | consumed tokens: 75051827200 | elapsed time per iteration (s): 0.08 | learning rate: 3.351E-05 | global batch size: 256 | lm loss: 4.516843E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.692 | TFLOPs: 11.89 | 7: iteration 143160/ 173500 | consumed samples: 36648960 | consumed tokens: 75057070080 | elapsed time per iteration (s): 0.08 | learning rate: 3.351E-05 | global batch size: 256 | lm loss: 4.506969E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.580 | TFLOPs: 11.92 | 7: iteration 143170/ 173500 | consumed samples: 36651520 | consumed tokens: 75062312960 | elapsed time per iteration (s): 0.08 | learning rate: 3.350E-05 | global batch size: 256 | lm loss: 4.501853E+00 | grad norm: 0.364 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.351 | TFLOPs: 11.94 | 7: iteration 143180/ 173500 | consumed samples: 36654080 | consumed tokens: 75067555840 | elapsed time per iteration (s): 0.08 | learning rate: 3.349E-05 | global batch size: 256 | lm loss: 4.519642E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.348 | TFLOPs: 11.96 | 7: iteration 143190/ 173500 | consumed samples: 36656640 | consumed tokens: 75072798720 | elapsed time per iteration (s): 0.08 | learning rate: 3.348E-05 | global batch size: 256 | lm loss: 4.490248E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.471 | TFLOPs: 11.99 | 7: iteration 143200/ 173500 | consumed samples: 36659200 | consumed tokens: 75078041600 | elapsed time per iteration (s): 0.08 | learning rate: 3.347E-05 | global batch size: 256 | lm loss: 4.500744E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.981 | TFLOPs: 12.00 | 7: iteration 143210/ 173500 | consumed samples: 36661760 | consumed tokens: 75083284480 | elapsed time per iteration (s): 0.08 | learning rate: 3.346E-05 | global batch size: 256 | lm loss: 4.513018E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.994 | TFLOPs: 11.94 | 7: iteration 143220/ 173500 | consumed samples: 36664320 | consumed tokens: 75088527360 | elapsed time per iteration (s): 0.08 | learning rate: 3.345E-05 | global batch size: 256 | lm loss: 4.512999E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.676 | TFLOPs: 11.97 | 7: iteration 143230/ 173500 | consumed samples: 36666880 | consumed tokens: 75093770240 | elapsed time per iteration (s): 0.08 | 
learning rate: 3.344E-05 | global batch size: 256 | lm loss: 4.515883E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.762 | TFLOPs: 11.95 | 7: iteration 143240/ 173500 | consumed samples: 36669440 | consumed tokens: 75099013120 | elapsed time per iteration (s): 0.08 | learning rate: 3.344E-05 | global batch size: 256 | lm loss: 4.509084E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.460 | TFLOPs: 11.83 | 7: iteration 143250/ 173500 | consumed samples: 36672000 | consumed tokens: 75104256000 | elapsed time per iteration (s): 0.08 | learning rate: 3.343E-05 | global batch size: 256 | lm loss: 4.485269E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.741 | TFLOPs: 11.92 | 7: iteration 143260/ 173500 | consumed samples: 36674560 | consumed tokens: 75109498880 | elapsed time per iteration (s): 0.08 | learning rate: 3.342E-05 | global batch size: 256 | lm loss: 4.502259E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.936 | TFLOPs: 11.84 | 7: iteration 143270/ 173500 | consumed samples: 36677120 | consumed tokens: 75114741760 | elapsed time per iteration (s): 0.08 | learning rate: 3.341E-05 | global batch size: 256 | lm loss: 4.512131E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.001 | TFLOPs: 11.90 | 7: iteration 143280/ 173500 | consumed samples: 36679680 | consumed tokens: 75119984640 | elapsed time per iteration (s): 0.08 | learning rate: 3.340E-05 | global batch size: 256 | lm loss: 4.513763E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.249 | TFLOPs: 11.95 | 7: iteration 
143290/ 173500 | consumed samples: 36682240 | consumed tokens: 75125227520 | elapsed time per iteration (s): 0.08 | learning rate: 3.339E-05 | global batch size: 256 | lm loss: 4.510068E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.577 | TFLOPs: 11.97 | 7: iteration 143300/ 173500 | consumed samples: 36684800 | consumed tokens: 75130470400 | elapsed time per iteration (s): 0.08 | learning rate: 3.338E-05 | global batch size: 256 | lm loss: 4.505860E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.773 | TFLOPs: 12.03 | 7: iteration 143310/ 173500 | consumed samples: 36687360 | consumed tokens: 75135713280 | elapsed time per iteration (s): 0.08 | learning rate: 3.338E-05 | global batch size: 256 | lm loss: 4.524811E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.491 | TFLOPs: 12.01 | 7: iteration 143320/ 173500 | consumed samples: 36689920 | consumed tokens: 75140956160 | elapsed time per iteration (s): 0.08 | learning rate: 3.337E-05 | global batch size: 256 | lm loss: 4.506501E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.483 | TFLOPs: 11.26 | 7: iteration 143330/ 173500 | consumed samples: 36692480 | consumed tokens: 75146199040 | elapsed time per iteration (s): 0.08 | learning rate: 3.336E-05 | global batch size: 256 | lm loss: 4.508336E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.532 | TFLOPs: 11.92 | 7: iteration 143340/ 173500 | consumed samples: 36695040 | consumed tokens: 75151441920 | elapsed time per iteration (s): 0.08 | learning rate: 3.335E-05 | global batch size: 256 | lm loss: 4.500576E+00 | grad norm: 0.372 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.978 | TFLOPs: 11.91 | 7: iteration 143350/ 173500 | consumed samples: 36697600 | consumed tokens: 75156684800 | elapsed time per iteration (s): 0.08 | learning rate: 3.334E-05 | global batch size: 256 | lm loss: 4.510595E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.019 | TFLOPs: 11.78 | 7: iteration 143360/ 173500 | consumed samples: 36700160 | consumed tokens: 75161927680 | elapsed time per iteration (s): 0.08 | learning rate: 3.333E-05 | global batch size: 256 | lm loss: 4.500648E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.480 | TFLOPs: 11.83 | 7: iteration 143370/ 173500 | consumed samples: 36702720 | consumed tokens: 75167170560 | elapsed time per iteration (s): 0.08 | learning rate: 3.332E-05 | global batch size: 256 | lm loss: 4.505254E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.114 | TFLOPs: 11.84 | 7: iteration 143380/ 173500 | consumed samples: 36705280 | consumed tokens: 75172413440 | elapsed time per iteration (s): 0.08 | learning rate: 3.332E-05 | global batch size: 256 | lm loss: 4.503827E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.644 | TFLOPs: 11.80 | 7: iteration 143390/ 173500 | consumed samples: 36707840 | consumed tokens: 75177656320 | elapsed time per iteration (s): 0.08 | learning rate: 3.331E-05 | global batch size: 256 | lm loss: 4.517067E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.336 | TFLOPs: 11.44 | 7: iteration 143400/ 173500 | consumed samples: 36710400 | consumed tokens: 75182899200 | elapsed time per iteration (s): 0.08 | learning 
rate: 3.330E-05 | global batch size: 256 | lm loss: 4.499474E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.370 | TFLOPs: 11.80 | 7: iteration 143410/ 173500 | consumed samples: 36712960 | consumed tokens: 75188142080 | elapsed time per iteration (s): 0.09 | learning rate: 3.329E-05 | global batch size: 256 | lm loss: 4.488697E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.189 | TFLOPs: 10.78 | 7: iteration 143420/ 173500 | consumed samples: 36715520 | consumed tokens: 75193384960 | elapsed time per iteration (s): 0.13 | learning rate: 3.328E-05 | global batch size: 256 | lm loss: 4.505360E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2000.429 | TFLOPs: 7.44 | 7: iteration 143430/ 173500 | consumed samples: 36718080 | consumed tokens: 75198627840 | elapsed time per iteration (s): 0.12 | learning rate: 3.327E-05 | global batch size: 256 | lm loss: 4.507881E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.690 | TFLOPs: 7.78 | 7: iteration 143440/ 173500 | consumed samples: 36720640 | consumed tokens: 75203870720 | elapsed time per iteration (s): 0.13 | learning rate: 3.326E-05 | global batch size: 256 | lm loss: 4.510755E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2044.799 | TFLOPs: 7.61 | 7: iteration 143450/ 173500 | consumed samples: 36723200 | consumed tokens: 75209113600 | elapsed time per iteration (s): 0.09 | learning rate: 3.326E-05 | global batch size: 256 | lm loss: 4.501659E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3006.021 | TFLOPs: 11.18 | 7: iteration 143460/ 173500 
| consumed samples: 36725760 | consumed tokens: 75214356480 | elapsed time per iteration (s): 0.08 | learning rate: 3.325E-05 | global batch size: 256 | lm loss: 4.510868E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.678 | TFLOPs: 11.90 | 7: iteration 143470/ 173500 | consumed samples: 36728320 | consumed tokens: 75219599360 | elapsed time per iteration (s): 0.08 | learning rate: 3.324E-05 | global batch size: 256 | lm loss: 4.498471E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.481 | TFLOPs: 11.93 | 7: iteration 143480/ 173500 | consumed samples: 36730880 | consumed tokens: 75224842240 | elapsed time per iteration (s): 0.09 | learning rate: 3.323E-05 | global batch size: 256 | lm loss: 4.493655E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2883.606 | TFLOPs: 10.73 | 7: iteration 143490/ 173500 | consumed samples: 36733440 | consumed tokens: 75230085120 | elapsed time per iteration (s): 0.08 | learning rate: 3.322E-05 | global batch size: 256 | lm loss: 4.507587E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3097.258 | TFLOPs: 11.52 | 7: iteration 143500/ 173500 | consumed samples: 36736000 | consumed tokens: 75235328000 | elapsed time per iteration (s): 0.08 | learning rate: 3.321E-05 | global batch size: 256 | lm loss: 4.508309E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.576 | TFLOPs: 11.88 | 7: iteration 143510/ 173500 | consumed samples: 36738560 | consumed tokens: 75240570880 | elapsed time per iteration (s): 0.09 | learning rate: 3.320E-05 | global batch size: 256 | lm loss: 4.508659E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2835.728 | TFLOPs: 10.55 | 7: iteration 143520/ 173500 | consumed samples: 36741120 | consumed tokens: 75245813760 | elapsed time per iteration (s): 0.10 | learning rate: 3.320E-05 | global batch size: 256 | lm loss: 4.506054E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2652.160 | TFLOPs: 9.86 | 7: iteration 143530/ 173500 | consumed samples: 36743680 | consumed tokens: 75251056640 | elapsed time per iteration (s): 0.09 | learning rate: 3.319E-05 | global batch size: 256 | lm loss: 4.509268E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.486 | TFLOPs: 11.07 | 7: iteration 143540/ 173500 | consumed samples: 36746240 | consumed tokens: 75256299520 | elapsed time per iteration (s): 0.09 | learning rate: 3.318E-05 | global batch size: 256 | lm loss: 4.516777E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2896.532 | TFLOPs: 10.77 | 7: iteration 143550/ 173500 | consumed samples: 36748800 | consumed tokens: 75261542400 | elapsed time per iteration (s): 0.08 | learning rate: 3.317E-05 | global batch size: 256 | lm loss: 4.508032E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.473 | TFLOPs: 11.95 | 7: iteration 143560/ 173500 | consumed samples: 36751360 | consumed tokens: 75266785280 | elapsed time per iteration (s): 0.08 | learning rate: 3.316E-05 | global batch size: 256 | lm loss: 4.498701E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.579 | TFLOPs: 11.93 | 7: iteration 143570/ 173500 | consumed samples: 36753920 | consumed tokens: 75272028160 | elapsed time per iteration (s): 0.08 | learning rate: 
3.315E-05 | global batch size: 256 | lm loss: 4.526324E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.804 | TFLOPs: 11.96 | 7: iteration 143580/ 173500 | consumed samples: 36756480 | consumed tokens: 75277271040 | elapsed time per iteration (s): 0.08 | learning rate: 3.314E-05 | global batch size: 256 | lm loss: 4.506982E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.567 | TFLOPs: 11.84 | 7: iteration 143590/ 173500 | consumed samples: 36759040 | consumed tokens: 75282513920 | elapsed time per iteration (s): 0.08 | learning rate: 3.314E-05 | global batch size: 256 | lm loss: 4.511814E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.832 | TFLOPs: 11.55 | 7: iteration 143600/ 173500 | consumed samples: 36761600 | consumed tokens: 75287756800 | elapsed time per iteration (s): 0.08 | learning rate: 3.313E-05 | global batch size: 256 | lm loss: 4.511448E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.295 | TFLOPs: 11.83 | 7: iteration 143610/ 173500 | consumed samples: 36764160 | consumed tokens: 75292999680 | elapsed time per iteration (s): 0.08 | learning rate: 3.312E-05 | global batch size: 256 | lm loss: 4.500938E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.621 | TFLOPs: 11.86 | 7: iteration 143620/ 173500 | consumed samples: 36766720 | consumed tokens: 75298242560 | elapsed time per iteration (s): 0.08 | learning rate: 3.311E-05 | global batch size: 256 | lm loss: 4.504883E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.049 | TFLOPs: 11.63 | 7: iteration 143630/ 173500 | 
consumed samples: 36769280 | consumed tokens: 75303485440 | elapsed time per iteration (s): 0.08 | learning rate: 3.310E-05 | global batch size: 256 | lm loss: 4.506622E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.086 | TFLOPs: 11.90 | 7: iteration 143640/ 173500 | consumed samples: 36771840 | consumed tokens: 75308728320 | elapsed time per iteration (s): 0.08 | learning rate: 3.309E-05 | global batch size: 256 | lm loss: 4.513017E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.572 | TFLOPs: 11.81 | 7: iteration 143650/ 173500 | consumed samples: 36774400 | consumed tokens: 75313971200 | elapsed time per iteration (s): 0.08 | learning rate: 3.308E-05 | global batch size: 256 | lm loss: 4.504803E+00 | grad norm: 0.457 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.309 | TFLOPs: 11.92 | 7: iteration 143660/ 173500 | consumed samples: 36776960 | consumed tokens: 75319214080 | elapsed time per iteration (s): 0.08 | learning rate: 3.308E-05 | global batch size: 256 | lm loss: 4.489102E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.636 | TFLOPs: 11.94 | 7: iteration 143670/ 173500 | consumed samples: 36779520 | consumed tokens: 75324456960 | elapsed time per iteration (s): 0.08 | learning rate: 3.307E-05 | global batch size: 256 | lm loss: 4.517530E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.998 | TFLOPs: 11.87 | 7: iteration 143680/ 173500 | consumed samples: 36782080 | consumed tokens: 75329699840 | elapsed time per iteration (s): 0.08 | learning rate: 3.306E-05 | global batch size: 256 | lm loss: 4.506749E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3188.935 | TFLOPs: 11.86 | 7: iteration 143690/ 173500 | consumed samples: 36784640 | consumed tokens: 75334942720 | elapsed time per iteration (s): 0.08 | learning rate: 3.305E-05 | global batch size: 256 | lm loss: 4.502847E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.123 | TFLOPs: 11.88 | 7: iteration 143700/ 173500 | consumed samples: 36787200 | consumed tokens: 75340185600 | elapsed time per iteration (s): 0.08 | learning rate: 3.304E-05 | global batch size: 256 | lm loss: 4.503873E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.789 | TFLOPs: 11.90 | 7: iteration 143710/ 173500 | consumed samples: 36789760 | consumed tokens: 75345428480 | elapsed time per iteration (s): 0.08 | learning rate: 3.303E-05 | global batch size: 256 | lm loss: 4.511587E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.847 | TFLOPs: 11.82 | 7: iteration 143720/ 173500 | consumed samples: 36792320 | consumed tokens: 75350671360 | elapsed time per iteration (s): 0.08 | learning rate: 3.302E-05 | global batch size: 256 | lm loss: 4.504077E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.097 | TFLOPs: 11.74 | 7: iteration 143730/ 173500 | consumed samples: 36794880 | consumed tokens: 75355914240 | elapsed time per iteration (s): 0.08 | learning rate: 3.302E-05 | global batch size: 256 | lm loss: 4.509308E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.787 | TFLOPs: 11.30 | 7: iteration 143740/ 173500 | consumed samples: 36797440 | consumed tokens: 75361157120 | elapsed time per iteration (s): 0.08 | learning rate: 
3.301E-05 | global batch size: 256 | lm loss: 4.503241E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.939 | TFLOPs: 11.84 | 7: iteration 143750/ 173500 | consumed samples: 36800000 | consumed tokens: 75366400000 | elapsed time per iteration (s): 0.08 | learning rate: 3.300E-05 | global batch size: 256 | lm loss: 4.502138E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.263 | TFLOPs: 11.91 | 7: iteration 143760/ 173500 | consumed samples: 36802560 | consumed tokens: 75371642880 | elapsed time per iteration (s): 0.08 | learning rate: 3.299E-05 | global batch size: 256 | lm loss: 4.504739E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.847 | TFLOPs: 11.91 | 7: iteration 143770/ 173500 | consumed samples: 36805120 | consumed tokens: 75376885760 | elapsed time per iteration (s): 0.08 | learning rate: 3.298E-05 | global batch size: 256 | lm loss: 4.497285E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.399 | TFLOPs: 11.91 | 7: iteration 143780/ 173500 | consumed samples: 36807680 | consumed tokens: 75382128640 | elapsed time per iteration (s): 0.08 | learning rate: 3.297E-05 | global batch size: 256 | lm loss: 4.508327E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.026 | TFLOPs: 11.90 | 7: iteration 143790/ 173500 | consumed samples: 36810240 | consumed tokens: 75387371520 | elapsed time per iteration (s): 0.08 | learning rate: 3.296E-05 | global batch size: 256 | lm loss: 4.491834E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.507 | TFLOPs: 11.91 | 7: iteration 143800/ 173500 | 
consumed samples: 36812800 | consumed tokens: 75392614400 | elapsed time per iteration (s): 0.08 | learning rate: 3.296E-05 | global batch size: 256 | lm loss: 4.495151E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.675 | TFLOPs: 11.83 | 7: iteration 143810/ 173500 | consumed samples: 36815360 | consumed tokens: 75397857280 | elapsed time per iteration (s): 0.08 | learning rate: 3.295E-05 | global batch size: 256 | lm loss: 4.510997E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.587 | TFLOPs: 11.92 | 7: iteration 143820/ 173500 | consumed samples: 36817920 | consumed tokens: 75403100160 | elapsed time per iteration (s): 0.08 | learning rate: 3.294E-05 | global batch size: 256 | lm loss: 4.509711E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.558 | TFLOPs: 11.91 | 7: iteration 143830/ 173500 | consumed samples: 36820480 | consumed tokens: 75408343040 | elapsed time per iteration (s): 0.08 | learning rate: 3.293E-05 | global batch size: 256 | lm loss: 4.497402E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.442 | TFLOPs: 11.93 | 7: iteration 143840/ 173500 | consumed samples: 36823040 | consumed tokens: 75413585920 | elapsed time per iteration (s): 0.08 | learning rate: 3.292E-05 | global batch size: 256 | lm loss: 4.504185E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.645 | TFLOPs: 11.93 | 7: iteration 143850/ 173500 | consumed samples: 36825600 | consumed tokens: 75418828800 | elapsed time per iteration (s): 0.08 | learning rate: 3.291E-05 | global batch size: 256 | lm loss: 4.500558E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3188.850 | TFLOPs: 11.86 | 7: iteration 143860/ 173500 | consumed samples: 36828160 | consumed tokens: 75424071680 | elapsed time per iteration (s): 0.09 | learning rate: 3.290E-05 | global batch size: 256 | lm loss: 4.515443E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.071 | TFLOPs: 10.78 | 7: iteration 143870/ 173500 | consumed samples: 36830720 | consumed tokens: 75429314560 | elapsed time per iteration (s): 0.09 | learning rate: 3.290E-05 | global batch size: 256 | lm loss: 4.507898E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2830.573 | TFLOPs: 10.53 | 7: iteration 143880/ 173500 | consumed samples: 36833280 | consumed tokens: 75434557440 | elapsed time per iteration (s): 0.08 | learning rate: 3.289E-05 | global batch size: 256 | lm loss: 4.491241E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.324 | TFLOPs: 11.89 | 7: iteration 143890/ 173500 | consumed samples: 36835840 | consumed tokens: 75439800320 | elapsed time per iteration (s): 0.08 | learning rate: 3.288E-05 | global batch size: 256 | lm loss: 4.512641E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.896 | TFLOPs: 11.87 | 7: iteration 143900/ 173500 | consumed samples: 36838400 | consumed tokens: 75445043200 | elapsed time per iteration (s): 0.08 | learning rate: 3.287E-05 | global batch size: 256 | lm loss: 4.510426E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.156 | TFLOPs: 11.84 | 7: iteration 143910/ 173500 | consumed samples: 36840960 | consumed tokens: 75450286080 | elapsed time per iteration (s): 0.08 | learning rate: 
3.286E-05 | global batch size: 256 | lm loss: 4.522739E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.457 | TFLOPs: 11.87 | 7: iteration 143920/ 173500 | consumed samples: 36843520 | consumed tokens: 75455528960 | elapsed time per iteration (s): 0.08 | learning rate: 3.285E-05 | global batch size: 256 | lm loss: 4.508023E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.435 | TFLOPs: 11.67 | 7: iteration 143930/ 173500 | consumed samples: 36846080 | consumed tokens: 75460771840 | elapsed time per iteration (s): 0.08 | learning rate: 3.285E-05 | global batch size: 256 | lm loss: 4.505153E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.648 | TFLOPs: 11.93 | 7: iteration 143940/ 173500 | consumed samples: 36848640 | consumed tokens: 75466014720 | elapsed time per iteration (s): 0.08 | learning rate: 3.284E-05 | global batch size: 256 | lm loss: 4.496410E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.612 | TFLOPs: 11.91 | 7: iteration 143950/ 173500 | consumed samples: 36851200 | consumed tokens: 75471257600 | elapsed time per iteration (s): 0.08 | learning rate: 3.283E-05 | global batch size: 256 | lm loss: 4.510995E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.702 | TFLOPs: 11.88 | 7: iteration 143960/ 173500 | consumed samples: 36853760 | consumed tokens: 75476500480 | elapsed time per iteration (s): 0.08 | learning rate: 3.282E-05 | global batch size: 256 | lm loss: 4.516950E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.105 | TFLOPs: 11.90 | 7: iteration 143970/ 173500 | 
consumed samples: 36856320 | consumed tokens: 75481743360 | elapsed time per iteration (s): 0.08 | learning rate: 3.281E-05 | global batch size: 256 | lm loss: 4.516802E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.106 | TFLOPs: 11.65 | 7: iteration 143980/ 173500 | consumed samples: 36858880 | consumed tokens: 75486986240 | elapsed time per iteration (s): 0.08 | learning rate: 3.280E-05 | global batch size: 256 | lm loss: 4.510126E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.483 | TFLOPs: 11.98 | 7: iteration 143990/ 173500 | consumed samples: 36861440 | consumed tokens: 75492229120 | elapsed time per iteration (s): 0.08 | learning rate: 3.279E-05 | global batch size: 256 | lm loss: 4.511284E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.157 | TFLOPs: 11.36 | 0: [2023-03-17 03:47:45,479] [INFO] [logging.py:68:log_dist] [Rank 0] step=144000, skipped=0, lr=[3.278611280458685e-05, 3.278611280458685e-05, 3.278611280458685e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 144000/ 173500 | consumed samples: 36864000 | consumed tokens: 75497472000 | elapsed time per iteration (s): 0.08 | learning rate: 3.279E-05 | global batch size: 256 | lm loss: 4.507926E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.099 | TFLOPs: 12.00 | 0: steps: 144000 loss: 4.5250 iter time (s): 0.086 samples/sec: 2988.835 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 144000 | lm loss value: 4.445435E+00 | lm loss PPL: 8.523695E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at 
iteration 144000 to checkpoints_14m91b100m 0: [2023-03-17 03:47:45,537] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step144000 is begin to save! 0: [2023-03-17 03:47:45,540] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:47:45,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:47:45,564] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:47:45,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:47:45,569] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:47:45,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:47:45,572] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:47:45,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:47:45,575] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:47:45,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:47:45,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:47:45,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:47:45,579] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step144000/mp_rank_00_model_states.pt 0: [2023-03-17 03:47:45,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:47:45,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:47:45,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:47:45,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:47:45,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:47:45,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 0: [2023-03-17 03:47:45,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:47:45,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:47:45,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 5: [2023-03-17 03:47:45,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:47:45,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:47:45,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 4: [2023-03-17 03:47:45,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:47:45,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 4: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:47:45,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:47:45,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 6: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 
6: [2023-03-17 03:47:45,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 0: [2023-03-17 03:47:45,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:47:45,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 03:47:45,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 3: [2023-03-17 03:47:45,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:47:45,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:47:45,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 5: [2023-03-17 03:47:45,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:47:45,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:47:45,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 0: [2023-03-17 03:47:45,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:47:45,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:47:45,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 1: [2023-03-17 03:47:45,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:47:45,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 6: [2023-03-17 03:47:45,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:47:45,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 3: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:47:45,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 03:47:45,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 5: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 2: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:47:45,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 
0: [2023-03-17 03:47:45,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 6: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 4: [2023-03-17 03:47:45,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 2: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 
7: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 5: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 0: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 3: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 2: [2023-03-17 03:47:45,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 4: [2023-03-17 03:47:45,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:47:45,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 6: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:47:45,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 5: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:47:45,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:47:45,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 1: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:47:45,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 3: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:47:45,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:47:45,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 1: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:47:45,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 4: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:47:45,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 0: [2023-03-17 03:47:45,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 2: [2023-03-17 03:47:45,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 6: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:47:45,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:47:45,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 5: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:47:45,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 2: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:47:45,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 03:47:45,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 2: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 6: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:47:45,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 03:47:45,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 0: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 1: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:47:45,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 7: [2023-03-17 03:47:45,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 3: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 1: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 2: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 5: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 4: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 3: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 4: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 4: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 
2: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 2: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 7: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 0: [2023-03-17 03:47:45,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:47:45,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 3: [2023-03-17 03:47:45,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:47:45,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:47:45,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 6: [2023-03-17 03:47:45,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:47:45,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:47:45,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 5: [2023-03-17 03:47:45,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:47:45,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step144000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:47:45,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step144000 is ready now! 
0: successfully saved checkpoint at iteration 144000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 77.77 7: iteration 144010/ 173500 | consumed samples: 36866560 | consumed tokens: 75502714880 | elapsed time per iteration (s): 0.09 | learning rate: 3.278E-05 | global batch size: 256 | lm loss: 4.505496E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2771.063 | TFLOPs: 10.31 | 7: iteration 144020/ 173500 | consumed samples: 36869120 | consumed tokens: 75507957760 | elapsed time per iteration (s): 0.08 | learning rate: 3.277E-05 | global batch size: 256 | lm loss: 4.503339E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.684 | TFLOPs: 12.04 | 7: iteration 144030/ 173500 | consumed samples: 36871680 | consumed tokens: 75513200640 | elapsed time per iteration (s): 0.08 | learning rate: 3.276E-05 | global batch size: 256 | lm loss: 4.509598E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.858 | TFLOPs: 12.06 | 7: iteration 144040/ 173500 | consumed samples: 36874240 | consumed tokens: 75518443520 | elapsed time per iteration (s): 0.08 | learning rate: 3.275E-05 | global batch size: 256 | lm loss: 4.517525E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.936 | TFLOPs: 11.76 | 7: iteration 144050/ 173500 | consumed samples: 36876800 | consumed tokens: 75523686400 | elapsed time per iteration (s): 0.08 | learning rate: 3.274E-05 | global batch size: 256 | lm loss: 4.503503E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.907 | TFLOPs: 12.04 | 7: iteration 144060/ 173500 | consumed samples: 36879360 | consumed tokens: 75528929280 | elapsed time per iteration (s): 
0.08 | learning rate: 3.274E-05 | global batch size: 256 | lm loss: 4.509070E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.745 | TFLOPs: 12.05 | 7: iteration 144070/ 173500 | consumed samples: 36881920 | consumed tokens: 75534172160 | elapsed time per iteration (s): 0.08 | learning rate: 3.273E-05 | global batch size: 256 | lm loss: 4.507056E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.172 | TFLOPs: 11.78 | 7: iteration 144080/ 173500 | consumed samples: 36884480 | consumed tokens: 75539415040 | elapsed time per iteration (s): 0.08 | learning rate: 3.272E-05 | global batch size: 256 | lm loss: 4.509153E+00 | grad norm: 0.724 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3046.934 | TFLOPs: 11.33 | 7: iteration 144090/ 173500 | consumed samples: 36887040 | consumed tokens: 75544657920 | elapsed time per iteration (s): 0.09 | learning rate: 3.271E-05 | global batch size: 256 | lm loss: 4.498475E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2761.848 | TFLOPs: 10.27 | 7: iteration 144100/ 173500 | consumed samples: 36889600 | consumed tokens: 75549900800 | elapsed time per iteration (s): 0.09 | learning rate: 3.270E-05 | global batch size: 256 | lm loss: 4.513435E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.195 | TFLOPs: 11.08 | 7: iteration 144110/ 173500 | consumed samples: 36892160 | consumed tokens: 75555143680 | elapsed time per iteration (s): 0.09 | learning rate: 3.269E-05 | global batch size: 256 | lm loss: 4.503806E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2975.252 | TFLOPs: 11.07 | 7: 
iteration 144120/ 173500 | consumed samples: 36894720 | consumed tokens: 75560386560 | elapsed time per iteration (s): 0.08 | learning rate: 3.268E-05 | global batch size: 256 | lm loss: 4.500076E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.095 | TFLOPs: 12.02 | 7: iteration 144130/ 173500 | consumed samples: 36897280 | consumed tokens: 75565629440 | elapsed time per iteration (s): 0.09 | learning rate: 3.268E-05 | global batch size: 256 | lm loss: 4.519685E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2739.771 | TFLOPs: 10.19 | 7: iteration 144140/ 173500 | consumed samples: 36899840 | consumed tokens: 75570872320 | elapsed time per iteration (s): 0.12 | learning rate: 3.267E-05 | global batch size: 256 | lm loss: 4.507059E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.372 | TFLOPs: 7.70 | 7: iteration 144150/ 173500 | consumed samples: 36902400 | consumed tokens: 75576115200 | elapsed time per iteration (s): 0.12 | learning rate: 3.266E-05 | global batch size: 256 | lm loss: 4.495299E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2174.404 | TFLOPs: 8.09 | 7: iteration 144160/ 173500 | consumed samples: 36904960 | consumed tokens: 75581358080 | elapsed time per iteration (s): 0.10 | learning rate: 3.265E-05 | global batch size: 256 | lm loss: 4.515213E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2504.658 | TFLOPs: 9.32 | 7: iteration 144170/ 173500 | consumed samples: 36907520 | consumed tokens: 75586600960 | elapsed time per iteration (s): 0.08 | learning rate: 3.264E-05 | global batch size: 256 | lm loss: 4.513542E+00 | grad norm: 0.353 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.969 | TFLOPs: 11.75 | 7: iteration 144180/ 173500 | consumed samples: 36910080 | consumed tokens: 75591843840 | elapsed time per iteration (s): 0.08 | learning rate: 3.263E-05 | global batch size: 256 | lm loss: 4.513514E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.145 | TFLOPs: 11.80 | 7: iteration 144190/ 173500 | consumed samples: 36912640 | consumed tokens: 75597086720 | elapsed time per iteration (s): 0.10 | learning rate: 3.263E-05 | global batch size: 256 | lm loss: 4.506234E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2439.899 | TFLOPs: 9.08 | 7: iteration 144200/ 173500 | consumed samples: 36915200 | consumed tokens: 75602329600 | elapsed time per iteration (s): 0.08 | learning rate: 3.262E-05 | global batch size: 256 | lm loss: 4.498510E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.513 | TFLOPs: 11.98 | 7: iteration 144210/ 173500 | consumed samples: 36917760 | consumed tokens: 75607572480 | elapsed time per iteration (s): 0.08 | learning rate: 3.261E-05 | global batch size: 256 | lm loss: 4.502919E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.552 | TFLOPs: 11.89 | 7: iteration 144220/ 173500 | consumed samples: 36920320 | consumed tokens: 75612815360 | elapsed time per iteration (s): 0.08 | learning rate: 3.260E-05 | global batch size: 256 | lm loss: 4.512539E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.933 | TFLOPs: 11.79 | 7: iteration 144230/ 173500 | consumed samples: 36922880 | consumed tokens: 75618058240 | elapsed time per iteration (s): 0.08 | 
learning rate: 3.259E-05 | global batch size: 256 | lm loss: 4.513938E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.999 | TFLOPs: 11.86 | 7: iteration 144240/ 173500 | consumed samples: 36925440 | consumed tokens: 75623301120 | elapsed time per iteration (s): 0.08 | learning rate: 3.258E-05 | global batch size: 256 | lm loss: 4.512134E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.745 | TFLOPs: 11.90 | 7: iteration 144250/ 173500 | consumed samples: 36928000 | consumed tokens: 75628544000 | elapsed time per iteration (s): 0.08 | learning rate: 3.258E-05 | global batch size: 256 | lm loss: 4.501677E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.937 | TFLOPs: 11.88 | 7: iteration 144260/ 173500 | consumed samples: 36930560 | consumed tokens: 75633786880 | elapsed time per iteration (s): 0.09 | learning rate: 3.257E-05 | global batch size: 256 | lm loss: 4.504238E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.341 | TFLOPs: 11.06 | 7: iteration 144270/ 173500 | consumed samples: 36933120 | consumed tokens: 75639029760 | elapsed time per iteration (s): 0.08 | learning rate: 3.256E-05 | global batch size: 256 | lm loss: 4.502117E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3091.590 | TFLOPs: 11.50 | 7: iteration 144280/ 173500 | consumed samples: 36935680 | consumed tokens: 75644272640 | elapsed time per iteration (s): 0.08 | learning rate: 3.255E-05 | global batch size: 256 | lm loss: 4.517766E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.253 | TFLOPs: 11.89 | 7: iteration 
144290/ 173500 | consumed samples: 36938240 | consumed tokens: 75649515520 | elapsed time per iteration (s): 0.08 | learning rate: 3.254E-05 | global batch size: 256 | lm loss: 4.497247E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.439 | TFLOPs: 11.89 | 7: iteration 144300/ 173500 | consumed samples: 36940800 | consumed tokens: 75654758400 | elapsed time per iteration (s): 0.08 | learning rate: 3.253E-05 | global batch size: 256 | lm loss: 4.521836E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.824 | TFLOPs: 11.95 | 7: iteration 144310/ 173500 | consumed samples: 36943360 | consumed tokens: 75660001280 | elapsed time per iteration (s): 0.08 | learning rate: 3.253E-05 | global batch size: 256 | lm loss: 4.512101E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.066 | TFLOPs: 11.85 | 7: iteration 144320/ 173500 | consumed samples: 36945920 | consumed tokens: 75665244160 | elapsed time per iteration (s): 0.08 | learning rate: 3.252E-05 | global batch size: 256 | lm loss: 4.495671E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.424 | TFLOPs: 11.93 | 7: iteration 144330/ 173500 | consumed samples: 36948480 | consumed tokens: 75670487040 | elapsed time per iteration (s): 0.08 | learning rate: 3.251E-05 | global batch size: 256 | lm loss: 4.497507E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.643 | TFLOPs: 11.92 | 7: iteration 144340/ 173500 | consumed samples: 36951040 | consumed tokens: 75675729920 | elapsed time per iteration (s): 0.08 | learning rate: 3.250E-05 | global batch size: 256 | lm loss: 4.519549E+00 | grad norm: 0.343 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.653 | TFLOPs: 11.88 | 7: iteration 144350/ 173500 | consumed samples: 36953600 | consumed tokens: 75680972800 | elapsed time per iteration (s): 0.08 | learning rate: 3.249E-05 | global batch size: 256 | lm loss: 4.507330E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.458 | TFLOPs: 11.92 | 7: iteration 144360/ 173500 | consumed samples: 36956160 | consumed tokens: 75686215680 | elapsed time per iteration (s): 0.08 | learning rate: 3.248E-05 | global batch size: 256 | lm loss: 4.517265E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.366 | TFLOPs: 11.95 | 7: iteration 144370/ 173500 | consumed samples: 36958720 | consumed tokens: 75691458560 | elapsed time per iteration (s): 0.08 | learning rate: 3.247E-05 | global batch size: 256 | lm loss: 4.501285E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.567 | TFLOPs: 11.86 | 7: iteration 144380/ 173500 | consumed samples: 36961280 | consumed tokens: 75696701440 | elapsed time per iteration (s): 0.08 | learning rate: 3.247E-05 | global batch size: 256 | lm loss: 4.508275E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.950 | TFLOPs: 11.91 | 7: iteration 144390/ 173500 | consumed samples: 36963840 | consumed tokens: 75701944320 | elapsed time per iteration (s): 0.08 | learning rate: 3.246E-05 | global batch size: 256 | lm loss: 4.509661E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.735 | TFLOPs: 11.93 | 7: iteration 144400/ 173500 | consumed samples: 36966400 | consumed tokens: 75707187200 | elapsed time per iteration (s): 0.08 | learning 
rate: 3.245E-05 | global batch size: 256 | lm loss: 4.511905E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.166 | TFLOPs: 11.90 | 7: iteration 144410/ 173500 | consumed samples: 36968960 | consumed tokens: 75712430080 | elapsed time per iteration (s): 0.08 | learning rate: 3.244E-05 | global batch size: 256 | lm loss: 4.501330E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.749 | TFLOPs: 12.05 | 7: iteration 144420/ 173500 | consumed samples: 36971520 | consumed tokens: 75717672960 | elapsed time per iteration (s): 0.08 | learning rate: 3.243E-05 | global batch size: 256 | lm loss: 4.515768E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.572 | TFLOPs: 12.06 | 7: iteration 144430/ 173500 | consumed samples: 36974080 | consumed tokens: 75722915840 | elapsed time per iteration (s): 0.09 | learning rate: 3.242E-05 | global batch size: 256 | lm loss: 4.518715E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2941.519 | TFLOPs: 10.94 | 7: iteration 144440/ 173500 | consumed samples: 36976640 | consumed tokens: 75728158720 | elapsed time per iteration (s): 0.08 | learning rate: 3.242E-05 | global batch size: 256 | lm loss: 4.502856E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.972 | TFLOPs: 11.76 | 7: iteration 144450/ 173500 | consumed samples: 36979200 | consumed tokens: 75733401600 | elapsed time per iteration (s): 0.08 | learning rate: 3.241E-05 | global batch size: 256 | lm loss: 4.504018E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.247 | TFLOPs: 11.72 | 7: iteration 144460/ 
173500 | consumed samples: 36981760 | consumed tokens: 75738644480 | elapsed time per iteration (s): 0.08 | learning rate: 3.240E-05 | global batch size: 256 | lm loss: 4.505410E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.773 | TFLOPs: 11.78 | 7: iteration 144470/ 173500 | consumed samples: 36984320 | consumed tokens: 75743887360 | elapsed time per iteration (s): 0.08 | learning rate: 3.239E-05 | global batch size: 256 | lm loss: 4.511180E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.986 | TFLOPs: 11.74 | 7: iteration 144480/ 173500 | consumed samples: 36986880 | consumed tokens: 75749130240 | elapsed time per iteration (s): 0.09 | learning rate: 3.238E-05 | global batch size: 256 | lm loss: 4.491437E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2750.708 | TFLOPs: 10.23 | 7: iteration 144490/ 173500 | consumed samples: 36989440 | consumed tokens: 75754373120 | elapsed time per iteration (s): 0.08 | learning rate: 3.237E-05 | global batch size: 256 | lm loss: 4.494695E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.839 | TFLOPs: 11.90 | 7: iteration 144500/ 173500 | consumed samples: 36992000 | consumed tokens: 75759616000 | elapsed time per iteration (s): 0.08 | learning rate: 3.237E-05 | global batch size: 256 | lm loss: 4.512286E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.823 | TFLOPs: 12.05 | 7: iteration 144510/ 173500 | consumed samples: 36994560 | consumed tokens: 75764858880 | elapsed time per iteration (s): 0.08 | learning rate: 3.236E-05 | global batch size: 256 | lm loss: 4.521077E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3228.232 | TFLOPs: 12.01 | 7: iteration 144520/ 173500 | consumed samples: 36997120 | consumed tokens: 75770101760 | elapsed time per iteration (s): 0.08 | learning rate: 3.235E-05 | global batch size: 256 | lm loss: 4.515610E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.956 | TFLOPs: 12.07 | 7: iteration 144530/ 173500 | consumed samples: 36999680 | consumed tokens: 75775344640 | elapsed time per iteration (s): 0.08 | learning rate: 3.234E-05 | global batch size: 256 | lm loss: 4.512262E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.810 | TFLOPs: 12.03 | 7: iteration 144540/ 173500 | consumed samples: 37002240 | consumed tokens: 75780587520 | elapsed time per iteration (s): 0.08 | learning rate: 3.233E-05 | global batch size: 256 | lm loss: 4.505554E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.937 | TFLOPs: 11.95 | 7: iteration 144550/ 173500 | consumed samples: 37004800 | consumed tokens: 75785830400 | elapsed time per iteration (s): 0.08 | learning rate: 3.232E-05 | global batch size: 256 | lm loss: 4.508540E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.765 | TFLOPs: 11.69 | 7: iteration 144560/ 173500 | consumed samples: 37007360 | consumed tokens: 75791073280 | elapsed time per iteration (s): 0.08 | learning rate: 3.232E-05 | global batch size: 256 | lm loss: 4.506305E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.205 | TFLOPs: 12.07 | 7: iteration 144570/ 173500 | consumed samples: 37009920 | consumed tokens: 75796316160 | elapsed time per iteration (s): 0.08 | learning rate: 
3.231E-05 | global batch size: 256 | lm loss: 4.504026E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.291 | TFLOPs: 12.04 | 7: iteration 144580/ 173500 | consumed samples: 37012480 | consumed tokens: 75801559040 | elapsed time per iteration (s): 0.08 | learning rate: 3.230E-05 | global batch size: 256 | lm loss: 4.512320E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.212 | TFLOPs: 12.02 | 7: iteration 144590/ 173500 | consumed samples: 37015040 | consumed tokens: 75806801920 | elapsed time per iteration (s): 0.08 | learning rate: 3.229E-05 | global batch size: 256 | lm loss: 4.492840E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.564 | TFLOPs: 12.01 | 7: iteration 144600/ 173500 | consumed samples: 37017600 | consumed tokens: 75812044800 | elapsed time per iteration (s): 0.08 | learning rate: 3.228E-05 | global batch size: 256 | lm loss: 4.521966E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.420 | TFLOPs: 11.90 | 7: iteration 144610/ 173500 | consumed samples: 37020160 | consumed tokens: 75817287680 | elapsed time per iteration (s): 0.08 | learning rate: 3.228E-05 | global batch size: 256 | lm loss: 4.514561E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.917 | TFLOPs: 11.90 | 7: iteration 144620/ 173500 | consumed samples: 37022720 | consumed tokens: 75822530560 | elapsed time per iteration (s): 0.08 | learning rate: 3.227E-05 | global batch size: 256 | lm loss: 4.505310E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.676 | TFLOPs: 11.83 | 7: iteration 144630/ 173500 | 
consumed samples: 37025280 | consumed tokens: 75827773440 | elapsed time per iteration (s): 0.08 | learning rate: 3.226E-05 | global batch size: 256 | lm loss: 4.501376E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.548 | TFLOPs: 11.86 | 7: iteration 144640/ 173500 | consumed samples: 37027840 | consumed tokens: 75833016320 | elapsed time per iteration (s): 0.08 | learning rate: 3.225E-05 | global batch size: 256 | lm loss: 4.511236E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.894 | TFLOPs: 11.85 | 7: iteration 144650/ 173500 | consumed samples: 37030400 | consumed tokens: 75838259200 | elapsed time per iteration (s): 0.08 | learning rate: 3.224E-05 | global batch size: 256 | lm loss: 4.501527E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.954 | TFLOPs: 11.84 | 7: iteration 144660/ 173500 | consumed samples: 37032960 | consumed tokens: 75843502080 | elapsed time per iteration (s): 0.08 | learning rate: 3.223E-05 | global batch size: 256 | lm loss: 4.505364E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.173 | TFLOPs: 11.82 | 7: iteration 144670/ 173500 | consumed samples: 37035520 | consumed tokens: 75848744960 | elapsed time per iteration (s): 0.08 | learning rate: 3.223E-05 | global batch size: 256 | lm loss: 4.505795E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.640 | TFLOPs: 11.90 | 7: iteration 144680/ 173500 | consumed samples: 37038080 | consumed tokens: 75853987840 | elapsed time per iteration (s): 0.08 | learning rate: 3.222E-05 | global batch size: 256 | lm loss: 4.502528E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3196.816 | TFLOPs: 11.89 | 7: iteration 144690/ 173500 | consumed samples: 37040640 | consumed tokens: 75859230720 | elapsed time per iteration (s): 0.08 | learning rate: 3.221E-05 | global batch size: 256 | lm loss: 4.508927E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.675 | TFLOPs: 11.92 | 7: iteration 144700/ 173500 | consumed samples: 37043200 | consumed tokens: 75864473600 | elapsed time per iteration (s): 0.08 | learning rate: 3.220E-05 | global batch size: 256 | lm loss: 4.502033E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.348 | TFLOPs: 11.94 | 7: iteration 144710/ 173500 | consumed samples: 37045760 | consumed tokens: 75869716480 | elapsed time per iteration (s): 0.08 | learning rate: 3.219E-05 | global batch size: 256 | lm loss: 4.499680E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.493 | TFLOPs: 11.56 | 7: iteration 144720/ 173500 | consumed samples: 37048320 | consumed tokens: 75874959360 | elapsed time per iteration (s): 0.11 | learning rate: 3.218E-05 | global batch size: 256 | lm loss: 4.505597E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2433.579 | TFLOPs: 9.05 | 7: iteration 144730/ 173500 | consumed samples: 37050880 | consumed tokens: 75880202240 | elapsed time per iteration (s): 0.10 | learning rate: 3.218E-05 | global batch size: 256 | lm loss: 4.503495E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.468 | TFLOPs: 9.08 | 7: iteration 144740/ 173500 | consumed samples: 37053440 | consumed tokens: 75885445120 | elapsed time per iteration (s): 0.10 | learning rate: 
3.217E-05 | global batch size: 256 | lm loss: 4.512688E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2624.382 | TFLOPs: 9.76 | 7: iteration 144750/ 173500 | consumed samples: 37056000 | consumed tokens: 75890688000 | elapsed time per iteration (s): 0.08 | learning rate: 3.216E-05 | global batch size: 256 | lm loss: 4.508480E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.624 | TFLOPs: 11.86 | 7: iteration 144760/ 173500 | consumed samples: 37058560 | consumed tokens: 75895930880 | elapsed time per iteration (s): 0.08 | learning rate: 3.215E-05 | global batch size: 256 | lm loss: 4.503885E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3035.456 | TFLOPs: 11.29 | 7: iteration 144770/ 173500 | consumed samples: 37061120 | consumed tokens: 75901173760 | elapsed time per iteration (s): 0.10 | learning rate: 3.214E-05 | global batch size: 256 | lm loss: 4.511308E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2501.614 | TFLOPs: 9.30 | 7: iteration 144780/ 173500 | consumed samples: 37063680 | consumed tokens: 75906416640 | elapsed time per iteration (s): 0.08 | learning rate: 3.213E-05 | global batch size: 256 | lm loss: 4.510909E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.552 | TFLOPs: 11.92 | 7: iteration 144790/ 173500 | consumed samples: 37066240 | consumed tokens: 75911659520 | elapsed time per iteration (s): 0.08 | learning rate: 3.213E-05 | global batch size: 256 | lm loss: 4.516703E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.598 | TFLOPs: 11.83 | 7: iteration 144800/ 173500 | 
consumed samples: 37068800 | consumed tokens: 75916902400 | elapsed time per iteration (s): 0.08 | learning rate: 3.212E-05 | global batch size: 256 | lm loss: 4.511398E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.753 | TFLOPs: 12.00 | 7: iteration 144810/ 173500 | consumed samples: 37071360 | consumed tokens: 75922145280 | elapsed time per iteration (s): 0.08 | learning rate: 3.211E-05 | global batch size: 256 | lm loss: 4.518314E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3079.326 | TFLOPs: 11.45 | 7: iteration 144820/ 173500 | consumed samples: 37073920 | consumed tokens: 75927388160 | elapsed time per iteration (s): 0.12 | learning rate: 3.210E-05 | global batch size: 256 | lm loss: 4.493674E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2119.292 | TFLOPs: 7.88 | 7: iteration 144830/ 173500 | consumed samples: 37076480 | consumed tokens: 75932631040 | elapsed time per iteration (s): 0.08 | learning rate: 3.209E-05 | global batch size: 256 | lm loss: 4.510443E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.560 | TFLOPs: 11.97 | 7: iteration 144840/ 173500 | consumed samples: 37079040 | consumed tokens: 75937873920 | elapsed time per iteration (s): 0.08 | learning rate: 3.208E-05 | global batch size: 256 | lm loss: 4.525318E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.295 | TFLOPs: 11.98 | 7: iteration 144850/ 173500 | consumed samples: 37081600 | consumed tokens: 75943116800 | elapsed time per iteration (s): 0.08 | learning rate: 3.208E-05 | global batch size: 256 | lm loss: 4.502112E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3185.344 | TFLOPs: 11.85 | 7: iteration 144860/ 173500 | consumed samples: 37084160 | consumed tokens: 75948359680 | elapsed time per iteration (s): 0.08 | learning rate: 3.207E-05 | global batch size: 256 | lm loss: 4.501500E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.806 | TFLOPs: 11.94 | 7: iteration 144870/ 173500 | consumed samples: 37086720 | consumed tokens: 75953602560 | elapsed time per iteration (s): 0.08 | learning rate: 3.206E-05 | global batch size: 256 | lm loss: 4.516085E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.949 | TFLOPs: 11.95 | 7: iteration 144880/ 173500 | consumed samples: 37089280 | consumed tokens: 75958845440 | elapsed time per iteration (s): 0.08 | learning rate: 3.205E-05 | global batch size: 256 | lm loss: 4.499392E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.260 | TFLOPs: 11.95 | 7: iteration 144890/ 173500 | consumed samples: 37091840 | consumed tokens: 75964088320 | elapsed time per iteration (s): 0.08 | learning rate: 3.204E-05 | global batch size: 256 | lm loss: 4.509086E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.776 | TFLOPs: 11.87 | 7: iteration 144900/ 173500 | consumed samples: 37094400 | consumed tokens: 75969331200 | elapsed time per iteration (s): 0.08 | learning rate: 3.204E-05 | global batch size: 256 | lm loss: 4.520537E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.201 | TFLOPs: 11.87 | 7: iteration 144910/ 173500 | consumed samples: 37096960 | consumed tokens: 75974574080 | elapsed time per iteration (s): 0.08 | learning rate: 3.203E-05 | 
global batch size: 256 | lm loss: 4.510337E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.711 | TFLOPs: 11.89 | 7: iteration 144920/ 173500 | consumed samples: 37099520 | consumed tokens: 75979816960 | elapsed time per iteration (s): 0.08 | learning rate: 3.202E-05 | global batch size: 256 | lm loss: 4.508971E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.943 | TFLOPs: 11.89 | 7: iteration 144930/ 173500 | consumed samples: 37102080 | consumed tokens: 75985059840 | elapsed time per iteration (s): 0.08 | learning rate: 3.201E-05 | global batch size: 256 | lm loss: 4.498154E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.021 | TFLOPs: 11.40 | 7: iteration 144940/ 173500 | consumed samples: 37104640 | consumed tokens: 75990302720 | elapsed time per iteration (s): 0.08 | learning rate: 3.200E-05 | global batch size: 256 | lm loss: 4.511970E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.319 | TFLOPs: 11.84 | 7: iteration 144950/ 173500 | consumed samples: 37107200 | consumed tokens: 75995545600 | elapsed time per iteration (s): 0.09 | learning rate: 3.199E-05 | global batch size: 256 | lm loss: 4.504700E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.638 | TFLOPs: 10.78 | 7: iteration 144960/ 173500 | consumed samples: 37109760 | consumed tokens: 76000788480 | elapsed time per iteration (s): 0.10 | learning rate: 3.199E-05 | global batch size: 256 | lm loss: 4.516718E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2570.168 | TFLOPs: 9.56 | 7: iteration 144970/ 173500 | consumed 
samples: 37112320 | consumed tokens: 76006031360 | elapsed time per iteration (s): 0.10 | learning rate: 3.198E-05 | global batch size: 256 | lm loss: 4.506348E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2500.457 | TFLOPs: 9.30 |
7: iteration 144980/ 173500 | consumed samples: 37114880 | consumed tokens: 76011274240 | elapsed time per iteration (s): 0.12 | learning rate: 3.197E-05 | global batch size: 256 | lm loss: 4.496775E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2144.001 | TFLOPs: 7.97 |
7: iteration 144990/ 173500 | consumed samples: 37117440 | consumed tokens: 76016517120 | elapsed time per iteration (s): 0.13 | learning rate: 3.196E-05 | global batch size: 256 | lm loss: 4.504529E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2018.805 | TFLOPs: 7.51 |
7: iteration 145000/ 173500 | consumed samples: 37120000 | consumed tokens: 76021760000 | elapsed time per iteration (s): 0.10 | learning rate: 3.195E-05 | global batch size: 256 | lm loss: 4.503247E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.745 | TFLOPs: 9.41 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 145000 | lm loss value: 4.412455E+00 | lm loss PPL: 8.247169E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 145000 to checkpoints_14m91b100m
0: [2023-03-17 03:49:10,624] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step145000 is begin to save!
0: [2023-03-17 03:49:10,629] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:49:10,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:49:10,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:49:10,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:49:10,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:49:10,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:49:10,661] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:49:10,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:49:10,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:49:10,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:49:10,667] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:49:10,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 03:49:10,668] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step145000/mp_rank_00_model_states.pt 0: [2023-03-17 03:49:10,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:49:10,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:49:10,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:49:10,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:49:10,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:49:10,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:49:10,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:49:10,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:49:10,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 0: [2023-03-17 03:49:10,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:49:10,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 0: [2023-03-17 03:49:10,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:49:10,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:49:10,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 7: [2023-03-17 03:49:10,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:49:10,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:49:10,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 5: [2023-03-17 03:49:10,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:49:10,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:49:10,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:49:10,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 2: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:49:10,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 3: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:49:10,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 7: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:49:10,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:49:10,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 0: [2023-03-17 03:49:10,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:49:10,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:49:10,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 5: [2023-03-17 03:49:10,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:49:10,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:49:10,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:49:10,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:49:10,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:49:10,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 3: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:49:10,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 2: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:49:10,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 0: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:49:10,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 7: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:49:10,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:49:10,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 5: [2023-03-17 03:49:10,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:49:10,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:49:10,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:49:10,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 2: [2023-03-17 03:49:10,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:49:10,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:49:10,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 7: [2023-03-17 03:49:10,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:49:10,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:49:10,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 3: [2023-03-17 03:49:10,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:49:10,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 0: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:49:10,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 5: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:49:10,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:49:10,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:49:10,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:49:10,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 3: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:49:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 2: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:49:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 1: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:49:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 1: [2023-03-17 03:49:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 1: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 7: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:49:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 1: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 1: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:49:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 5: [2023-03-17 03:49:10,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:49:10,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:49:10,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:49:10,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:49:10,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:49:10,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 3: [2023-03-17 03:49:10,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:49:10,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:49:10,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 2: [2023-03-17 03:49:10,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:49:10,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:49:10,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 0: [2023-03-17 03:49:10,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:49:10,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:49:10,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 7: [2023-03-17 03:49:10,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:49:10,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:49:10,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 5: [2023-03-17 03:49:10,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:49:10,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 1: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 3: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 2: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 0: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 0: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 7: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 2: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 1: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 5: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:49:10,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 5: [2023-03-17 03:49:10,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 4: [2023-03-17 03:49:10,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:49:10,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 
3: [2023-03-17 03:49:10,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 3: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 6: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:49:10,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 1: [2023-03-17 03:49:10,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step145000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:49:10,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step145000 is ready now! 
0: successfully saved checkpoint at iteration 145000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 83.84
7: iteration 145010/ 173500 | consumed samples: 37122560 | consumed tokens: 76027002880 | elapsed time per iteration (s): 0.10 | learning rate: 3.195E-05 | global batch size: 256 | lm loss: 4.513116E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2591.207 | TFLOPs: 9.64 |
7: iteration 145020/ 173500 | consumed samples: 37125120 | consumed tokens: 76032245760 | elapsed time per iteration (s): 0.08 | learning rate: 3.194E-05 | global batch size: 256 | lm loss: 4.496606E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.583 | TFLOPs: 11.79 |
7: iteration 145030/ 173500 | consumed samples: 37127680 | consumed tokens: 76037488640 | elapsed time per iteration (s): 0.08 | learning rate: 3.193E-05 | global batch size: 256 | lm loss: 4.511002E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.103 | TFLOPs: 11.48 |
7: iteration 145040/ 173500 | consumed samples: 37130240 | consumed tokens: 76042731520 | elapsed time per iteration (s): 0.08 | learning rate: 3.192E-05 | global batch size: 256 | lm loss: 4.494841E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.860 | TFLOPs: 11.81 |
7: iteration 145050/ 173500 | consumed samples: 37132800 | consumed tokens: 76047974400 | elapsed time per iteration (s): 0.08 | learning rate: 3.191E-05 | global batch size: 256 | lm loss: 4.500772E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.506 | TFLOPs: 11.75 |
7: iteration 145060/ 173500 | consumed samples: 37135360 | consumed tokens: 76053217280 | elapsed time per iteration (s): 0.09 | learning rate: 3.190E-05 | global batch size: 256 | lm loss: 4.512697E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2744.160 | TFLOPs: 10.21 |
7: iteration 145070/ 173500 | consumed samples: 37137920 | consumed tokens: 76058460160 | elapsed time per iteration (s): 0.13 | learning rate: 3.190E-05 | global batch size: 256 | lm loss: 4.515109E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1955.163 | TFLOPs: 7.27 |
7: iteration 145080/ 173500 | consumed samples: 37140480 | consumed tokens: 76063703040 | elapsed time per iteration (s): 0.09 | learning rate: 3.189E-05 | global batch size: 256 | lm loss: 4.518707E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2826.665 | TFLOPs: 10.51 |
7: iteration 145090/ 173500 | consumed samples: 37143040 | consumed tokens: 76068945920 | elapsed time per iteration (s): 0.08 | learning rate: 3.188E-05 | global batch size: 256 | lm loss: 4.509891E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.258 | TFLOPs: 11.84 |
7: iteration 145100/ 173500 | consumed samples: 37145600 | consumed tokens: 76074188800 | elapsed time per iteration (s): 0.08 | learning rate: 3.187E-05 | global batch size: 256 | lm loss: 4.508158E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.501 | TFLOPs: 11.84 |
7: iteration 145110/ 173500 | consumed samples: 37148160 | consumed tokens: 76079431680 | elapsed time per iteration (s): 0.08 | learning rate: 3.186E-05 | global batch size: 256 | lm loss: 4.516368E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.085 | TFLOPs: 11.81 |
7: iteration 145120/ 173500 | consumed samples: 37150720 | consumed tokens: 76084674560 | elapsed time per iteration (s): 0.08 | learning rate: 3.186E-05 | global batch size: 256 | lm loss: 4.508848E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.854 | TFLOPs: 11.79 |
7: iteration 145130/ 173500 | consumed samples: 37153280 | consumed tokens: 76089917440 | elapsed time per iteration (s): 0.08 | learning rate: 3.185E-05 | global batch size: 256 | lm loss: 4.515229E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.252 | TFLOPs: 11.83 |
7: iteration 145140/ 173500 | consumed samples: 37155840 | consumed tokens: 76095160320 | elapsed time per iteration (s): 0.08 | learning rate: 3.184E-05 | global batch size: 256 | lm loss: 4.517915E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.853 | TFLOPs: 11.81 |
7: iteration 145150/ 173500 | consumed samples: 37158400 | consumed tokens: 76100403200 | elapsed time per iteration (s): 0.08 | learning rate: 3.183E-05 | global batch size: 256 | lm loss: 4.512488E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.469 | TFLOPs: 11.56 |
7: iteration 145160/ 173500 | consumed samples: 37160960 | consumed tokens: 76105646080 | elapsed time per iteration (s): 0.08 | learning rate: 3.182E-05 | global batch size: 256 | lm loss: 4.514706E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.388 | TFLOPs: 11.82 |
7: iteration 145170/ 173500 | consumed samples: 37163520 | consumed tokens: 76110888960 | elapsed time per iteration (s): 0.08 | learning rate: 3.181E-05 | global batch size: 256 | lm loss: 4.510474E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.992 | TFLOPs: 11.83 |
7: iteration 145180/ 173500 | consumed samples: 37166080 | consumed tokens: 76116131840 | elapsed time per iteration (s): 0.08 | learning rate: 3.181E-05 | global batch size: 256 | lm loss: 4.504273E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.854 | TFLOPs: 11.81 |
7: iteration 145190/ 173500 | consumed samples: 37168640 | consumed tokens: 76121374720 | elapsed time per iteration (s): 0.08 | learning rate: 3.180E-05 | global batch size: 256 | lm loss: 4.510207E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.370 | TFLOPs: 11.84 |
7: iteration 145200/ 173500 | consumed samples: 37171200 | consumed tokens: 76126617600 | elapsed time per iteration (s): 0.08 | learning rate: 3.179E-05 | global batch size: 256 | lm loss: 4.499474E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.268 | TFLOPs: 11.79 |
7: iteration 145210/ 173500 | consumed samples: 37173760 | consumed tokens: 76131860480 | elapsed time per iteration (s): 0.08 | learning rate: 3.178E-05 | global batch size: 256 | lm loss: 4.498537E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.075 | TFLOPs: 11.83 |
7: iteration 145220/ 173500 | consumed samples: 37176320 | consumed tokens: 76137103360 | elapsed time per iteration (s): 0.08 | learning rate: 3.177E-05 | global batch size: 256 | lm loss: 4.511522E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.950 | TFLOPs: 11.78 |
7: iteration 145230/ 173500 | consumed samples: 37178880 | consumed tokens: 76142346240 | elapsed time per iteration (s): 0.08 | learning rate: 3.177E-05 | global batch size: 256 | lm loss: 4.508170E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.668 | TFLOPs: 11.87 |
7: iteration 145240/ 173500 | consumed samples: 37181440 | consumed tokens: 76147589120 | elapsed time per iteration (s): 0.08 | learning rate: 3.176E-05 | global batch size: 256 | lm loss: 4.486336E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.804 | TFLOPs: 11.88 |
7: iteration 145250/ 173500 | consumed samples: 37184000 | consumed tokens: 76152832000 | elapsed time per iteration (s): 0.08 | learning rate: 3.175E-05 | global batch size: 256 | lm loss: 4.515519E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.913 | TFLOPs: 11.90 |
7: iteration 145260/ 173500 | consumed samples: 37186560 | consumed tokens: 76158074880 | elapsed time per iteration (s): 0.08 | learning rate: 3.174E-05 | global batch size: 256 | lm loss: 4.497676E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.669 | TFLOPs: 11.67 |
7: iteration 145270/ 173500 | consumed samples: 37189120 | consumed tokens: 76163317760 | elapsed time per iteration (s): 0.09 | learning rate: 3.173E-05 | global batch size: 256 | lm loss: 4.509438E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2931.087 | TFLOPs: 10.90 |
7: iteration 145280/ 173500 | consumed samples: 37191680 | consumed tokens: 76168560640 | elapsed time per iteration (s): 0.08 | learning rate: 3.172E-05 | global batch size: 256 | lm loss: 4.512338E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.661 | TFLOPs: 11.87 |
7: iteration 145290/ 173500 | consumed samples: 37194240 | consumed tokens: 76173803520 | elapsed time per iteration (s): 0.08 | learning rate: 3.172E-05 | global batch size: 256 | lm loss: 4.503286E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.881 | TFLOPs: 11.59 |
7: iteration 145300/ 173500 | consumed samples: 37196800 | consumed tokens: 76179046400 | elapsed time per iteration (s): 0.08 | learning rate: 3.171E-05 | global batch size: 256 | lm loss: 4.501189E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.840 | TFLOPs: 11.88 |
7: iteration 145310/ 173500 | consumed samples: 37199360 | consumed tokens: 76184289280 | elapsed time per iteration (s): 0.08 | learning rate: 3.170E-05 | global batch size: 256 | lm loss: 4.500340E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.303 | TFLOPs: 11.87 |
7: iteration 145320/ 173500 | consumed samples: 37201920 | consumed tokens: 76189532160 | elapsed time per iteration (s): 0.08 | learning rate: 3.169E-05 | global batch size: 256 | lm loss: 4.504480E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3082.080 | TFLOPs: 11.46 |
7: iteration 145330/ 173500 | consumed samples: 37204480 | consumed tokens: 76194775040 | elapsed time per iteration (s): 0.08 | learning rate: 3.168E-05 | global batch size: 256 | lm loss: 4.507499E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.549 | TFLOPs: 11.85 |
7: iteration 145340/ 173500 | consumed samples: 37207040 | consumed tokens: 76200017920 | elapsed time per iteration (s): 0.08 | learning rate: 3.168E-05 | global batch size: 256 | lm loss: 4.496037E+00 | grad norm: 0.378 | num zeros: 0.0 | number of
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.860 | TFLOPs: 11.62 | 7: iteration 145350/ 173500 | consumed samples: 37209600 | consumed tokens: 76205260800 | elapsed time per iteration (s): 0.10 | learning rate: 3.167E-05 | global batch size: 256 | lm loss: 4.499916E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2463.371 | TFLOPs: 9.16 | 7: iteration 145360/ 173500 | consumed samples: 37212160 | consumed tokens: 76210503680 | elapsed time per iteration (s): 0.11 | learning rate: 3.166E-05 | global batch size: 256 | lm loss: 4.499715E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.231 | TFLOPs: 8.75 | 7: iteration 145370/ 173500 | consumed samples: 37214720 | consumed tokens: 76215746560 | elapsed time per iteration (s): 0.09 | learning rate: 3.165E-05 | global batch size: 256 | lm loss: 4.499680E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2889.978 | TFLOPs: 10.75 | 7: iteration 145380/ 173500 | consumed samples: 37217280 | consumed tokens: 76220989440 | elapsed time per iteration (s): 0.08 | learning rate: 3.164E-05 | global batch size: 256 | lm loss: 4.507066E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.151 | TFLOPs: 11.89 | 7: iteration 145390/ 173500 | consumed samples: 37219840 | consumed tokens: 76226232320 | elapsed time per iteration (s): 0.09 | learning rate: 3.164E-05 | global batch size: 256 | lm loss: 4.494608E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2884.741 | TFLOPs: 10.73 | 7: iteration 145400/ 173500 | consumed samples: 37222400 | consumed tokens: 76231475200 | elapsed time per iteration (s): 0.09 | learning 
rate: 3.163E-05 | global batch size: 256 | lm loss: 4.494506E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.766 | TFLOPs: 11.13 | 7: iteration 145410/ 173500 | consumed samples: 37224960 | consumed tokens: 76236718080 | elapsed time per iteration (s): 0.09 | learning rate: 3.162E-05 | global batch size: 256 | lm loss: 4.489483E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2968.224 | TFLOPs: 11.04 | 7: iteration 145420/ 173500 | consumed samples: 37227520 | consumed tokens: 76241960960 | elapsed time per iteration (s): 0.08 | learning rate: 3.161E-05 | global batch size: 256 | lm loss: 4.502516E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.389 | TFLOPs: 11.71 | 7: iteration 145430/ 173500 | consumed samples: 37230080 | consumed tokens: 76247203840 | elapsed time per iteration (s): 0.08 | learning rate: 3.160E-05 | global batch size: 256 | lm loss: 4.512834E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.416 | TFLOPs: 11.58 | 7: iteration 145440/ 173500 | consumed samples: 37232640 | consumed tokens: 76252446720 | elapsed time per iteration (s): 0.10 | learning rate: 3.160E-05 | global batch size: 256 | lm loss: 4.516835E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2566.801 | TFLOPs: 9.55 | 7: iteration 145450/ 173500 | consumed samples: 37235200 | consumed tokens: 76257689600 | elapsed time per iteration (s): 0.08 | learning rate: 3.159E-05 | global batch size: 256 | lm loss: 4.502840E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.495 | TFLOPs: 11.68 | 7: iteration 145460/ 
173500 | consumed samples: 37237760 | consumed tokens: 76262932480 | elapsed time per iteration (s): 0.08 | learning rate: 3.158E-05 | global batch size: 256 | lm loss: 4.503497E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.212 | TFLOPs: 11.23 | 7: iteration 145470/ 173500 | consumed samples: 37240320 | consumed tokens: 76268175360 | elapsed time per iteration (s): 0.08 | learning rate: 3.157E-05 | global batch size: 256 | lm loss: 4.497323E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.390 | TFLOPs: 11.56 | 7: iteration 145480/ 173500 | consumed samples: 37242880 | consumed tokens: 76273418240 | elapsed time per iteration (s): 0.08 | learning rate: 3.156E-05 | global batch size: 256 | lm loss: 4.501229E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.347 | TFLOPs: 11.68 | 7: iteration 145490/ 173500 | consumed samples: 37245440 | consumed tokens: 76278661120 | elapsed time per iteration (s): 0.08 | learning rate: 3.155E-05 | global batch size: 256 | lm loss: 4.505312E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.773 | TFLOPs: 11.85 | 7: iteration 145500/ 173500 | consumed samples: 37248000 | consumed tokens: 76283904000 | elapsed time per iteration (s): 0.08 | learning rate: 3.155E-05 | global batch size: 256 | lm loss: 4.503598E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.044 | TFLOPs: 11.81 | 7: iteration 145510/ 173500 | consumed samples: 37250560 | consumed tokens: 76289146880 | elapsed time per iteration (s): 0.08 | learning rate: 3.154E-05 | global batch size: 256 | lm loss: 4.512881E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3185.145 | TFLOPs: 11.85 | 7: iteration 145520/ 173500 | consumed samples: 37253120 | consumed tokens: 76294389760 | elapsed time per iteration (s): 0.08 | learning rate: 3.153E-05 | global batch size: 256 | lm loss: 4.505754E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.868 | TFLOPs: 11.89 | 7: iteration 145530/ 173500 | consumed samples: 37255680 | consumed tokens: 76299632640 | elapsed time per iteration (s): 0.08 | learning rate: 3.152E-05 | global batch size: 256 | lm loss: 4.515625E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.610 | TFLOPs: 11.91 | 7: iteration 145540/ 173500 | consumed samples: 37258240 | consumed tokens: 76304875520 | elapsed time per iteration (s): 0.08 | learning rate: 3.151E-05 | global batch size: 256 | lm loss: 4.506349E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.322 | TFLOPs: 11.88 | 7: iteration 145550/ 173500 | consumed samples: 37260800 | consumed tokens: 76310118400 | elapsed time per iteration (s): 0.08 | learning rate: 3.151E-05 | global batch size: 256 | lm loss: 4.505762E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.663 | TFLOPs: 11.92 | 7: iteration 145560/ 173500 | consumed samples: 37263360 | consumed tokens: 76315361280 | elapsed time per iteration (s): 0.08 | learning rate: 3.150E-05 | global batch size: 256 | lm loss: 4.502710E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.213 | TFLOPs: 11.88 | 7: iteration 145570/ 173500 | consumed samples: 37265920 | consumed tokens: 76320604160 | elapsed time per iteration (s): 0.09 | learning rate: 
3.149E-05 | global batch size: 256 | lm loss: 4.520583E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2789.880 | TFLOPs: 10.38 | 7: iteration 145580/ 173500 | consumed samples: 37268480 | consumed tokens: 76325847040 | elapsed time per iteration (s): 0.08 | learning rate: 3.148E-05 | global batch size: 256 | lm loss: 4.500270E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.522 | TFLOPs: 11.95 | 7: iteration 145590/ 173500 | consumed samples: 37271040 | consumed tokens: 76331089920 | elapsed time per iteration (s): 0.08 | learning rate: 3.147E-05 | global batch size: 256 | lm loss: 4.499635E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.860 | TFLOPs: 11.92 | 7: iteration 145600/ 173500 | consumed samples: 37273600 | consumed tokens: 76336332800 | elapsed time per iteration (s): 0.08 | learning rate: 3.147E-05 | global batch size: 256 | lm loss: 4.518602E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.238 | TFLOPs: 11.91 | 7: iteration 145610/ 173500 | consumed samples: 37276160 | consumed tokens: 76341575680 | elapsed time per iteration (s): 0.08 | learning rate: 3.146E-05 | global batch size: 256 | lm loss: 4.498771E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.294 | TFLOPs: 11.68 | 7: iteration 145620/ 173500 | consumed samples: 37278720 | consumed tokens: 76346818560 | elapsed time per iteration (s): 0.08 | learning rate: 3.145E-05 | global batch size: 256 | lm loss: 4.503056E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.700 | TFLOPs: 11.62 | 7: iteration 145630/ 173500 | 
consumed samples: 37281280 | consumed tokens: 76352061440 | elapsed time per iteration (s): 0.08 | learning rate: 3.144E-05 | global batch size: 256 | lm loss: 4.505103E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3082.092 | TFLOPs: 11.46 | 7: iteration 145640/ 173500 | consumed samples: 37283840 | consumed tokens: 76357304320 | elapsed time per iteration (s): 0.08 | learning rate: 3.143E-05 | global batch size: 256 | lm loss: 4.506403E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.119 | TFLOPs: 11.59 | 7: iteration 145650/ 173500 | consumed samples: 37286400 | consumed tokens: 76362547200 | elapsed time per iteration (s): 0.08 | learning rate: 3.143E-05 | global batch size: 256 | lm loss: 4.507359E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.764 | TFLOPs: 11.65 | 7: iteration 145660/ 173500 | consumed samples: 37288960 | consumed tokens: 76367790080 | elapsed time per iteration (s): 0.08 | learning rate: 3.142E-05 | global batch size: 256 | lm loss: 4.517695E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.126 | TFLOPs: 11.35 | 7: iteration 145670/ 173500 | consumed samples: 37291520 | consumed tokens: 76373032960 | elapsed time per iteration (s): 0.08 | learning rate: 3.141E-05 | global batch size: 256 | lm loss: 4.510048E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3063.353 | TFLOPs: 11.39 | 7: iteration 145680/ 173500 | consumed samples: 37294080 | consumed tokens: 76378275840 | elapsed time per iteration (s): 0.08 | learning rate: 3.140E-05 | global batch size: 256 | lm loss: 4.500705E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3188.327 | TFLOPs: 11.86 | 7: iteration 145690/ 173500 | consumed samples: 37296640 | consumed tokens: 76383518720 | elapsed time per iteration (s): 0.08 | learning rate: 3.139E-05 | global batch size: 256 | lm loss: 4.491768E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.751 | TFLOPs: 11.90 | 7: iteration 145700/ 173500 | consumed samples: 37299200 | consumed tokens: 76388761600 | elapsed time per iteration (s): 0.08 | learning rate: 3.139E-05 | global batch size: 256 | lm loss: 4.513295E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.989 | TFLOPs: 11.59 | 7: iteration 145710/ 173500 | consumed samples: 37301760 | consumed tokens: 76394004480 | elapsed time per iteration (s): 0.08 | learning rate: 3.138E-05 | global batch size: 256 | lm loss: 4.507462E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.428 | TFLOPs: 11.67 | 7: iteration 145720/ 173500 | consumed samples: 37304320 | consumed tokens: 76399247360 | elapsed time per iteration (s): 0.08 | learning rate: 3.137E-05 | global batch size: 256 | lm loss: 4.510069E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.470 | TFLOPs: 11.77 | 7: iteration 145730/ 173500 | consumed samples: 37306880 | consumed tokens: 76404490240 | elapsed time per iteration (s): 0.08 | learning rate: 3.136E-05 | global batch size: 256 | lm loss: 4.507657E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.772 | TFLOPs: 11.85 | 7: iteration 145740/ 173500 | consumed samples: 37309440 | consumed tokens: 76409733120 | elapsed time per iteration (s): 0.08 | learning rate: 
3.135E-05 | global batch size: 256 | lm loss: 4.500145E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.146 | TFLOPs: 11.86 | 7: iteration 145750/ 173500 | consumed samples: 37312000 | consumed tokens: 76414976000 | elapsed time per iteration (s): 0.09 | learning rate: 3.135E-05 | global batch size: 256 | lm loss: 4.507140E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2942.835 | TFLOPs: 10.95 | 7: iteration 145760/ 173500 | consumed samples: 37314560 | consumed tokens: 76420218880 | elapsed time per iteration (s): 0.08 | learning rate: 3.134E-05 | global batch size: 256 | lm loss: 4.500240E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.948 | TFLOPs: 11.80 | 7: iteration 145770/ 173500 | consumed samples: 37317120 | consumed tokens: 76425461760 | elapsed time per iteration (s): 0.08 | learning rate: 3.133E-05 | global batch size: 256 | lm loss: 4.498994E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.347 | TFLOPs: 11.87 | 7: iteration 145780/ 173500 | consumed samples: 37319680 | consumed tokens: 76430704640 | elapsed time per iteration (s): 0.08 | learning rate: 3.132E-05 | global batch size: 256 | lm loss: 4.506414E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.541 | TFLOPs: 11.87 | 7: iteration 145790/ 173500 | consumed samples: 37322240 | consumed tokens: 76435947520 | elapsed time per iteration (s): 0.08 | learning rate: 3.131E-05 | global batch size: 256 | lm loss: 4.500039E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.447 | TFLOPs: 11.87 | 7: iteration 145800/ 173500 | 
consumed samples: 37324800 | consumed tokens: 76441190400 | elapsed time per iteration (s): 0.08 | learning rate: 3.131E-05 | global batch size: 256 | lm loss: 4.509238E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.102 | TFLOPs: 11.77 | 7: iteration 145810/ 173500 | consumed samples: 37327360 | consumed tokens: 76446433280 | elapsed time per iteration (s): 0.08 | learning rate: 3.130E-05 | global batch size: 256 | lm loss: 4.516724E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.861 | TFLOPs: 11.86 | 7: iteration 145820/ 173500 | consumed samples: 37329920 | consumed tokens: 76451676160 | elapsed time per iteration (s): 0.08 | learning rate: 3.129E-05 | global batch size: 256 | lm loss: 4.506017E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.987 | TFLOPs: 11.86 | 7: iteration 145830/ 173500 | consumed samples: 37332480 | consumed tokens: 76456919040 | elapsed time per iteration (s): 0.08 | learning rate: 3.128E-05 | global batch size: 256 | lm loss: 4.497224E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.302 | TFLOPs: 11.86 | 7: iteration 145840/ 173500 | consumed samples: 37335040 | consumed tokens: 76462161920 | elapsed time per iteration (s): 0.08 | learning rate: 3.127E-05 | global batch size: 256 | lm loss: 4.509007E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.250 | TFLOPs: 11.86 | 7: iteration 145850/ 173500 | consumed samples: 37337600 | consumed tokens: 76467404800 | elapsed time per iteration (s): 0.08 | learning rate: 3.127E-05 | global batch size: 256 | lm loss: 4.497191E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3195.413 | TFLOPs: 11.89 | 7: iteration 145860/ 173500 | consumed samples: 37340160 | consumed tokens: 76472647680 | elapsed time per iteration (s): 0.08 | learning rate: 3.126E-05 | global batch size: 256 | lm loss: 4.493438E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.777 | TFLOPs: 11.57 | 7: iteration 145870/ 173500 | consumed samples: 37342720 | consumed tokens: 76477890560 | elapsed time per iteration (s): 0.08 | learning rate: 3.125E-05 | global batch size: 256 | lm loss: 4.500402E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.135 | TFLOPs: 11.87 | 7: iteration 145880/ 173500 | consumed samples: 37345280 | consumed tokens: 76483133440 | elapsed time per iteration (s): 0.08 | learning rate: 3.124E-05 | global batch size: 256 | lm loss: 4.500652E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.298 | TFLOPs: 11.87 | 7: iteration 145890/ 173500 | consumed samples: 37347840 | consumed tokens: 76488376320 | elapsed time per iteration (s): 0.08 | learning rate: 3.123E-05 | global batch size: 256 | lm loss: 4.499060E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.442 | TFLOPs: 11.87 | 7: iteration 145900/ 173500 | consumed samples: 37350400 | consumed tokens: 76493619200 | elapsed time per iteration (s): 0.12 | learning rate: 3.123E-05 | global batch size: 256 | lm loss: 4.512244E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2116.933 | TFLOPs: 7.87 | 7: iteration 145910/ 173500 | consumed samples: 37352960 | consumed tokens: 76498862080 | elapsed time per iteration (s): 0.08 | learning rate: 
3.122E-05 | global batch size: 256 | lm loss: 4.495412E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.729 | TFLOPs: 11.88 | 7: iteration 145920/ 173500 | consumed samples: 37355520 | consumed tokens: 76504104960 | elapsed time per iteration (s): 0.09 | learning rate: 3.121E-05 | global batch size: 256 | lm loss: 4.504448E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2803.070 | TFLOPs: 10.43 | 7: iteration 145930/ 173500 | consumed samples: 37358080 | consumed tokens: 76509347840 | elapsed time per iteration (s): 0.09 | learning rate: 3.120E-05 | global batch size: 256 | lm loss: 4.497980E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.152 | TFLOPs: 11.08 | 7: iteration 145940/ 173500 | consumed samples: 37360640 | consumed tokens: 76514590720 | elapsed time per iteration (s): 0.08 | learning rate: 3.119E-05 | global batch size: 256 | lm loss: 4.486122E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.957 | TFLOPs: 11.69 | 7: iteration 145950/ 173500 | consumed samples: 37363200 | consumed tokens: 76519833600 | elapsed time per iteration (s): 0.08 | learning rate: 3.119E-05 | global batch size: 256 | lm loss: 4.506609E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.321 | TFLOPs: 11.81 | 7: iteration 145960/ 173500 | consumed samples: 37365760 | consumed tokens: 76525076480 | elapsed time per iteration (s): 0.08 | learning rate: 3.118E-05 | global batch size: 256 | lm loss: 4.514835E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.103 | TFLOPs: 11.85 | 7: iteration 145970/ 173500 | 
consumed samples: 37368320 | consumed tokens: 76530319360 | elapsed time per iteration (s): 0.08 | learning rate: 3.117E-05 | global batch size: 256 | lm loss: 4.509809E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.851 | TFLOPs: 11.85 | 7: iteration 145980/ 173500 | consumed samples: 37370880 | consumed tokens: 76535562240 | elapsed time per iteration (s): 0.08 | learning rate: 3.116E-05 | global batch size: 256 | lm loss: 4.490613E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.639 | TFLOPs: 11.88 | 7: iteration 145990/ 173500 | consumed samples: 37373440 | consumed tokens: 76540805120 | elapsed time per iteration (s): 0.08 | learning rate: 3.115E-05 | global batch size: 256 | lm loss: 4.510301E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.938 | TFLOPs: 11.87 | 0: [2023-03-17 03:50:34,117] [INFO] [logging.py:68:log_dist] [Rank 0] step=146000, skipped=0, lr=[3.1146732758228304e-05, 3.1146732758228304e-05, 3.1146732758228304e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 146000/ 173500 | consumed samples: 37376000 | consumed tokens: 76546048000 | elapsed time per iteration (s): 0.08 | learning rate: 3.115E-05 | global batch size: 256 | lm loss: 4.497449E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.707 | TFLOPs: 11.88 | 0: steps: 146000 loss: 4.4806 iter time (s): 0.084 samples/sec: 3062.413 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 146000 | lm loss value: 4.430222E+00 | lm loss PPL: 8.395005E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at 
iteration 146000 to checkpoints_14m91b100m 0: [2023-03-17 03:50:34,176] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step146000 is begin to save! 0: [2023-03-17 03:50:34,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:50:34,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:50:34,207] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:50:34,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:50:34,210] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:50:34,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:50:34,213] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:50:34,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:50:34,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:50:34,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:50:34,219] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:50:34,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:50:34,220] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step146000/mp_rank_00_model_states.pt 0: [2023-03-17 03:50:34,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:50:34,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:50:34,239] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:50:34,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2023-03-17 03:50:34,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 5: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:50:34,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:50:34,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:50:34,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:50:34,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 5: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:50:34,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 3: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 0: [2023-03-17 03:50:34,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 4: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:50:34,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:50:34,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 03:50:34,245] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,245] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 0: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:50:34,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 3: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:50:34,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 5: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:50:34,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 4: [2023-03-17 03:50:34,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:50:34,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:50:34,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:50:34,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 0: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:50:34,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:50:34,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 3: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:50:34,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 4: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:50:34,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 5: [2023-03-17 03:50:34,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 0: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:50:34,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:50:34,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:50:34,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:50:34,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 3: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 1: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 2: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 2: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 2: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 5: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 
4: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 0: [2023-03-17 03:50:34,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 2: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:50:34,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 5: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:50:34,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 3: [2023-03-17 03:50:34,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 0: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:50:34,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 4: [2023-03-17 03:50:34,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 4: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:50:34,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:50:34,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:50:34,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 2: [2023-03-17 03:50:34,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:50:34,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:50:34,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 3: [2023-03-17 03:50:34,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:50:34,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 5: [2023-03-17 03:50:34,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:50:34,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:50:34,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 4: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 2: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 6: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 4: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 4: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 2: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:50:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 7: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:50:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:50:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 1: [2023-03-17 03:50:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 03:50:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2023-03-17 03:50:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 0: [2023-03-17 03:50:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 5: [2023-03-17 03:50:34,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:50:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:50:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 3: [2023-03-17 03:50:34,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:50:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step146000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:50:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step146000 is ready now! 0: successfully saved checkpoint at iteration 146000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.91 7: iteration 146010/ 173500 | consumed samples: 37378560 | consumed tokens: 76551290880 | elapsed time per iteration (s): 0.09 | learning rate: 3.114E-05 | global batch size: 256 | lm loss: 4.506549E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2722.072 | TFLOPs: 10.12 | 7: iteration 146020/ 173500 | consumed samples: 37381120 | consumed tokens: 76556533760 | elapsed time per iteration (s): 0.08 | learning rate: 3.113E-05 | global batch size: 256 | lm loss: 4.502108E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.846 | TFLOPs: 11.87 | 7: iteration 146030/ 173500 | consumed samples: 37383680 | consumed tokens: 76561776640 | elapsed time per iteration (s): 0.08 | learning rate: 3.112E-05 | global batch size: 256 | lm loss: 4.510914E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.557 | TFLOPs: 11.85 | 7: iteration 146040/ 173500 | consumed samples: 37386240 | consumed tokens: 76567019520 | elapsed time per iteration (s): 0.08 | learning rate: 3.112E-05 | global batch size: 256 | lm loss: 4.523913E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.122 | TFLOPs: 11.80 | 7: iteration 146050/ 173500 | consumed samples: 37388800 | consumed tokens: 76572262400 | elapsed time per iteration (s): 0.08 | learning rate: 3.111E-05 | 
global batch size: 256 | lm loss: 4.510622E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.245 | TFLOPs: 11.57 | 7: iteration 146060/ 173500 | consumed samples: 37391360 | consumed tokens: 76577505280 | elapsed time per iteration (s): 0.08 | learning rate: 3.110E-05 | global batch size: 256 | lm loss: 4.501945E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.797 | TFLOPs: 11.57 | 7: iteration 146070/ 173500 | consumed samples: 37393920 | consumed tokens: 76582748160 | elapsed time per iteration (s): 0.08 | learning rate: 3.109E-05 | global batch size: 256 | lm loss: 4.511105E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.170 | TFLOPs: 11.80 | 7: iteration 146080/ 173500 | consumed samples: 37396480 | consumed tokens: 76587991040 | elapsed time per iteration (s): 0.08 | learning rate: 3.108E-05 | global batch size: 256 | lm loss: 4.516684E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.804 | TFLOPs: 11.76 | 7: iteration 146090/ 173500 | consumed samples: 37399040 | consumed tokens: 76593233920 | elapsed time per iteration (s): 0.08 | learning rate: 3.108E-05 | global batch size: 256 | lm loss: 4.502683E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.291 | TFLOPs: 11.61 | 7: iteration 146100/ 173500 | consumed samples: 37401600 | consumed tokens: 76598476800 | elapsed time per iteration (s): 0.09 | learning rate: 3.107E-05 | global batch size: 256 | lm loss: 4.515400E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2760.482 | TFLOPs: 10.27 | 7: iteration 146110/ 173500 | consumed 
samples: 37404160 | consumed tokens: 76603719680 | elapsed time per iteration (s): 0.09 | learning rate: 3.106E-05 | global batch size: 256 | lm loss: 4.511856E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2873.627 | TFLOPs: 10.69 | 7: iteration 146120/ 173500 | consumed samples: 37406720 | consumed tokens: 76608962560 | elapsed time per iteration (s): 0.08 | learning rate: 3.105E-05 | global batch size: 256 | lm loss: 4.504138E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.735 | TFLOPs: 11.73 | 7: iteration 146130/ 173500 | consumed samples: 37409280 | consumed tokens: 76614205440 | elapsed time per iteration (s): 0.08 | learning rate: 3.104E-05 | global batch size: 256 | lm loss: 4.507270E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.164 | TFLOPs: 11.78 | 7: iteration 146140/ 173500 | consumed samples: 37411840 | consumed tokens: 76619448320 | elapsed time per iteration (s): 0.08 | learning rate: 3.104E-05 | global batch size: 256 | lm loss: 4.511343E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.714 | TFLOPs: 11.82 | 7: iteration 146150/ 173500 | consumed samples: 37414400 | consumed tokens: 76624691200 | elapsed time per iteration (s): 0.08 | learning rate: 3.103E-05 | global batch size: 256 | lm loss: 4.502523E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.122 | TFLOPs: 11.84 | 7: iteration 146160/ 173500 | consumed samples: 37416960 | consumed tokens: 76629934080 | elapsed time per iteration (s): 0.08 | learning rate: 3.102E-05 | global batch size: 256 | lm loss: 4.502977E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3182.356 | TFLOPs: 11.84 | 7: iteration 146170/ 173500 | consumed samples: 37419520 | consumed tokens: 76635176960 | elapsed time per iteration (s): 0.08 | learning rate: 3.101E-05 | global batch size: 256 | lm loss: 4.505553E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.919 | TFLOPs: 11.80 | 7: iteration 146180/ 173500 | consumed samples: 37422080 | consumed tokens: 76640419840 | elapsed time per iteration (s): 0.08 | learning rate: 3.100E-05 | global batch size: 256 | lm loss: 4.511983E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.835 | TFLOPs: 11.83 | 7: iteration 146190/ 173500 | consumed samples: 37424640 | consumed tokens: 76645662720 | elapsed time per iteration (s): 0.08 | learning rate: 3.100E-05 | global batch size: 256 | lm loss: 4.502322E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.312 | TFLOPs: 11.78 | 7: iteration 146200/ 173500 | consumed samples: 37427200 | consumed tokens: 76650905600 | elapsed time per iteration (s): 0.08 | learning rate: 3.099E-05 | global batch size: 256 | lm loss: 4.503445E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.022 | TFLOPs: 11.84 | 7: iteration 146210/ 173500 | consumed samples: 37429760 | consumed tokens: 76656148480 | elapsed time per iteration (s): 0.08 | learning rate: 3.098E-05 | global batch size: 256 | lm loss: 4.508994E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.210 | TFLOPs: 11.82 | 7: iteration 146220/ 173500 | consumed samples: 37432320 | consumed tokens: 76661391360 | elapsed time per iteration (s): 0.08 | learning rate: 3.097E-05 | global 
batch size: 256 | lm loss: 4.510183E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.123 | TFLOPs: 11.84 | 7: iteration 146230/ 173500 | consumed samples: 37434880 | consumed tokens: 76666634240 | elapsed time per iteration (s): 0.08 | learning rate: 3.096E-05 | global batch size: 256 | lm loss: 4.512381E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.225 | TFLOPs: 11.84 | 7: iteration 146240/ 173500 | consumed samples: 37437440 | consumed tokens: 76671877120 | elapsed time per iteration (s): 0.08 | learning rate: 3.096E-05 | global batch size: 256 | lm loss: 4.524750E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.529 | TFLOPs: 11.83 | 7: iteration 146250/ 173500 | consumed samples: 37440000 | consumed tokens: 76677120000 | elapsed time per iteration (s): 0.08 | learning rate: 3.095E-05 | global batch size: 256 | lm loss: 4.507593E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.742 | TFLOPs: 11.84 | 7: iteration 146260/ 173500 | consumed samples: 37442560 | consumed tokens: 76682362880 | elapsed time per iteration (s): 0.08 | learning rate: 3.094E-05 | global batch size: 256 | lm loss: 4.506098E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.767 | TFLOPs: 11.78 | 7: iteration 146270/ 173500 | consumed samples: 37445120 | consumed tokens: 76687605760 | elapsed time per iteration (s): 0.08 | learning rate: 3.093E-05 | global batch size: 256 | lm loss: 4.512723E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.060 | TFLOPs: 11.82 | 7: iteration 146280/ 173500 | consumed samples: 
37447680 | consumed tokens: 76692848640 | elapsed time per iteration (s): 0.08 | learning rate: 3.093E-05 | global batch size: 256 | lm loss: 4.504825E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.862 | TFLOPs: 11.57 | 7: iteration 146290/ 173500 | consumed samples: 37450240 | consumed tokens: 76698091520 | elapsed time per iteration (s): 0.08 | learning rate: 3.092E-05 | global batch size: 256 | lm loss: 4.510374E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.436 | TFLOPs: 11.85 | 7: iteration 146300/ 173500 | consumed samples: 37452800 | consumed tokens: 76703334400 | elapsed time per iteration (s): 0.08 | learning rate: 3.091E-05 | global batch size: 256 | lm loss: 4.500906E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.907 | TFLOPs: 11.90 | 7: iteration 146310/ 173500 | consumed samples: 37455360 | consumed tokens: 76708577280 | elapsed time per iteration (s): 0.08 | learning rate: 3.090E-05 | global batch size: 256 | lm loss: 4.515123E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.876 | TFLOPs: 11.82 | 7: iteration 146320/ 173500 | consumed samples: 37457920 | consumed tokens: 76713820160 | elapsed time per iteration (s): 0.08 | learning rate: 3.089E-05 | global batch size: 256 | lm loss: 4.514030E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.250 | TFLOPs: 11.96 | 7: iteration 146330/ 173500 | consumed samples: 37460480 | consumed tokens: 76719063040 | elapsed time per iteration (s): 0.08 | learning rate: 3.089E-05 | global batch size: 256 | lm loss: 4.516125E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3208.475 | TFLOPs: 11.93 | 7: iteration 146340/ 173500 | consumed samples: 37463040 | consumed tokens: 76724305920 | elapsed time per iteration (s): 0.08 | learning rate: 3.088E-05 | global batch size: 256 | lm loss: 4.501044E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.009 | TFLOPs: 11.93 | 7: iteration 146350/ 173500 | consumed samples: 37465600 | consumed tokens: 76729548800 | elapsed time per iteration (s): 0.20 | learning rate: 3.087E-05 | global batch size: 256 | lm loss: 4.507912E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1266.822 | TFLOPs: 4.71 | 7: iteration 146360/ 173500 | consumed samples: 37468160 | consumed tokens: 76734791680 | elapsed time per iteration (s): 0.08 | learning rate: 3.086E-05 | global batch size: 256 | lm loss: 4.502548E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.661 | TFLOPs: 11.86 | 7: iteration 146370/ 173500 | consumed samples: 37470720 | consumed tokens: 76740034560 | elapsed time per iteration (s): 0.08 | learning rate: 3.085E-05 | global batch size: 256 | lm loss: 4.505834E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.408 | TFLOPs: 11.86 | 7: iteration 146380/ 173500 | consumed samples: 37473280 | consumed tokens: 76745277440 | elapsed time per iteration (s): 0.08 | learning rate: 3.085E-05 | global batch size: 256 | lm loss: 4.512936E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.740 | TFLOPs: 11.93 | 7: iteration 146390/ 173500 | consumed samples: 37475840 | consumed tokens: 76750520320 | elapsed time per iteration (s): 0.08 | learning rate: 3.084E-05 | global batch 
size: 256 | lm loss: 4.512760E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.542 | TFLOPs: 11.91 | 7: iteration 146400/ 173500 | consumed samples: 37478400 | consumed tokens: 76755763200 | elapsed time per iteration (s): 0.08 | learning rate: 3.083E-05 | global batch size: 256 | lm loss: 4.508496E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.129 | TFLOPs: 11.64 | 7: iteration 146410/ 173500 | consumed samples: 37480960 | consumed tokens: 76761006080 | elapsed time per iteration (s): 0.08 | learning rate: 3.082E-05 | global batch size: 256 | lm loss: 4.504015E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.737 | TFLOPs: 11.90 | 7: iteration 146420/ 173500 | consumed samples: 37483520 | consumed tokens: 76766248960 | elapsed time per iteration (s): 0.11 | learning rate: 3.082E-05 | global batch size: 256 | lm loss: 4.508461E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2313.450 | TFLOPs: 8.61 | 7: iteration 146430/ 173500 | consumed samples: 37486080 | consumed tokens: 76771491840 | elapsed time per iteration (s): 0.08 | learning rate: 3.081E-05 | global batch size: 256 | lm loss: 4.501533E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.506 | TFLOPs: 11.96 | 7: iteration 146440/ 173500 | consumed samples: 37488640 | consumed tokens: 76776734720 | elapsed time per iteration (s): 0.08 | learning rate: 3.080E-05 | global batch size: 256 | lm loss: 4.497637E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.459 | TFLOPs: 11.95 | 7: iteration 146450/ 173500 | consumed samples: 37491200 
| consumed tokens: 76781977600 | elapsed time per iteration (s): 0.08 | learning rate: 3.079E-05 | global batch size: 256 | lm loss: 4.498263E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.152 | TFLOPs: 11.93 | 7: iteration 146460/ 173500 | consumed samples: 37493760 | consumed tokens: 76787220480 | elapsed time per iteration (s): 0.08 | learning rate: 3.078E-05 | global batch size: 256 | lm loss: 4.514549E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.613 | TFLOPs: 11.75 | 7: iteration 146470/ 173500 | consumed samples: 37496320 | consumed tokens: 76792463360 | elapsed time per iteration (s): 0.08 | learning rate: 3.078E-05 | global batch size: 256 | lm loss: 4.497183E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.940 | TFLOPs: 11.94 | 7: iteration 146480/ 173500 | consumed samples: 37498880 | consumed tokens: 76797706240 | elapsed time per iteration (s): 0.08 | learning rate: 3.077E-05 | global batch size: 256 | lm loss: 4.503090E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.894 | TFLOPs: 11.95 | 7: iteration 146490/ 173500 | consumed samples: 37501440 | consumed tokens: 76802949120 | elapsed time per iteration (s): 0.08 | learning rate: 3.076E-05 | global batch size: 256 | lm loss: 4.513256E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.616 | TFLOPs: 11.65 | 7: iteration 146500/ 173500 | consumed samples: 37504000 | consumed tokens: 76808192000 | elapsed time per iteration (s): 0.08 | learning rate: 3.075E-05 | global batch size: 256 | lm loss: 4.513942E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3209.908 | TFLOPs: 11.94 | 7: iteration 146510/ 173500 | consumed samples: 37506560 | consumed tokens: 76813434880 | elapsed time per iteration (s): 0.08 | learning rate: 3.075E-05 | global batch size: 256 | lm loss: 4.503579E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.196 | TFLOPs: 11.96 | 7: iteration 146520/ 173500 | consumed samples: 37509120 | consumed tokens: 76818677760 | elapsed time per iteration (s): 0.08 | learning rate: 3.074E-05 | global batch size: 256 | lm loss: 4.523042E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.227 | TFLOPs: 11.65 | 7: iteration 146530/ 173500 | consumed samples: 37511680 | consumed tokens: 76823920640 | elapsed time per iteration (s): 0.08 | learning rate: 3.073E-05 | global batch size: 256 | lm loss: 4.509407E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.680 | TFLOPs: 11.61 | 7: iteration 146540/ 173500 | consumed samples: 37514240 | consumed tokens: 76829163520 | elapsed time per iteration (s): 0.08 | learning rate: 3.072E-05 | global batch size: 256 | lm loss: 4.501102E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.838 | TFLOPs: 11.96 | 7: iteration 146550/ 173500 | consumed samples: 37516800 | consumed tokens: 76834406400 | elapsed time per iteration (s): 0.08 | learning rate: 3.071E-05 | global batch size: 256 | lm loss: 4.508031E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.582 | TFLOPs: 11.95 | 7: iteration 146560/ 173500 | consumed samples: 37519360 | consumed tokens: 76839649280 | elapsed time per iteration (s): 0.08 | learning rate: 3.071E-05 | global batch size: 
256 | lm loss: 4.505431E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.066 | TFLOPs: 11.68 | 7: iteration 146570/ 173500 | consumed samples: 37521920 | consumed tokens: 76844892160 | elapsed time per iteration (s): 0.08 | learning rate: 3.070E-05 | global batch size: 256 | lm loss: 4.497922E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.668 | TFLOPs: 11.67 | 7: iteration 146580/ 173500 | consumed samples: 37524480 | consumed tokens: 76850135040 | elapsed time per iteration (s): 0.08 | learning rate: 3.069E-05 | global batch size: 256 | lm loss: 4.516281E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.286 | TFLOPs: 11.64 | 7: iteration 146590/ 173500 | consumed samples: 37527040 | consumed tokens: 76855377920 | elapsed time per iteration (s): 0.08 | learning rate: 3.068E-05 | global batch size: 256 | lm loss: 4.516559E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.151 | TFLOPs: 11.63 | 7: iteration 146600/ 173500 | consumed samples: 37529600 | consumed tokens: 76860620800 | elapsed time per iteration (s): 0.08 | learning rate: 3.068E-05 | global batch size: 256 | lm loss: 4.500774E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.107 | TFLOPs: 11.78 | 7: iteration 146610/ 173500 | consumed samples: 37532160 | consumed tokens: 76865863680 | elapsed time per iteration (s): 0.08 | learning rate: 3.067E-05 | global batch size: 256 | lm loss: 4.510473E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.458 | TFLOPs: 11.61 | 7: iteration 146620/ 173500 | consumed samples: 37534720 | 
consumed tokens: 76871106560 | elapsed time per iteration (s): 0.08 | learning rate: 3.066E-05 | global batch size: 256 | lm loss: 4.506671E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.256 | TFLOPs: 11.90 | 7: iteration 146630/ 173500 | consumed samples: 37537280 | consumed tokens: 76876349440 | elapsed time per iteration (s): 0.08 | learning rate: 3.065E-05 | global batch size: 256 | lm loss: 4.508403E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.776 | TFLOPs: 11.92 | 7: iteration 146640/ 173500 | consumed samples: 37539840 | consumed tokens: 76881592320 | elapsed time per iteration (s): 0.08 | learning rate: 3.064E-05 | global batch size: 256 | lm loss: 4.501914E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.992 | TFLOPs: 11.85 | 7: iteration 146650/ 173500 | consumed samples: 37542400 | consumed tokens: 76886835200 | elapsed time per iteration (s): 0.08 | learning rate: 3.064E-05 | global batch size: 256 | lm loss: 4.518634E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.024 | TFLOPs: 11.61 | 7: iteration 146660/ 173500 | consumed samples: 37544960 | consumed tokens: 76892078080 | elapsed time per iteration (s): 0.08 | learning rate: 3.063E-05 | global batch size: 256 | lm loss: 4.503614E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.443 | TFLOPs: 11.87 | 7: iteration 146670/ 173500 | consumed samples: 37547520 | consumed tokens: 76897320960 | elapsed time per iteration (s): 0.08 | learning rate: 3.062E-05 | global batch size: 256 | lm loss: 4.506715E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3125.274 | TFLOPs: 11.62 | 7: iteration 146680/ 173500 | consumed samples: 37550080 | consumed tokens: 76902563840 | elapsed time per iteration (s): 0.08 | learning rate: 3.061E-05 | global batch size: 256 | lm loss: 4.516931E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.758 | TFLOPs: 11.98 | 7: iteration 146690/ 173500 | consumed samples: 37552640 | consumed tokens: 76907806720 | elapsed time per iteration (s): 0.08 | learning rate: 3.061E-05 | global batch size: 256 | lm loss: 4.501194E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.293 | TFLOPs: 11.90 | 7: iteration 146700/ 173500 | consumed samples: 37555200 | consumed tokens: 76913049600 | elapsed time per iteration (s): 0.08 | learning rate: 3.060E-05 | global batch size: 256 | lm loss: 4.501450E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.359 | TFLOPs: 11.93 | 7: iteration 146710/ 173500 | consumed samples: 37557760 | consumed tokens: 76918292480 | elapsed time per iteration (s): 0.08 | learning rate: 3.059E-05 | global batch size: 256 | lm loss: 4.522062E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.567 | TFLOPs: 11.99 | 7: iteration 146720/ 173500 | consumed samples: 37560320 | consumed tokens: 76923535360 | elapsed time per iteration (s): 0.08 | learning rate: 3.058E-05 | global batch size: 256 | lm loss: 4.507956E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.142 | TFLOPs: 11.98 | 7: iteration 146730/ 173500 | consumed samples: 37562880 | consumed tokens: 76928778240 | elapsed time per iteration (s): 0.08 | learning rate: 3.057E-05 | global batch size: 
256 | lm loss: 4.503939E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.113 | TFLOPs: 11.98 | 7: iteration 146740/ 173500 | consumed samples: 37565440 | consumed tokens: 76934021120 | elapsed time per iteration (s): 0.08 | learning rate: 3.057E-05 | global batch size: 256 | lm loss: 4.512643E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.616 | TFLOPs: 11.99 | 7: iteration 146750/ 173500 | consumed samples: 37568000 | consumed tokens: 76939264000 | elapsed time per iteration (s): 0.08 | learning rate: 3.056E-05 | global batch size: 256 | lm loss: 4.498758E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.714 | TFLOPs: 11.71 | 7: iteration 146760/ 173500 | consumed samples: 37570560 | consumed tokens: 76944506880 | elapsed time per iteration (s): 0.08 | learning rate: 3.055E-05 | global batch size: 256 | lm loss: 4.513889E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.618 | TFLOPs: 11.86 | 7: iteration 146770/ 173500 | consumed samples: 37573120 | consumed tokens: 76949749760 | elapsed time per iteration (s): 0.08 | learning rate: 3.054E-05 | global batch size: 256 | lm loss: 4.503549E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.618 | TFLOPs: 11.92 | 7: iteration 146780/ 173500 | consumed samples: 37575680 | consumed tokens: 76954992640 | elapsed time per iteration (s): 0.08 | learning rate: 3.054E-05 | global batch size: 256 | lm loss: 4.510665E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.142 | TFLOPs: 11.97 | 7: iteration 146790/ 173500 | consumed samples: 37578240 | 
consumed tokens: 76960235520 | elapsed time per iteration (s): 0.08 | learning rate: 3.053E-05 | global batch size: 256 | lm loss: 4.492286E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.969 | TFLOPs: 11.95 | 7: iteration 146800/ 173500 | consumed samples: 37580800 | consumed tokens: 76965478400 | elapsed time per iteration (s): 0.08 | learning rate: 3.052E-05 | global batch size: 256 | lm loss: 4.496267E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.310 | TFLOPs: 11.97 | 7: iteration 146810/ 173500 | consumed samples: 37583360 | consumed tokens: 76970721280 | elapsed time per iteration (s): 0.08 | learning rate: 3.051E-05 | global batch size: 256 | lm loss: 4.486409E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.606 | TFLOPs: 11.60 | 7: iteration 146820/ 173500 | consumed samples: 37585920 | consumed tokens: 76975964160 | elapsed time per iteration (s): 0.08 | learning rate: 3.050E-05 | global batch size: 256 | lm loss: 4.505969E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.663 | TFLOPs: 11.93 | 7: iteration 146830/ 173500 | consumed samples: 37588480 | consumed tokens: 76981207040 | elapsed time per iteration (s): 0.08 | learning rate: 3.050E-05 | global batch size: 256 | lm loss: 4.496587E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.172 | TFLOPs: 11.95 | 7: iteration 146840/ 173500 | consumed samples: 37591040 | consumed tokens: 76986449920 | elapsed time per iteration (s): 0.08 | learning rate: 3.049E-05 | global batch size: 256 | lm loss: 4.509433E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3198.673 | TFLOPs: 11.90 | 7: iteration 146850/ 173500 | consumed samples: 37593600 | consumed tokens: 76991692800 | elapsed time per iteration (s): 0.08 | learning rate: 3.048E-05 | global batch size: 256 | lm loss: 4.517691E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.939 | TFLOPs: 11.88 | 7: iteration 146860/ 173500 | consumed samples: 37596160 | consumed tokens: 76996935680 | elapsed time per iteration (s): 0.08 | learning rate: 3.047E-05 | global batch size: 256 | lm loss: 4.496542E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.249 | TFLOPs: 11.58 | 7: iteration 146870/ 173500 | consumed samples: 37598720 | consumed tokens: 77002178560 | elapsed time per iteration (s): 0.08 | learning rate: 3.047E-05 | global batch size: 256 | lm loss: 4.511112E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.104 | TFLOPs: 11.94 | 7: iteration 146880/ 173500 | consumed samples: 37601280 | consumed tokens: 77007421440 | elapsed time per iteration (s): 0.08 | learning rate: 3.046E-05 | global batch size: 256 | lm loss: 4.507104E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.170 | TFLOPs: 11.90 | 7: iteration 146890/ 173500 | consumed samples: 37603840 | consumed tokens: 77012664320 | elapsed time per iteration (s): 0.08 | learning rate: 3.045E-05 | global batch size: 256 | lm loss: 4.511792E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.888 | TFLOPs: 11.84 | 7: iteration 146900/ 173500 | consumed samples: 37606400 | consumed tokens: 77017907200 | elapsed time per iteration (s): 0.08 | learning rate: 3.044E-05 | global batch size: 
256 | lm loss: 4.503022E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.644 | TFLOPs: 11.86 | 7: iteration 146910/ 173500 | consumed samples: 37608960 | consumed tokens: 77023150080 | elapsed time per iteration (s): 0.08 | learning rate: 3.044E-05 | global batch size: 256 | lm loss: 4.503617E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.472 | TFLOPs: 11.92 | 7: iteration 146920/ 173500 | consumed samples: 37611520 | consumed tokens: 77028392960 | elapsed time per iteration (s): 0.08 | learning rate: 3.043E-05 | global batch size: 256 | lm loss: 4.512346E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.060 | TFLOPs: 11.65 | 7: iteration 146930/ 173500 | consumed samples: 37614080 | consumed tokens: 77033635840 | elapsed time per iteration (s): 0.08 | learning rate: 3.042E-05 | global batch size: 256 | lm loss: 4.508110E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.337 | TFLOPs: 11.92 | 7: iteration 146940/ 173500 | consumed samples: 37616640 | consumed tokens: 77038878720 | elapsed time per iteration (s): 0.08 | learning rate: 3.041E-05 | global batch size: 256 | lm loss: 4.508921E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.577 | TFLOPs: 11.90 | 7: iteration 146950/ 173500 | consumed samples: 37619200 | consumed tokens: 77044121600 | elapsed time per iteration (s): 0.08 | learning rate: 3.040E-05 | global batch size: 256 | lm loss: 4.504740E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.608 | TFLOPs: 11.94 | 7: iteration 146960/ 173500 | consumed samples: 37621760 | 
consumed tokens: 77049364480 | elapsed time per iteration (s): 0.08 | learning rate: 3.040E-05 | global batch size: 256 | lm loss: 4.503419E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.648 | TFLOPs: 11.88 |
7: iteration 146970/ 173500 | consumed samples: 37624320 | consumed tokens: 77054607360 | elapsed time per iteration (s): 0.08 | learning rate: 3.039E-05 | global batch size: 256 | lm loss: 4.513657E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.659 | TFLOPs: 11.77 |
7: iteration 146980/ 173500 | consumed samples: 37626880 | consumed tokens: 77059850240 | elapsed time per iteration (s): 0.08 | learning rate: 3.038E-05 | global batch size: 256 | lm loss: 4.505893E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.839 | TFLOPs: 11.82 |
7: iteration 146990/ 173500 | consumed samples: 37629440 | consumed tokens: 77065093120 | elapsed time per iteration (s): 0.08 | learning rate: 3.037E-05 | global batch size: 256 | lm loss: 4.509188E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.266 | TFLOPs: 11.73 |
7: iteration 147000/ 173500 | consumed samples: 37632000 | consumed tokens: 77070336000 | elapsed time per iteration (s): 0.08 | learning rate: 3.037E-05 | global batch size: 256 | lm loss: 4.509912E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.049 | TFLOPs: 11.94 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 147000 | lm loss value: 4.382566E+00 | lm loss PPL: 8.004320E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 147000 to checkpoints_14m91b100m
0: [2023-03-17 03:51:56,519] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step147000 is begin to save!
0: [2023-03-17 03:51:56,523] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/layer_01-model_00-model_states.pt...
0: [2023-03-17 03:51:56,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/layer_01-model_00-model_states.pt.
0: [2023-03-17 03:51:56,558] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/layer_03-model_00-model_states.pt...
0: [2023-03-17 03:51:56,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/layer_03-model_00-model_states.pt.
0: [2023-03-17 03:51:56,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/layer_04-model_00-model_states.pt...
0: [2023-03-17 03:51:56,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/layer_04-model_00-model_states.pt.
0: [2023-03-17 03:51:56,566] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/layer_05-model_00-model_states.pt...
0: [2023-03-17 03:51:56,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/layer_05-model_00-model_states.pt.
0: [2023-03-17 03:51:56,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/layer_06-model_00-model_states.pt...
0: [2023-03-17 03:51:56,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/layer_06-model_00-model_states.pt.
0: [2023-03-17 03:51:56,573] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:51:56,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:51:56,575] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step147000/mp_rank_00_model_states.pt 0: [2023-03-17 03:51:56,575] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:51:56,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:51:56,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:51:56,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:51:56,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:51:56,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 4: [2023-03-17 03:51:56,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:51:56,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:51:56,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 5: [2023-03-17 03:51:56,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:51:56,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:51:56,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 7: [2023-03-17 03:51:56,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 0: [2023-03-17 03:51:56,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 3: [2023-03-17 03:51:56,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 2: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:51:56,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 6: [2023-03-17 03:51:56,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 1: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:51:56,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 4: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:51:56,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 
0: [2023-03-17 03:51:56,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 3: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:51:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 7: [2023-03-17 03:51:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 5: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:51:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 2: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:51:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 6: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:51:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 4: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:51:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 1: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:51:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 1: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:51:56,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 03:51:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 5: [2023-03-17 03:51:56,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 0: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:51:56,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 7: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:51:56,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 2: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:51:56,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 3: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:51:56,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:51:56,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 6: [2023-03-17 03:51:56,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 4: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:51:56,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 5: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:51:56,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 0: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:51:56,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 7: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:51:56,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 2: [2023-03-17 03:51:56,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 
6: [2023-03-17 03:51:56,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 3: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:51:56,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 1: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:51:56,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 4: [2023-03-17 03:51:56,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 5: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:51:56,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 0: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 1: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:51:56,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 7: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 3: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:51:56,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 03:51:56,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 2: [2023-03-17 03:51:56,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 6: [2023-03-17 03:51:56,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 0: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:51:56,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 4: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:51:56,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 5: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:51:56,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:51:56,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 7: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:51:56,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 2: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:51:56,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 3: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:51:56,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 6: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:51:56,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 4: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:51:56,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:51:56,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 03:51:56,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 1: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 0: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:51:56,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 5: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:51:56,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 7: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:51:56,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 3: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:51:56,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 
2: [2023-03-17 03:51:56,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 7: [2023-03-17 03:51:56,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:51:56,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 5: [2023-03-17 03:51:56,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 1: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:51:56,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 
1: [2023-03-17 03:51:56,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 4: [2023-03-17 03:51:56,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 03:51:56,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 1: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 3: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:51:56,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:51:56,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 6: [2023-03-17 03:51:56,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:51:56,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:51:56,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 2: [2023-03-17 03:51:56,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:51:56,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step147000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:51:56,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step147000 is ready now! 0: successfully saved checkpoint at iteration 147000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 93.51 7: iteration 147010/ 173500 | consumed samples: 37634560 | consumed tokens: 77075578880 | elapsed time per iteration (s): 0.09 | learning rate: 3.036E-05 | global batch size: 256 | lm loss: 4.511694E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.637 | TFLOPs: 10.32 | 7: iteration 147020/ 173500 | consumed samples: 37637120 | consumed tokens: 77080821760 | elapsed time per iteration (s): 0.08 | learning rate: 3.035E-05 | global batch size: 256 | lm loss: 4.494960E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.676 | TFLOPs: 11.94 | 7: iteration 147030/ 173500 | consumed samples: 37639680 | consumed tokens: 77086064640 | elapsed time per iteration (s): 0.08 | learning rate: 3.034E-05 | global batch size: 256 | lm loss: 4.502470E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.024 | TFLOPs: 11.92 | 7: iteration 
147040/ 173500 | consumed samples: 37642240 | consumed tokens: 77091307520 | elapsed time per iteration (s): 0.08 | learning rate: 3.034E-05 | global batch size: 256 | lm loss: 4.516776E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.603 | TFLOPs: 11.89 | 7: iteration 147050/ 173500 | consumed samples: 37644800 | consumed tokens: 77096550400 | elapsed time per iteration (s): 0.08 | learning rate: 3.033E-05 | global batch size: 256 | lm loss: 4.506262E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.683 | TFLOPs: 11.95 | 7: iteration 147060/ 173500 | consumed samples: 37647360 | consumed tokens: 77101793280 | elapsed time per iteration (s): 0.08 | learning rate: 3.032E-05 | global batch size: 256 | lm loss: 4.501759E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.638 | TFLOPs: 11.83 | 7: iteration 147070/ 173500 | consumed samples: 37649920 | consumed tokens: 77107036160 | elapsed time per iteration (s): 0.08 | learning rate: 3.031E-05 | global batch size: 256 | lm loss: 4.521359E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.935 | TFLOPs: 11.83 | 7: iteration 147080/ 173500 | consumed samples: 37652480 | consumed tokens: 77112279040 | elapsed time per iteration (s): 0.08 | learning rate: 3.031E-05 | global batch size: 256 | lm loss: 4.497778E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.434 | TFLOPs: 11.84 | 7: iteration 147090/ 173500 | consumed samples: 37655040 | consumed tokens: 77117521920 | elapsed time per iteration (s): 0.08 | learning rate: 3.030E-05 | global batch size: 256 | lm loss: 4.500353E+00 | grad norm: 0.356 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.054 | TFLOPs: 11.79 | 7: iteration 147100/ 173500 | consumed samples: 37657600 | consumed tokens: 77122764800 | elapsed time per iteration (s): 0.08 | learning rate: 3.029E-05 | global batch size: 256 | lm loss: 4.493523E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.980 | TFLOPs: 11.78 | 7: iteration 147110/ 173500 | consumed samples: 37660160 | consumed tokens: 77128007680 | elapsed time per iteration (s): 0.08 | learning rate: 3.028E-05 | global batch size: 256 | lm loss: 4.518490E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.583 | TFLOPs: 11.78 | 7: iteration 147120/ 173500 | consumed samples: 37662720 | consumed tokens: 77133250560 | elapsed time per iteration (s): 0.08 | learning rate: 3.027E-05 | global batch size: 256 | lm loss: 4.506213E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.345 | TFLOPs: 11.93 | 7: iteration 147130/ 173500 | consumed samples: 37665280 | consumed tokens: 77138493440 | elapsed time per iteration (s): 0.10 | learning rate: 3.027E-05 | global batch size: 256 | lm loss: 4.498966E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2545.127 | TFLOPs: 9.47 | 7: iteration 147140/ 173500 | consumed samples: 37667840 | consumed tokens: 77143736320 | elapsed time per iteration (s): 0.11 | learning rate: 3.026E-05 | global batch size: 256 | lm loss: 4.515103E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.278 | TFLOPs: 8.78 | 7: iteration 147150/ 173500 | consumed samples: 37670400 | consumed tokens: 77148979200 | elapsed time per iteration (s): 0.11 | learning 
rate: 3.025E-05 | global batch size: 256 | lm loss: 4.497252E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2388.343 | TFLOPs: 8.88 | 7: iteration 147160/ 173500 | consumed samples: 37672960 | consumed tokens: 77154222080 | elapsed time per iteration (s): 0.11 | learning rate: 3.024E-05 | global batch size: 256 | lm loss: 4.500544E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2293.996 | TFLOPs: 8.53 | 7: iteration 147170/ 173500 | consumed samples: 37675520 | consumed tokens: 77159464960 | elapsed time per iteration (s): 0.11 | learning rate: 3.024E-05 | global batch size: 256 | lm loss: 4.491535E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.538 | TFLOPs: 8.59 | 7: iteration 147180/ 173500 | consumed samples: 37678080 | consumed tokens: 77164707840 | elapsed time per iteration (s): 0.11 | learning rate: 3.023E-05 | global batch size: 256 | lm loss: 4.508194E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.153 | TFLOPs: 8.82 | 7: iteration 147190/ 173500 | consumed samples: 37680640 | consumed tokens: 77169950720 | elapsed time per iteration (s): 0.11 | learning rate: 3.022E-05 | global batch size: 256 | lm loss: 4.523270E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.851 | TFLOPs: 8.60 | 7: iteration 147200/ 173500 | consumed samples: 37683200 | consumed tokens: 77175193600 | elapsed time per iteration (s): 0.11 | learning rate: 3.021E-05 | global batch size: 256 | lm loss: 4.496062E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2396.860 | TFLOPs: 8.92 | 7: iteration 147210/ 173500 | 
consumed samples: 37685760 | consumed tokens: 77180436480 | elapsed time per iteration (s): 0.11 | learning rate: 3.021E-05 | global batch size: 256 | lm loss: 4.527530E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2310.537 | TFLOPs: 8.59 | 7: iteration 147220/ 173500 | consumed samples: 37688320 | consumed tokens: 77185679360 | elapsed time per iteration (s): 0.11 | learning rate: 3.020E-05 | global batch size: 256 | lm loss: 4.508363E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.427 | TFLOPs: 8.82 | 7: iteration 147230/ 173500 | consumed samples: 37690880 | consumed tokens: 77190922240 | elapsed time per iteration (s): 0.11 | learning rate: 3.019E-05 | global batch size: 256 | lm loss: 4.513308E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2406.113 | TFLOPs: 8.95 | 7: iteration 147240/ 173500 | consumed samples: 37693440 | consumed tokens: 77196165120 | elapsed time per iteration (s): 0.11 | learning rate: 3.018E-05 | global batch size: 256 | lm loss: 4.504542E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.128 | TFLOPs: 8.98 | 7: iteration 147250/ 173500 | consumed samples: 37696000 | consumed tokens: 77201408000 | elapsed time per iteration (s): 0.09 | learning rate: 3.018E-05 | global batch size: 256 | lm loss: 4.510877E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2897.352 | TFLOPs: 10.78 | 7: iteration 147260/ 173500 | consumed samples: 37698560 | consumed tokens: 77206650880 | elapsed time per iteration (s): 0.08 | learning rate: 3.017E-05 | global batch size: 256 | lm loss: 4.512954E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 3197.351 | TFLOPs: 11.89 | 7: iteration 147270/ 173500 | consumed samples: 37701120 | consumed tokens: 77211893760 | elapsed time per iteration (s): 0.08 | learning rate: 3.016E-05 | global batch size: 256 | lm loss: 4.492315E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.225 | TFLOPs: 11.73 | 7: iteration 147280/ 173500 | consumed samples: 37703680 | consumed tokens: 77217136640 | elapsed time per iteration (s): 0.08 | learning rate: 3.015E-05 | global batch size: 256 | lm loss: 4.503405E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.655 | TFLOPs: 11.93 | 7: iteration 147290/ 173500 | consumed samples: 37706240 | consumed tokens: 77222379520 | elapsed time per iteration (s): 0.08 | learning rate: 3.015E-05 | global batch size: 256 | lm loss: 4.526338E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.645 | TFLOPs: 11.78 | 7: iteration 147300/ 173500 | consumed samples: 37708800 | consumed tokens: 77227622400 | elapsed time per iteration (s): 0.08 | learning rate: 3.014E-05 | global batch size: 256 | lm loss: 4.512953E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.666 | TFLOPs: 11.81 | 7: iteration 147310/ 173500 | consumed samples: 37711360 | consumed tokens: 77232865280 | elapsed time per iteration (s): 0.08 | learning rate: 3.013E-05 | global batch size: 256 | lm loss: 4.515817E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.200 | TFLOPs: 11.92 | 7: iteration 147320/ 173500 | consumed samples: 37713920 | consumed tokens: 77238108160 | elapsed time per iteration (s): 0.08 | learning rate: 3.012E-05 | 
global batch size: 256 | lm loss: 4.496740E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.693 | TFLOPs: 11.89 | 7: iteration 147330/ 173500 | consumed samples: 37716480 | consumed tokens: 77243351040 | elapsed time per iteration (s): 0.08 | learning rate: 3.011E-05 | global batch size: 256 | lm loss: 4.509434E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.139 | TFLOPs: 11.93 | 7: iteration 147340/ 173500 | consumed samples: 37719040 | consumed tokens: 77248593920 | elapsed time per iteration (s): 0.08 | learning rate: 3.011E-05 | global batch size: 256 | lm loss: 4.513570E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.729 | TFLOPs: 11.84 | 7: iteration 147350/ 173500 | consumed samples: 37721600 | consumed tokens: 77253836800 | elapsed time per iteration (s): 0.08 | learning rate: 3.010E-05 | global batch size: 256 | lm loss: 4.506371E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.976 | TFLOPs: 11.87 | 7: iteration 147360/ 173500 | consumed samples: 37724160 | consumed tokens: 77259079680 | elapsed time per iteration (s): 0.09 | learning rate: 3.009E-05 | global batch size: 256 | lm loss: 4.500161E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2865.882 | TFLOPs: 10.66 | 7: iteration 147370/ 173500 | consumed samples: 37726720 | consumed tokens: 77264322560 | elapsed time per iteration (s): 0.08 | learning rate: 3.008E-05 | global batch size: 256 | lm loss: 4.513579E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.530 | TFLOPs: 11.82 | 7: iteration 147380/ 173500 | consumed 
samples: 37729280 | consumed tokens: 77269565440 | elapsed time per iteration (s): 0.08 | learning rate: 3.008E-05 | global batch size: 256 | lm loss: 4.500786E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.989 | TFLOPs: 11.81 | 7: iteration 147390/ 173500 | consumed samples: 37731840 | consumed tokens: 77274808320 | elapsed time per iteration (s): 0.08 | learning rate: 3.007E-05 | global batch size: 256 | lm loss: 4.501196E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.243 | TFLOPs: 11.85 | 7: iteration 147400/ 173500 | consumed samples: 37734400 | consumed tokens: 77280051200 | elapsed time per iteration (s): 0.08 | learning rate: 3.006E-05 | global batch size: 256 | lm loss: 4.515886E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.092 | TFLOPs: 11.83 | 7: iteration 147410/ 173500 | consumed samples: 37736960 | consumed tokens: 77285294080 | elapsed time per iteration (s): 0.08 | learning rate: 3.005E-05 | global batch size: 256 | lm loss: 4.516463E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.929 | TFLOPs: 11.86 | 7: iteration 147420/ 173500 | consumed samples: 37739520 | consumed tokens: 77290536960 | elapsed time per iteration (s): 0.08 | learning rate: 3.005E-05 | global batch size: 256 | lm loss: 4.507963E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.619 | TFLOPs: 11.82 | 7: iteration 147430/ 173500 | consumed samples: 37742080 | consumed tokens: 77295779840 | elapsed time per iteration (s): 0.08 | learning rate: 3.004E-05 | global batch size: 256 | lm loss: 4.505709E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3193.709 | TFLOPs: 11.88 | 7: iteration 147440/ 173500 | consumed samples: 37744640 | consumed tokens: 77301022720 | elapsed time per iteration (s): 0.08 | learning rate: 3.003E-05 | global batch size: 256 | lm loss: 4.506568E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.153 | TFLOPs: 11.87 | 7: iteration 147450/ 173500 | consumed samples: 37747200 | consumed tokens: 77306265600 | elapsed time per iteration (s): 0.08 | learning rate: 3.002E-05 | global batch size: 256 | lm loss: 4.502797E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.184 | TFLOPs: 11.88 | 7: iteration 147460/ 173500 | consumed samples: 37749760 | consumed tokens: 77311508480 | elapsed time per iteration (s): 0.08 | learning rate: 3.002E-05 | global batch size: 256 | lm loss: 4.508428E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.636 | TFLOPs: 11.38 | 7: iteration 147470/ 173500 | consumed samples: 37752320 | consumed tokens: 77316751360 | elapsed time per iteration (s): 0.08 | learning rate: 3.001E-05 | global batch size: 256 | lm loss: 4.508242E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.271 | TFLOPs: 11.26 | 7: iteration 147480/ 173500 | consumed samples: 37754880 | consumed tokens: 77321994240 | elapsed time per iteration (s): 0.08 | learning rate: 3.000E-05 | global batch size: 256 | lm loss: 4.507220E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.680 | TFLOPs: 11.82 | 7: iteration 147490/ 173500 | consumed samples: 37757440 | consumed tokens: 77327237120 | elapsed time per iteration (s): 0.08 | learning rate: 2.999E-05 | global 
batch size: 256 | lm loss: 4.515734E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.712 | TFLOPs: 11.86 | 7: iteration 147500/ 173500 | consumed samples: 37760000 | consumed tokens: 77332480000 | elapsed time per iteration (s): 0.08 | learning rate: 2.999E-05 | global batch size: 256 | lm loss: 4.512925E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.654 | TFLOPs: 11.89 | 7: iteration 147510/ 173500 | consumed samples: 37762560 | consumed tokens: 77337722880 | elapsed time per iteration (s): 0.08 | learning rate: 2.998E-05 | global batch size: 256 | lm loss: 4.502458E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.582 | TFLOPs: 11.66 | 7: iteration 147520/ 173500 | consumed samples: 37765120 | consumed tokens: 77342965760 | elapsed time per iteration (s): 0.08 | learning rate: 2.997E-05 | global batch size: 256 | lm loss: 4.509370E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.864 | TFLOPs: 11.87 | 7: iteration 147530/ 173500 | consumed samples: 37767680 | consumed tokens: 77348208640 | elapsed time per iteration (s): 0.09 | learning rate: 2.996E-05 | global batch size: 256 | lm loss: 4.507837E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2833.354 | TFLOPs: 10.54 | 7: iteration 147540/ 173500 | consumed samples: 37770240 | consumed tokens: 77353451520 | elapsed time per iteration (s): 0.12 | learning rate: 2.996E-05 | global batch size: 256 | lm loss: 4.507652E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2091.407 | TFLOPs: 7.78 | 7: iteration 147550/ 173500 | consumed samples: 
37772800 | consumed tokens: 77358694400 | elapsed time per iteration (s): 0.11 | learning rate: 2.995E-05 | global batch size: 256 | lm loss: 4.504064E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2285.859 | TFLOPs: 8.50 | 7: iteration 147560/ 173500 | consumed samples: 37775360 | consumed tokens: 77363937280 | elapsed time per iteration (s): 0.12 | learning rate: 2.994E-05 | global batch size: 256 | lm loss: 4.503719E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.260 | TFLOPs: 7.70 | 7: iteration 147570/ 173500 | consumed samples: 37777920 | consumed tokens: 77369180160 | elapsed time per iteration (s): 0.12 | learning rate: 2.993E-05 | global batch size: 256 | lm loss: 4.498972E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2133.026 | TFLOPs: 7.93 | 7: iteration 147580/ 173500 | consumed samples: 37780480 | consumed tokens: 77374423040 | elapsed time per iteration (s): 0.12 | learning rate: 2.993E-05 | global batch size: 256 | lm loss: 4.503922E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2133.725 | TFLOPs: 7.94 | 7: iteration 147590/ 173500 | consumed samples: 37783040 | consumed tokens: 77379665920 | elapsed time per iteration (s): 0.11 | learning rate: 2.992E-05 | global batch size: 256 | lm loss: 4.512865E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2361.603 | TFLOPs: 8.78 | 7: iteration 147600/ 173500 | consumed samples: 37785600 | consumed tokens: 77384908800 | elapsed time per iteration (s): 0.11 | learning rate: 2.991E-05 | global batch size: 256 | lm loss: 4.508571E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2344.112 | TFLOPs: 8.72 | 7: iteration 147610/ 173500 | consumed samples: 37788160 | consumed tokens: 77390151680 | elapsed time per iteration (s): 0.11 | learning rate: 2.990E-05 | global batch size: 256 | lm loss: 4.524582E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.629 | TFLOPs: 8.82 | 7: iteration 147620/ 173500 | consumed samples: 37790720 | consumed tokens: 77395394560 | elapsed time per iteration (s): 0.11 | learning rate: 2.990E-05 | global batch size: 256 | lm loss: 4.512804E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2344.207 | TFLOPs: 8.72 | 7: iteration 147630/ 173500 | consumed samples: 37793280 | consumed tokens: 77400637440 | elapsed time per iteration (s): 0.11 | learning rate: 2.989E-05 | global batch size: 256 | lm loss: 4.500233E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2397.189 | TFLOPs: 8.92 | 7: iteration 147640/ 173500 | consumed samples: 37795840 | consumed tokens: 77405880320 | elapsed time per iteration (s): 0.11 | learning rate: 2.988E-05 | global batch size: 256 | lm loss: 4.507926E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.292 | TFLOPs: 8.82 | 7: iteration 147650/ 173500 | consumed samples: 37798400 | consumed tokens: 77411123200 | elapsed time per iteration (s): 0.11 | learning rate: 2.987E-05 | global batch size: 256 | lm loss: 4.508553E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2415.025 | TFLOPs: 8.98 | 7: iteration 147660/ 173500 | consumed samples: 37800960 | consumed tokens: 77416366080 | elapsed time per iteration (s): 0.11 | learning rate: 2.987E-05 | global batch size: 256 | 
lm loss: 4.520841E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2353.314 | TFLOPs: 8.75 | 7: iteration 147670/ 173500 | consumed samples: 37803520 | consumed tokens: 77421608960 | elapsed time per iteration (s): 0.11 | learning rate: 2.986E-05 | global batch size: 256 | lm loss: 4.512462E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.113 | TFLOPs: 8.82 | 7: iteration 147680/ 173500 | consumed samples: 37806080 | consumed tokens: 77426851840 | elapsed time per iteration (s): 0.09 | learning rate: 2.985E-05 | global batch size: 256 | lm loss: 4.507012E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2811.299 | TFLOPs: 10.46 | 7: iteration 147690/ 173500 | consumed samples: 37808640 | consumed tokens: 77432094720 | elapsed time per iteration (s): 0.08 | learning rate: 2.984E-05 | global batch size: 256 | lm loss: 4.526944E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3067.177 | TFLOPs: 11.41 | 7: iteration 147700/ 173500 | consumed samples: 37811200 | consumed tokens: 77437337600 | elapsed time per iteration (s): 0.09 | learning rate: 2.984E-05 | global batch size: 256 | lm loss: 4.508513E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2945.845 | TFLOPs: 10.96 | 7: iteration 147710/ 173500 | consumed samples: 37813760 | consumed tokens: 77442580480 | elapsed time per iteration (s): 0.08 | learning rate: 2.983E-05 | global batch size: 256 | lm loss: 4.506268E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.377 | TFLOPs: 11.80 | 7: iteration 147720/ 173500 | consumed samples: 37816320 | consumed 
tokens: 77447823360 | elapsed time per iteration (s): 0.11 | learning rate: 2.982E-05 | global batch size: 256 | lm loss: 4.510334E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.915 | TFLOPs: 8.82 | 7: iteration 147730/ 173500 | consumed samples: 37818880 | consumed tokens: 77453066240 | elapsed time per iteration (s): 0.08 | learning rate: 2.981E-05 | global batch size: 256 | lm loss: 4.529654E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.554 | TFLOPs: 11.48 | 7: iteration 147740/ 173500 | consumed samples: 37821440 | consumed tokens: 77458309120 | elapsed time per iteration (s): 0.08 | learning rate: 2.981E-05 | global batch size: 256 | lm loss: 4.498714E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.827 | TFLOPs: 11.91 | 7: iteration 147750/ 173500 | consumed samples: 37824000 | consumed tokens: 77463552000 | elapsed time per iteration (s): 0.08 | learning rate: 2.980E-05 | global batch size: 256 | lm loss: 4.503886E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.461 | TFLOPs: 11.92 | 7: iteration 147760/ 173500 | consumed samples: 37826560 | consumed tokens: 77468794880 | elapsed time per iteration (s): 0.08 | learning rate: 2.979E-05 | global batch size: 256 | lm loss: 4.499412E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.544 | TFLOPs: 11.73 | 7: iteration 147770/ 173500 | consumed samples: 37829120 | consumed tokens: 77474037760 | elapsed time per iteration (s): 0.08 | learning rate: 2.978E-05 | global batch size: 256 | lm loss: 4.505164E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3210.617 | TFLOPs: 11.94 | 7: iteration 147780/ 173500 | consumed samples: 37831680 | consumed tokens: 77479280640 | elapsed time per iteration (s): 0.08 | learning rate: 2.978E-05 | global batch size: 256 | lm loss: 4.522366E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.622 | TFLOPs: 12.02 | 7: iteration 147790/ 173500 | consumed samples: 37834240 | consumed tokens: 77484523520 | elapsed time per iteration (s): 0.08 | learning rate: 2.977E-05 | global batch size: 256 | lm loss: 4.497547E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.758 | TFLOPs: 12.02 | 7: iteration 147800/ 173500 | consumed samples: 37836800 | consumed tokens: 77489766400 | elapsed time per iteration (s): 0.08 | learning rate: 2.976E-05 | global batch size: 256 | lm loss: 4.506459E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.903 | TFLOPs: 12.04 | 7: iteration 147810/ 173500 | consumed samples: 37839360 | consumed tokens: 77495009280 | elapsed time per iteration (s): 0.08 | learning rate: 2.975E-05 | global batch size: 256 | lm loss: 4.499421E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.848 | TFLOPs: 12.00 | 7: iteration 147820/ 173500 | consumed samples: 37841920 | consumed tokens: 77500252160 | elapsed time per iteration (s): 0.08 | learning rate: 2.975E-05 | global batch size: 256 | lm loss: 4.515001E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.407 | TFLOPs: 12.00 | 7: iteration 147830/ 173500 | consumed samples: 37844480 | consumed tokens: 77505495040 | elapsed time per iteration (s): 0.08 | learning rate: 2.974E-05 | global batch size: 256 | lm loss: 
4.502261E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.483 | TFLOPs: 11.98 | 7: iteration 147840/ 173500 | consumed samples: 37847040 | consumed tokens: 77510737920 | elapsed time per iteration (s): 0.08 | learning rate: 2.973E-05 | global batch size: 256 | lm loss: 4.496812E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.879 | TFLOPs: 12.00 | 7: iteration 147850/ 173500 | consumed samples: 37849600 | consumed tokens: 77515980800 | elapsed time per iteration (s): 0.08 | learning rate: 2.972E-05 | global batch size: 256 | lm loss: 4.489700E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.737 | TFLOPs: 11.56 | 7: iteration 147860/ 173500 | consumed samples: 37852160 | consumed tokens: 77521223680 | elapsed time per iteration (s): 0.08 | learning rate: 2.972E-05 | global batch size: 256 | lm loss: 4.507863E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.673 | TFLOPs: 12.02 | 7: iteration 147870/ 173500 | consumed samples: 37854720 | consumed tokens: 77526466560 | elapsed time per iteration (s): 0.08 | learning rate: 2.971E-05 | global batch size: 256 | lm loss: 4.514530E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.970 | TFLOPs: 12.03 | 7: iteration 147880/ 173500 | consumed samples: 37857280 | consumed tokens: 77531709440 | elapsed time per iteration (s): 0.08 | learning rate: 2.970E-05 | global batch size: 256 | lm loss: 4.496395E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.533 | TFLOPs: 12.06 | 7: iteration 147890/ 173500 | consumed samples: 37859840 | consumed tokens: 
77536952320 | elapsed time per iteration (s): 0.08 | learning rate: 2.969E-05 | global batch size: 256 | lm loss: 4.504561E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.397 | TFLOPs: 11.96 | 7: iteration 147900/ 173500 | consumed samples: 37862400 | consumed tokens: 77542195200 | elapsed time per iteration (s): 0.08 | learning rate: 2.969E-05 | global batch size: 256 | lm loss: 4.503839E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.950 | TFLOPs: 12.03 | 7: iteration 147910/ 173500 | consumed samples: 37864960 | consumed tokens: 77547438080 | elapsed time per iteration (s): 0.08 | learning rate: 2.968E-05 | global batch size: 256 | lm loss: 4.515979E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.528 | TFLOPs: 12.05 | 7: iteration 147920/ 173500 | consumed samples: 37867520 | consumed tokens: 77552680960 | elapsed time per iteration (s): 0.08 | learning rate: 2.967E-05 | global batch size: 256 | lm loss: 4.516094E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.863 | TFLOPs: 12.08 | 7: iteration 147930/ 173500 | consumed samples: 37870080 | consumed tokens: 77557923840 | elapsed time per iteration (s): 0.08 | learning rate: 2.966E-05 | global batch size: 256 | lm loss: 4.508685E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.288 | TFLOPs: 12.02 | 7: iteration 147940/ 173500 | consumed samples: 37872640 | consumed tokens: 77563166720 | elapsed time per iteration (s): 0.08 | learning rate: 2.966E-05 | global batch size: 256 | lm loss: 4.504678E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3228.979 | TFLOPs: 12.01 | 7: iteration 147950/ 173500 | consumed samples: 37875200 | consumed tokens: 77568409600 | elapsed time per iteration (s): 0.08 | learning rate: 2.965E-05 | global batch size: 256 | lm loss: 4.507358E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.161 | TFLOPs: 12.00 | 7: iteration 147960/ 173500 | consumed samples: 37877760 | consumed tokens: 77573652480 | elapsed time per iteration (s): 0.08 | learning rate: 2.964E-05 | global batch size: 256 | lm loss: 4.491549E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.290 | TFLOPs: 11.80 | 7: iteration 147970/ 173500 | consumed samples: 37880320 | consumed tokens: 77578895360 | elapsed time per iteration (s): 0.08 | learning rate: 2.964E-05 | global batch size: 256 | lm loss: 4.507178E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.832 | TFLOPs: 11.91 | 7: iteration 147980/ 173500 | consumed samples: 37882880 | consumed tokens: 77584138240 | elapsed time per iteration (s): 0.08 | learning rate: 2.963E-05 | global batch size: 256 | lm loss: 4.513465E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.545 | TFLOPs: 11.83 | 7: iteration 147990/ 173500 | consumed samples: 37885440 | consumed tokens: 77589381120 | elapsed time per iteration (s): 0.08 | learning rate: 2.962E-05 | global batch size: 256 | lm loss: 4.501394E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.918 | TFLOPs: 11.57 | 0: [2023-03-17 03:53:25,451] [INFO] [logging.py:68:log_dist] [Rank 0] step=148000, skipped=0, lr=[2.9612854264054498e-05, 2.9612854264054498e-05, 2.9612854264054498e-05], mom=[(0.9, 0.999), (0.9, 
0.999), (0.9, 0.999)] 7: iteration 148000/ 173500 | consumed samples: 37888000 | consumed tokens: 77594624000 | elapsed time per iteration (s): 0.09 | learning rate: 2.961E-05 | global batch size: 256 | lm loss: 4.510580E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.971 | TFLOPs: 11.08 | 0: steps: 148000 loss: 4.5144 iter time (s): 0.085 samples/sec: 3014.392 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 148000 | lm loss value: 4.383156E+00 | lm loss PPL: 8.009042E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 148000 to checkpoints_14m91b100m 0: [2023-03-17 03:53:25,555] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step148000 is begin to save! 0: [2023-03-17 03:53:25,558] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:53:25,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:53:25,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:53:25,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:53:25,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:53:25,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 03:53:25,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:53:25,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:53:25,594] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:53:25,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:53:25,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:53:25,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:53:25,599] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step148000/mp_rank_00_model_states.pt 0: [2023-03-17 03:53:25,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:53:25,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:53:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:53:25,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:53:25,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 03:53:25,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 7: [2023-03-17 03:53:25,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 1: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 1: [2023-03-17 03:53:25,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 4: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:53:25,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 3: [2023-03-17 03:53:25,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:53:25,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 0: [2023-03-17 03:53:25,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 5: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:53:25,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 
0: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 2: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 7: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:53:25,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 4: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:53:25,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2023-03-17 03:53:25,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 
1: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 3: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:53:25,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 5: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:53:25,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 0: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:53:25,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 7: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:53:25,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 03:53:25,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 2: [2023-03-17 03:53:25,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 1: [2023-03-17 03:53:25,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:53:25,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:53:25,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 03:53:25,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2023-03-17 03:53:25,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 4: [2023-03-17 03:53:25,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 3: [2023-03-17 03:53:25,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:53:25,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 03:53:25,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:53:25,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 5: [2023-03-17 03:53:25,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:53:25,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 2: [2023-03-17 03:53:25,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:53:25,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 0: [2023-03-17 03:53:25,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:53:25,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 7: [2023-03-17 03:53:25,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:53:25,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 4: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:53:25,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 03:53:25,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 1: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 5: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:53:25,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 03:53:25,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 3: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 0: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:53:25,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 2: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 7: [2023-03-17 03:53:25,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:53:25,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 3: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:53:25,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 03:53:25,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 3: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 1: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 5: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:53:25,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 4: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:53:25,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:53:25,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 2: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 0: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:53:25,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 7: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:53:25,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 1: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:53:25,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 3: [2023-03-17 03:53:25,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:53:25,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 4: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:53:25,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 2: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 
0: [2023-03-17 03:53:25,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 7: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:53:25,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 1: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:53:25,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 3: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:53:25,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 4: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:53:25,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 5: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:53:25,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 2: [2023-03-17 03:53:25,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 03:53:25,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 0: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 5: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:53:25,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 7: [2023-03-17 03:53:25,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 3: [2023-03-17 03:53:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:53:25,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:53:25,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 1: [2023-03-17 03:53:25,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:53:25,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 03:53:25,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 4: [2023-03-17 03:53:25,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:53:25,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:53:25,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:53:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 03:53:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:53:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:53:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 6: [2023-03-17 03:53:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:53:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:53:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:53:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step148000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:53:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step148000 is ready now! 0: successfully saved checkpoint at iteration 148000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 143.74 7: iteration 148010/ 173500 | consumed samples: 37890560 | consumed tokens: 77599866880 | elapsed time per iteration (s): 0.10 | learning rate: 2.961E-05 | global batch size: 256 | lm loss: 4.503948E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2465.427 | TFLOPs: 9.17 | 7: iteration 148020/ 173500 | consumed samples: 37893120 | consumed tokens: 77605109760 | elapsed time per iteration (s): 0.08 | learning rate: 2.960E-05 | global batch size: 256 | lm loss: 4.517058E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.568 | TFLOPs: 11.53 | 7: iteration 148030/ 173500 | consumed samples: 37895680 | consumed tokens: 77610352640 | elapsed time per iteration (s): 0.08 | learning rate: 2.959E-05 | global batch size: 256 | lm loss: 4.501025E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.268 | TFLOPs: 11.80 | 7: iteration 148040/ 173500 | consumed samples: 37898240 | consumed tokens: 77615595520 | elapsed time per iteration (s): 0.08 | learning rate: 2.958E-05 | global batch size: 256 | lm loss: 4.515872E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.436 | TFLOPs: 11.85 | 7: iteration 148050/ 173500 | consumed samples: 37900800 | consumed tokens: 77620838400 | elapsed time per iteration (s): 0.08 | learning rate: 2.958E-05 | 
global batch size: 256 | lm loss: 4.513564E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.596 | TFLOPs: 11.83 | 7: iteration 148060/ 173500 | consumed samples: 37903360 | consumed tokens: 77626081280 | elapsed time per iteration (s): 0.08 | learning rate: 2.957E-05 | global batch size: 256 | lm loss: 4.506021E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.022 | TFLOPs: 11.72 | 7: iteration 148070/ 173500 | consumed samples: 37905920 | consumed tokens: 77631324160 | elapsed time per iteration (s): 0.08 | learning rate: 2.956E-05 | global batch size: 256 | lm loss: 4.503214E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.981 | TFLOPs: 11.83 | 7: iteration 148080/ 173500 | consumed samples: 37908480 | consumed tokens: 77636567040 | elapsed time per iteration (s): 0.08 | learning rate: 2.955E-05 | global batch size: 256 | lm loss: 4.513344E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.844 | TFLOPs: 11.85 | 7: iteration 148090/ 173500 | consumed samples: 37911040 | consumed tokens: 77641809920 | elapsed time per iteration (s): 0.08 | learning rate: 2.955E-05 | global batch size: 256 | lm loss: 4.516269E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.686 | TFLOPs: 11.85 | 7: iteration 148100/ 173500 | consumed samples: 37913600 | consumed tokens: 77647052800 | elapsed time per iteration (s): 0.08 | learning rate: 2.954E-05 | global batch size: 256 | lm loss: 4.505449E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.371 | TFLOPs: 11.83 | 7: iteration 148110/ 173500 | consumed 
samples: 37916160 | consumed tokens: 77652295680 | elapsed time per iteration (s): 0.08 | learning rate: 2.953E-05 | global batch size: 256 | lm loss: 4.509892E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.627 | TFLOPs: 11.85 | 7: iteration 148120/ 173500 | consumed samples: 37918720 | consumed tokens: 77657538560 | elapsed time per iteration (s): 0.08 | learning rate: 2.952E-05 | global batch size: 256 | lm loss: 4.510686E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.532 | TFLOPs: 11.83 | 7: iteration 148130/ 173500 | consumed samples: 37921280 | consumed tokens: 77662781440 | elapsed time per iteration (s): 0.08 | learning rate: 2.952E-05 | global batch size: 256 | lm loss: 4.502405E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.609 | TFLOPs: 11.83 | 7: iteration 148140/ 173500 | consumed samples: 37923840 | consumed tokens: 77668024320 | elapsed time per iteration (s): 0.08 | learning rate: 2.951E-05 | global batch size: 256 | lm loss: 4.497909E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.793 | TFLOPs: 11.56 | 7: iteration 148150/ 173500 | consumed samples: 37926400 | consumed tokens: 77673267200 | elapsed time per iteration (s): 0.08 | learning rate: 2.950E-05 | global batch size: 256 | lm loss: 4.506568E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.339 | TFLOPs: 11.83 | 7: iteration 148160/ 173500 | consumed samples: 37928960 | consumed tokens: 77678510080 | elapsed time per iteration (s): 0.08 | learning rate: 2.949E-05 | global batch size: 256 | lm loss: 4.499762E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3178.185 | TFLOPs: 11.82 | 7: iteration 148170/ 173500 | consumed samples: 37931520 | consumed tokens: 77683752960 | elapsed time per iteration (s): 0.08 | learning rate: 2.949E-05 | global batch size: 256 | lm loss: 4.507032E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.910 | TFLOPs: 11.75 | 7: iteration 148180/ 173500 | consumed samples: 37934080 | consumed tokens: 77688995840 | elapsed time per iteration (s): 0.08 | learning rate: 2.948E-05 | global batch size: 256 | lm loss: 4.511288E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.167 | TFLOPs: 11.58 | 7: iteration 148190/ 173500 | consumed samples: 37936640 | consumed tokens: 77694238720 | elapsed time per iteration (s): 0.08 | learning rate: 2.947E-05 | global batch size: 256 | lm loss: 4.515201E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.571 | TFLOPs: 11.82 | 7: iteration 148200/ 173500 | consumed samples: 37939200 | consumed tokens: 77699481600 | elapsed time per iteration (s): 0.08 | learning rate: 2.947E-05 | global batch size: 256 | lm loss: 4.505045E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.963 | TFLOPs: 11.84 | 7: iteration 148210/ 173500 | consumed samples: 37941760 | consumed tokens: 77704724480 | elapsed time per iteration (s): 0.08 | learning rate: 2.946E-05 | global batch size: 256 | lm loss: 4.501273E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.724 | TFLOPs: 11.79 | 7: iteration 148220/ 173500 | consumed samples: 37944320 | consumed tokens: 77709967360 | elapsed time per iteration (s): 0.08 | learning rate: 2.945E-05 | global 
batch size: 256 | lm loss: 4.493368E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.118 | TFLOPs: 11.56 | 7: iteration 148230/ 173500 | consumed samples: 37946880 | consumed tokens: 77715210240 | elapsed time per iteration (s): 0.08 | learning rate: 2.944E-05 | global batch size: 256 | lm loss: 4.503165E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.165 | TFLOPs: 11.55 | 7: iteration 148240/ 173500 | consumed samples: 37949440 | consumed tokens: 77720453120 | elapsed time per iteration (s): 0.08 | learning rate: 2.944E-05 | global batch size: 256 | lm loss: 4.502296E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.738 | TFLOPs: 11.55 | 7: iteration 148250/ 173500 | consumed samples: 37952000 | consumed tokens: 77725696000 | elapsed time per iteration (s): 0.08 | learning rate: 2.943E-05 | global batch size: 256 | lm loss: 4.498482E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.608 | TFLOPs: 11.85 | 7: iteration 148260/ 173500 | consumed samples: 37954560 | consumed tokens: 77730938880 | elapsed time per iteration (s): 0.08 | learning rate: 2.942E-05 | global batch size: 256 | lm loss: 4.500323E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.849 | TFLOPs: 11.85 | 7: iteration 148270/ 173500 | consumed samples: 37957120 | consumed tokens: 77736181760 | elapsed time per iteration (s): 0.08 | learning rate: 2.941E-05 | global batch size: 256 | lm loss: 4.496007E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.725 | TFLOPs: 11.82 | 7: iteration 148280/ 173500 | consumed samples: 
37959680 | consumed tokens: 77741424640 | elapsed time per iteration (s): 0.09 | learning rate: 2.941E-05 | global batch size: 256 | lm loss: 4.514147E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.690 | TFLOPs: 10.82 | 7: iteration 148290/ 173500 | consumed samples: 37962240 | consumed tokens: 77746667520 | elapsed time per iteration (s): 0.08 | learning rate: 2.940E-05 | global batch size: 256 | lm loss: 4.504608E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.464 | TFLOPs: 11.84 | 7: iteration 148300/ 173500 | consumed samples: 37964800 | consumed tokens: 77751910400 | elapsed time per iteration (s): 0.08 | learning rate: 2.939E-05 | global batch size: 256 | lm loss: 4.505838E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.098 | TFLOPs: 11.61 | 7: iteration 148310/ 173500 | consumed samples: 37967360 | consumed tokens: 77757153280 | elapsed time per iteration (s): 0.08 | learning rate: 2.938E-05 | global batch size: 256 | lm loss: 4.507591E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.799 | TFLOPs: 11.89 | 7: iteration 148320/ 173500 | consumed samples: 37969920 | consumed tokens: 77762396160 | elapsed time per iteration (s): 0.08 | learning rate: 2.938E-05 | global batch size: 256 | lm loss: 4.507922E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.144 | TFLOPs: 11.83 | 7: iteration 148330/ 173500 | consumed samples: 37972480 | consumed tokens: 77767639040 | elapsed time per iteration (s): 0.09 | learning rate: 2.937E-05 | global batch size: 256 | lm loss: 4.486999E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 2971.889 | TFLOPs: 11.05 | 7: iteration 148340/ 173500 | consumed samples: 37975040 | consumed tokens: 77772881920 | elapsed time per iteration (s): 0.08 | learning rate: 2.936E-05 | global batch size: 256 | lm loss: 4.509693E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.330 | TFLOPs: 11.80 | 7: iteration 148350/ 173500 | consumed samples: 37977600 | consumed tokens: 77778124800 | elapsed time per iteration (s): 0.08 | learning rate: 2.936E-05 | global batch size: 256 | lm loss: 4.499935E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.709 | TFLOPs: 11.72 | 7: iteration 148360/ 173500 | consumed samples: 37980160 | consumed tokens: 77783367680 | elapsed time per iteration (s): 0.08 | learning rate: 2.935E-05 | global batch size: 256 | lm loss: 4.504551E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.773 | TFLOPs: 11.78 | 7: iteration 148370/ 173500 | consumed samples: 37982720 | consumed tokens: 77788610560 | elapsed time per iteration (s): 0.09 | learning rate: 2.934E-05 | global batch size: 256 | lm loss: 4.501572E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2904.781 | TFLOPs: 10.80 | 7: iteration 148380/ 173500 | consumed samples: 37985280 | consumed tokens: 77793853440 | elapsed time per iteration (s): 0.08 | learning rate: 2.933E-05 | global batch size: 256 | lm loss: 4.508762E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.593 | TFLOPs: 11.29 | 7: iteration 148390/ 173500 | consumed samples: 37987840 | consumed tokens: 77799096320 | elapsed time per iteration (s): 0.08 | learning rate: 2.933E-05 | global batch 
size: 256 | lm loss: 4.519004E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.917 | TFLOPs: 11.81 | 7: iteration 148400/ 173500 | consumed samples: 37990400 | consumed tokens: 77804339200 | elapsed time per iteration (s): 0.08 | learning rate: 2.932E-05 | global batch size: 256 | lm loss: 4.507832E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.152 | TFLOPs: 11.55 | 7: iteration 148410/ 173500 | consumed samples: 37992960 | consumed tokens: 77809582080 | elapsed time per iteration (s): 0.08 | learning rate: 2.931E-05 | global batch size: 256 | lm loss: 4.500320E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.497 | TFLOPs: 11.82 | 7: iteration 148420/ 173500 | consumed samples: 37995520 | consumed tokens: 77814824960 | elapsed time per iteration (s): 0.08 | learning rate: 2.930E-05 | global batch size: 256 | lm loss: 4.505163E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.592 | TFLOPs: 11.82 | 7: iteration 148430/ 173500 | consumed samples: 37998080 | consumed tokens: 77820067840 | elapsed time per iteration (s): 0.08 | learning rate: 2.930E-05 | global batch size: 256 | lm loss: 4.505827E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.302 | TFLOPs: 11.85 | 7: iteration 148440/ 173500 | consumed samples: 38000640 | consumed tokens: 77825310720 | elapsed time per iteration (s): 0.08 | learning rate: 2.929E-05 | global batch size: 256 | lm loss: 4.501238E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.774 | TFLOPs: 11.92 | 7: iteration 148450/ 173500 | consumed samples: 38003200 
| consumed tokens: 77830553600 | elapsed time per iteration (s): 0.08 | learning rate: 2.928E-05 | global batch size: 256 | lm loss: 4.502280E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.400 | TFLOPs: 11.60 | 7: iteration 148460/ 173500 | consumed samples: 38005760 | consumed tokens: 77835796480 | elapsed time per iteration (s): 0.08 | learning rate: 2.928E-05 | global batch size: 256 | lm loss: 4.507794E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.641 | TFLOPs: 11.60 | 7: iteration 148470/ 173500 | consumed samples: 38008320 | consumed tokens: 77841039360 | elapsed time per iteration (s): 0.08 | learning rate: 2.927E-05 | global batch size: 256 | lm loss: 4.504655E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.131 | TFLOPs: 11.88 | 7: iteration 148480/ 173500 | consumed samples: 38010880 | consumed tokens: 77846282240 | elapsed time per iteration (s): 0.08 | learning rate: 2.926E-05 | global batch size: 256 | lm loss: 4.503912E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.837 | TFLOPs: 11.89 | 7: iteration 148490/ 173500 | consumed samples: 38013440 | consumed tokens: 77851525120 | elapsed time per iteration (s): 0.08 | learning rate: 2.925E-05 | global batch size: 256 | lm loss: 4.497422E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.203 | TFLOPs: 11.83 | 7: iteration 148500/ 173500 | consumed samples: 38016000 | consumed tokens: 77856768000 | elapsed time per iteration (s): 0.08 | learning rate: 2.925E-05 | global batch size: 256 | lm loss: 4.503745E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3198.399 | TFLOPs: 11.90 | 7: iteration 148510/ 173500 | consumed samples: 38018560 | consumed tokens: 77862010880 | elapsed time per iteration (s): 0.08 | learning rate: 2.924E-05 | global batch size: 256 | lm loss: 4.496257E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.492 | TFLOPs: 11.87 | 7: iteration 148520/ 173500 | consumed samples: 38021120 | consumed tokens: 77867253760 | elapsed time per iteration (s): 0.08 | learning rate: 2.923E-05 | global batch size: 256 | lm loss: 4.500777E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.872 | TFLOPs: 11.71 | 7: iteration 148530/ 173500 | consumed samples: 38023680 | consumed tokens: 77872496640 | elapsed time per iteration (s): 0.09 | learning rate: 2.922E-05 | global batch size: 256 | lm loss: 4.502528E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2843.634 | TFLOPs: 10.58 | 7: iteration 148540/ 173500 | consumed samples: 38026240 | consumed tokens: 77877739520 | elapsed time per iteration (s): 0.08 | learning rate: 2.922E-05 | global batch size: 256 | lm loss: 4.512849E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.818 | TFLOPs: 11.30 | 7: iteration 148550/ 173500 | consumed samples: 38028800 | consumed tokens: 77882982400 | elapsed time per iteration (s): 0.08 | learning rate: 2.921E-05 | global batch size: 256 | lm loss: 4.503542E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.084 | TFLOPs: 11.90 | 7: iteration 148560/ 173500 | consumed samples: 38031360 | consumed tokens: 77888225280 | elapsed time per iteration (s): 0.08 | learning rate: 2.920E-05 | global batch size: 
256 | lm loss: 4.501272E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.389 | TFLOPs: 11.92 | 7: iteration 148570/ 173500 | consumed samples: 38033920 | consumed tokens: 77893468160 | elapsed time per iteration (s): 0.08 | learning rate: 2.920E-05 | global batch size: 256 | lm loss: 4.506865E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.464 | TFLOPs: 11.90 | 7: iteration 148580/ 173500 | consumed samples: 38036480 | consumed tokens: 77898711040 | elapsed time per iteration (s): 0.09 | learning rate: 2.919E-05 | global batch size: 256 | lm loss: 4.506169E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2698.334 | TFLOPs: 10.04 | 7: iteration 148590/ 173500 | consumed samples: 38039040 | consumed tokens: 77903953920 | elapsed time per iteration (s): 0.08 | learning rate: 2.918E-05 | global batch size: 256 | lm loss: 4.521885E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.183 | TFLOPs: 11.85 | 7: iteration 148600/ 173500 | consumed samples: 38041600 | consumed tokens: 77909196800 | elapsed time per iteration (s): 0.08 | learning rate: 2.917E-05 | global batch size: 256 | lm loss: 4.511356E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.545 | TFLOPs: 11.74 | 7: iteration 148610/ 173500 | consumed samples: 38044160 | consumed tokens: 77914439680 | elapsed time per iteration (s): 0.08 | learning rate: 2.917E-05 | global batch size: 256 | lm loss: 4.508297E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.443 | TFLOPs: 11.86 | 7: iteration 148620/ 173500 | consumed samples: 38046720 | 
consumed tokens: 77919682560 | elapsed time per iteration (s): 0.08 | learning rate: 2.916E-05 | global batch size: 256 | lm loss: 4.512100E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.186 | TFLOPs: 11.87 | 7: iteration 148630/ 173500 | consumed samples: 38049280 | consumed tokens: 77924925440 | elapsed time per iteration (s): 0.08 | learning rate: 2.915E-05 | global batch size: 256 | lm loss: 4.495636E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.206 | TFLOPs: 11.87 | 7: iteration 148640/ 173500 | consumed samples: 38051840 | consumed tokens: 77930168320 | elapsed time per iteration (s): 0.08 | learning rate: 2.914E-05 | global batch size: 256 | lm loss: 4.511712E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.000 | TFLOPs: 11.76 | 7: iteration 148650/ 173500 | consumed samples: 38054400 | consumed tokens: 77935411200 | elapsed time per iteration (s): 0.08 | learning rate: 2.914E-05 | global batch size: 256 | lm loss: 4.498388E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.819 | TFLOPs: 11.92 | 7: iteration 148660/ 173500 | consumed samples: 38056960 | consumed tokens: 77940654080 | elapsed time per iteration (s): 0.08 | learning rate: 2.913E-05 | global batch size: 256 | lm loss: 4.496870E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.525 | TFLOPs: 11.89 | 7: iteration 148670/ 173500 | consumed samples: 38059520 | consumed tokens: 77945896960 | elapsed time per iteration (s): 0.08 | learning rate: 2.912E-05 | global batch size: 256 | lm loss: 4.507232E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3194.351 | TFLOPs: 11.88 | 7: iteration 148680/ 173500 | consumed samples: 38062080 | consumed tokens: 77951139840 | elapsed time per iteration (s): 0.08 | learning rate: 2.912E-05 | global batch size: 256 | lm loss: 4.503437E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.589 | TFLOPs: 11.85 | 7: iteration 148690/ 173500 | consumed samples: 38064640 | consumed tokens: 77956382720 | elapsed time per iteration (s): 0.08 | learning rate: 2.911E-05 | global batch size: 256 | lm loss: 4.520655E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.339 | TFLOPs: 11.84 | 7: iteration 148700/ 173500 | consumed samples: 38067200 | consumed tokens: 77961625600 | elapsed time per iteration (s): 0.08 | learning rate: 2.910E-05 | global batch size: 256 | lm loss: 4.502670E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.002 | TFLOPs: 11.77 | 7: iteration 148710/ 173500 | consumed samples: 38069760 | consumed tokens: 77966868480 | elapsed time per iteration (s): 0.08 | learning rate: 2.909E-05 | global batch size: 256 | lm loss: 4.499285E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.815 | TFLOPs: 11.83 | 7: iteration 148720/ 173500 | consumed samples: 38072320 | consumed tokens: 77972111360 | elapsed time per iteration (s): 0.08 | learning rate: 2.909E-05 | global batch size: 256 | lm loss: 4.508159E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.141 | TFLOPs: 11.80 | 7: iteration 148730/ 173500 | consumed samples: 38074880 | consumed tokens: 77977354240 | elapsed time per iteration (s): 0.10 | learning rate: 2.908E-05 | global batch size: 
256 | lm loss: 4.506809E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2465.579 | TFLOPs: 9.17 | 7: iteration 148740/ 173500 | consumed samples: 38077440 | consumed tokens: 77982597120 | elapsed time per iteration (s): 0.09 | learning rate: 2.907E-05 | global batch size: 256 | lm loss: 4.521968E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2877.255 | TFLOPs: 10.70 | 7: iteration 148750/ 173500 | consumed samples: 38080000 | consumed tokens: 77987840000 | elapsed time per iteration (s): 0.08 | learning rate: 2.907E-05 | global batch size: 256 | lm loss: 4.506967E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.573 | TFLOPs: 11.84 | 7: iteration 148760/ 173500 | consumed samples: 38082560 | consumed tokens: 77993082880 | elapsed time per iteration (s): 0.08 | learning rate: 2.906E-05 | global batch size: 256 | lm loss: 4.503626E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.311 | TFLOPs: 11.80 | 7: iteration 148770/ 173500 | consumed samples: 38085120 | consumed tokens: 77998325760 | elapsed time per iteration (s): 0.08 | learning rate: 2.905E-05 | global batch size: 256 | lm loss: 4.501022E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.621 | TFLOPs: 11.84 | 7: iteration 148780/ 173500 | consumed samples: 38087680 | consumed tokens: 78003568640 | elapsed time per iteration (s): 0.08 | learning rate: 2.904E-05 | global batch size: 256 | lm loss: 4.490849E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.122 | TFLOPs: 11.60 | 7: iteration 148790/ 173500 | consumed samples: 38090240 | 
consumed tokens: 78008811520 | elapsed time per iteration (s): 0.08 | learning rate: 2.904E-05 | global batch size: 256 | lm loss: 4.495285E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.407 | TFLOPs: 11.70 | 7: iteration 148800/ 173500 | consumed samples: 38092800 | consumed tokens: 78014054400 | elapsed time per iteration (s): 0.08 | learning rate: 2.903E-05 | global batch size: 256 | lm loss: 4.505873E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.189 | TFLOPs: 11.80 | 7: iteration 148810/ 173500 | consumed samples: 38095360 | consumed tokens: 78019297280 | elapsed time per iteration (s): 0.08 | learning rate: 2.902E-05 | global batch size: 256 | lm loss: 4.500652E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.576 | TFLOPs: 11.89 | 7: iteration 148820/ 173500 | consumed samples: 38097920 | consumed tokens: 78024540160 | elapsed time per iteration (s): 0.08 | learning rate: 2.901E-05 | global batch size: 256 | lm loss: 4.491529E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.251 | TFLOPs: 11.91 | 7: iteration 148830/ 173500 | consumed samples: 38100480 | consumed tokens: 78029783040 | elapsed time per iteration (s): 0.08 | learning rate: 2.901E-05 | global batch size: 256 | lm loss: 4.517056E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.936 | TFLOPs: 11.83 | 7: iteration 148840/ 173500 | consumed samples: 38103040 | consumed tokens: 78035025920 | elapsed time per iteration (s): 0.08 | learning rate: 2.900E-05 | global batch size: 256 | lm loss: 4.494670E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3192.363 | TFLOPs: 11.87 | 7: iteration 148850/ 173500 | consumed samples: 38105600 | consumed tokens: 78040268800 | elapsed time per iteration (s): 0.08 | learning rate: 2.899E-05 | global batch size: 256 | lm loss: 4.510342E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.047 | TFLOPs: 11.80 | 7: iteration 148860/ 173500 | consumed samples: 38108160 | consumed tokens: 78045511680 | elapsed time per iteration (s): 0.08 | learning rate: 2.899E-05 | global batch size: 256 | lm loss: 4.506696E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.666 | TFLOPs: 11.89 | 7: iteration 148870/ 173500 | consumed samples: 38110720 | consumed tokens: 78050754560 | elapsed time per iteration (s): 0.08 | learning rate: 2.898E-05 | global batch size: 256 | lm loss: 4.507003E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.791 | TFLOPs: 11.80 | 7: iteration 148880/ 173500 | consumed samples: 38113280 | consumed tokens: 78055997440 | elapsed time per iteration (s): 0.08 | learning rate: 2.897E-05 | global batch size: 256 | lm loss: 4.518075E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.376 | TFLOPs: 11.84 | 7: iteration 148890/ 173500 | consumed samples: 38115840 | consumed tokens: 78061240320 | elapsed time per iteration (s): 0.08 | learning rate: 2.896E-05 | global batch size: 256 | lm loss: 4.507603E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.972 | TFLOPs: 11.85 | 7: iteration 148900/ 173500 | consumed samples: 38118400 | consumed tokens: 78066483200 | elapsed time per iteration (s): 0.08 | learning rate: 2.896E-05 | global batch size: 
256 | lm loss: 4.508732E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.336 | TFLOPs: 11.80 | 7: iteration 148910/ 173500 | consumed samples: 38120960 | consumed tokens: 78071726080 | elapsed time per iteration (s): 0.08 | learning rate: 2.895E-05 | global batch size: 256 | lm loss: 4.497559E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.671 | TFLOPs: 11.81 | 7: iteration 148920/ 173500 | consumed samples: 38123520 | consumed tokens: 78076968960 | elapsed time per iteration (s): 0.08 | learning rate: 2.894E-05 | global batch size: 256 | lm loss: 4.495303E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.623 | TFLOPs: 11.72 | 7: iteration 148930/ 173500 | consumed samples: 38126080 | consumed tokens: 78082211840 | elapsed time per iteration (s): 0.08 | learning rate: 2.894E-05 | global batch size: 256 | lm loss: 4.506317E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3116.147 | TFLOPs: 11.59 | 7: iteration 148940/ 173500 | consumed samples: 38128640 | consumed tokens: 78087454720 | elapsed time per iteration (s): 0.08 | learning rate: 2.893E-05 | global batch size: 256 | lm loss: 4.515253E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.030 | TFLOPs: 11.84 | 7: iteration 148950/ 173500 | consumed samples: 38131200 | consumed tokens: 78092697600 | elapsed time per iteration (s): 0.08 | learning rate: 2.892E-05 | global batch size: 256 | lm loss: 4.500362E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.256 | TFLOPs: 11.88 | 7: iteration 148960/ 173500 | consumed samples: 38133760 | 
consumed tokens: 78097940480 | elapsed time per iteration (s): 0.08 | learning rate: 2.891E-05 | global batch size: 256 | lm loss: 4.501861E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.341 | TFLOPs: 11.86 | 7: iteration 148970/ 173500 | consumed samples: 38136320 | consumed tokens: 78103183360 | elapsed time per iteration (s): 0.09 | learning rate: 2.891E-05 | global batch size: 256 | lm loss: 4.505370E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2813.750 | TFLOPs: 10.47 | 7: iteration 148980/ 173500 | consumed samples: 38138880 | consumed tokens: 78108426240 | elapsed time per iteration (s): 0.08 | learning rate: 2.890E-05 | global batch size: 256 | lm loss: 4.506200E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.581 | TFLOPs: 11.74 | 7: iteration 148990/ 173500 | consumed samples: 38141440 | consumed tokens: 78113669120 | elapsed time per iteration (s): 0.08 | learning rate: 2.889E-05 | global batch size: 256 | lm loss: 4.499180E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.063 | TFLOPs: 11.84 | 7: iteration 149000/ 173500 | consumed samples: 38144000 | consumed tokens: 78118912000 | elapsed time per iteration (s): 0.08 | learning rate: 2.889E-05 | global batch size: 256 | lm loss: 4.491925E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.682 | TFLOPs: 11.80 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 149000 | lm loss value: 4.390550E+00 | lm loss PPL: 8.068479E+01 | 7: 
------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 149000 to checkpoints_14m91b100m 0: [2023-03-17 03:54:47,398] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step149000 is begin to save! 0: [2023-03-17 03:54:47,401] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:54:47,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:54:47,426] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:54:47,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:54:47,432] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:54:47,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:54:47,435] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:54:47,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:54:47,438] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:54:47,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 03:54:47,441] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:54:47,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:54:47,442] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step149000/mp_rank_00_model_states.pt 0: [2023-03-17 03:54:47,442] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:54:47,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
1: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:54:47,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:54:47,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:54:47,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 5: [2023-03-17 03:54:47,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:54:47,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:54:47,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 7: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 6: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 0: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 2: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 
0: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 5: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 1: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 4: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:54:47,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 
4: [2023-03-17 03:54:47,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:54:47,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 7: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:54:47,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 0: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:54:47,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 1: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:54:47,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 6: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:54:47,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 2: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 
5: [2023-03-17 03:54:47,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:54:47,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 2: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 7: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:54:47,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 03:54:47,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 0: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 1: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:54:47,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 6: [2023-03-17 03:54:47,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 4: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 5: [2023-03-17 03:54:47,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:54:47,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:54:47,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:54:47,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 7: [2023-03-17 03:54:47,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:54:47,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:54:47,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 0: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:54:47,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 03:54:47,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 5: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 
4: [2023-03-17 03:54:47,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 1: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 6: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:54:47,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 2: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 7: [2023-03-17 03:54:47,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:54:47,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 03:54:47,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 7: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 5: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:54:47,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 4: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:54:47,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 2: [2023-03-17 03:54:47,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 6: [2023-03-17 03:54:47,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 1: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:54:47,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 0: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:54:47,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 7: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:54:47,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:54:47,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 5: [2023-03-17 03:54:47,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 5: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 4: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:54:47,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 1: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 6: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:54:47,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 2: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:54:47,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:54:47,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 2: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 5: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 2: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 6: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 
4: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 0: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 4: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 7: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 
6: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 6: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 03:54:47,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step149000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 3: [2023-03-17 03:54:47,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step149000 is ready now! 
0: successfully saved checkpoint at iteration 149000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.85 7: iteration 149010/ 173500 | consumed samples: 38146560 | consumed tokens: 78124154880 | elapsed time per iteration (s): 0.09 | learning rate: 2.888E-05 | global batch size: 256 | lm loss: 4.505157E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2798.558 | TFLOPs: 10.41 | 7: iteration 149020/ 173500 | consumed samples: 38149120 | consumed tokens: 78129397760 | elapsed time per iteration (s): 0.08 | learning rate: 2.887E-05 | global batch size: 256 | lm loss: 4.505107E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.689 | TFLOPs: 11.83 | 7: iteration 149030/ 173500 | consumed samples: 38151680 | consumed tokens: 78134640640 | elapsed time per iteration (s): 0.08 | learning rate: 2.886E-05 | global batch size: 256 | lm loss: 4.514863E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.115 | TFLOPs: 11.85 | 7: iteration 149040/ 173500 | consumed samples: 38154240 | consumed tokens: 78139883520 | elapsed time per iteration (s): 0.08 | learning rate: 2.886E-05 | global batch size: 256 | lm loss: 4.506870E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.622 | TFLOPs: 11.82 | 7: iteration 149050/ 173500 | consumed samples: 38156800 | consumed tokens: 78145126400 | elapsed time per iteration (s): 0.08 | learning rate: 2.885E-05 | global batch size: 256 | lm loss: 4.504654E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.358 | TFLOPs: 11.81 | 7: iteration 149060/ 173500 | consumed samples: 38159360 | consumed tokens: 78150369280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.884E-05 | global batch size: 256 | lm loss: 4.486724E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.235 | TFLOPs: 11.85 | 7: iteration 149070/ 173500 | consumed samples: 38161920 | consumed tokens: 78155612160 | elapsed time per iteration (s): 0.08 | learning rate: 2.884E-05 | global batch size: 256 | lm loss: 4.519982E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.712 | TFLOPs: 11.87 | 7: iteration 149080/ 173500 | consumed samples: 38164480 | consumed tokens: 78160855040 | elapsed time per iteration (s): 0.08 | learning rate: 2.883E-05 | global batch size: 256 | lm loss: 4.495765E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.685 | TFLOPs: 11.89 | 7: iteration 149090/ 173500 | consumed samples: 38167040 | consumed tokens: 78166097920 | elapsed time per iteration (s): 0.08 | learning rate: 2.882E-05 | global batch size: 256 | lm loss: 4.505847E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.914 | TFLOPs: 11.86 | 7: iteration 149100/ 173500 | consumed samples: 38169600 | consumed tokens: 78171340800 | elapsed time per iteration (s): 0.08 | learning rate: 2.881E-05 | global batch size: 256 | lm loss: 4.504749E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.332 | TFLOPs: 11.88 | 7: iteration 149110/ 173500 | consumed samples: 38172160 | consumed tokens: 78176583680 | elapsed time per iteration (s): 0.08 | learning rate: 2.881E-05 | global batch size: 256 | lm loss: 4.503522E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.804 | TFLOPs: 11.91 | 7: 
iteration 149120/ 173500 | consumed samples: 38174720 | consumed tokens: 78181826560 | elapsed time per iteration (s): 0.08 | learning rate: 2.880E-05 | global batch size: 256 | lm loss: 4.518927E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.830 | TFLOPs: 11.90 | 7: iteration 149130/ 173500 | consumed samples: 38177280 | consumed tokens: 78187069440 | elapsed time per iteration (s): 0.08 | learning rate: 2.879E-05 | global batch size: 256 | lm loss: 4.507226E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.340 | TFLOPs: 11.89 | 7: iteration 149140/ 173500 | consumed samples: 38179840 | consumed tokens: 78192312320 | elapsed time per iteration (s): 0.08 | learning rate: 2.879E-05 | global batch size: 256 | lm loss: 4.508863E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.456 | TFLOPs: 11.90 | 7: iteration 149150/ 173500 | consumed samples: 38182400 | consumed tokens: 78197555200 | elapsed time per iteration (s): 0.08 | learning rate: 2.878E-05 | global batch size: 256 | lm loss: 4.503243E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.854 | TFLOPs: 11.85 | 7: iteration 149160/ 173500 | consumed samples: 38184960 | consumed tokens: 78202798080 | elapsed time per iteration (s): 0.08 | learning rate: 2.877E-05 | global batch size: 256 | lm loss: 4.499541E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.109 | TFLOPs: 11.88 | 7: iteration 149170/ 173500 | consumed samples: 38187520 | consumed tokens: 78208040960 | elapsed time per iteration (s): 0.08 | learning rate: 2.877E-05 | global batch size: 256 | lm loss: 4.502959E+00 | grad norm: 0.367 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.593 | TFLOPs: 11.83 | 7: iteration 149180/ 173500 | consumed samples: 38190080 | consumed tokens: 78213283840 | elapsed time per iteration (s): 0.08 | learning rate: 2.876E-05 | global batch size: 256 | lm loss: 4.507792E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.614 | TFLOPs: 11.80 | 7: iteration 149190/ 173500 | consumed samples: 38192640 | consumed tokens: 78218526720 | elapsed time per iteration (s): 0.08 | learning rate: 2.875E-05 | global batch size: 256 | lm loss: 4.512207E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.479 | TFLOPs: 11.85 | 7: iteration 149200/ 173500 | consumed samples: 38195200 | consumed tokens: 78223769600 | elapsed time per iteration (s): 0.08 | learning rate: 2.874E-05 | global batch size: 256 | lm loss: 4.503372E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.871 | TFLOPs: 11.88 | 7: iteration 149210/ 173500 | consumed samples: 38197760 | consumed tokens: 78229012480 | elapsed time per iteration (s): 0.08 | learning rate: 2.874E-05 | global batch size: 256 | lm loss: 4.518109E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.665 | TFLOPs: 11.90 | 7: iteration 149220/ 173500 | consumed samples: 38200320 | consumed tokens: 78234255360 | elapsed time per iteration (s): 0.08 | learning rate: 2.873E-05 | global batch size: 256 | lm loss: 4.510438E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.442 | TFLOPs: 11.81 | 7: iteration 149230/ 173500 | consumed samples: 38202880 | consumed tokens: 78239498240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.872E-05 | global batch size: 256 | lm loss: 4.508526E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.719 | TFLOPs: 11.88 | 7: iteration 149240/ 173500 | consumed samples: 38205440 | consumed tokens: 78244741120 | elapsed time per iteration (s): 0.08 | learning rate: 2.872E-05 | global batch size: 256 | lm loss: 4.504590E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.681 | TFLOPs: 11.86 | 7: iteration 149250/ 173500 | consumed samples: 38208000 | consumed tokens: 78249984000 | elapsed time per iteration (s): 0.10 | learning rate: 2.871E-05 | global batch size: 256 | lm loss: 4.496770E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2563.487 | TFLOPs: 9.54 | 7: iteration 149260/ 173500 | consumed samples: 38210560 | consumed tokens: 78255226880 | elapsed time per iteration (s): 0.10 | learning rate: 2.870E-05 | global batch size: 256 | lm loss: 4.498295E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.506 | TFLOPs: 9.41 | 7: iteration 149270/ 173500 | consumed samples: 38213120 | consumed tokens: 78260469760 | elapsed time per iteration (s): 0.08 | learning rate: 2.869E-05 | global batch size: 256 | lm loss: 4.504881E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.974 | TFLOPs: 11.83 | 7: iteration 149280/ 173500 | consumed samples: 38215680 | consumed tokens: 78265712640 | elapsed time per iteration (s): 0.08 | learning rate: 2.869E-05 | global batch size: 256 | lm loss: 4.514130E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.859 | TFLOPs: 11.57 | 7: iteration 
149290/ 173500 | consumed samples: 38218240 | consumed tokens: 78270955520 | elapsed time per iteration (s): 0.08 | learning rate: 2.868E-05 | global batch size: 256 | lm loss: 4.502042E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3109.743 | TFLOPs: 11.57 | 7: iteration 149300/ 173500 | consumed samples: 38220800 | consumed tokens: 78276198400 | elapsed time per iteration (s): 0.09 | learning rate: 2.867E-05 | global batch size: 256 | lm loss: 4.503919E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2918.222 | TFLOPs: 10.85 | 7: iteration 149310/ 173500 | consumed samples: 38223360 | consumed tokens: 78281441280 | elapsed time per iteration (s): 0.08 | learning rate: 2.867E-05 | global batch size: 256 | lm loss: 4.503795E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.988 | TFLOPs: 11.68 | 7: iteration 149320/ 173500 | consumed samples: 38225920 | consumed tokens: 78286684160 | elapsed time per iteration (s): 0.08 | learning rate: 2.866E-05 | global batch size: 256 | lm loss: 4.510909E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.344 | TFLOPs: 11.84 | 7: iteration 149330/ 173500 | consumed samples: 38228480 | consumed tokens: 78291927040 | elapsed time per iteration (s): 0.08 | learning rate: 2.865E-05 | global batch size: 256 | lm loss: 4.500628E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.972 | TFLOPs: 11.90 | 7: iteration 149340/ 173500 | consumed samples: 38231040 | consumed tokens: 78297169920 | elapsed time per iteration (s): 0.08 | learning rate: 2.865E-05 | global batch size: 256 | lm loss: 4.501659E+00 | grad norm: 0.393 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.315 | TFLOPs: 11.90 |
7: iteration 149350/ 173500 | consumed samples: 38233600 | consumed tokens: 78302412800 | elapsed time per iteration (s): 0.08 | learning rate: 2.864E-05 | global batch size: 256 | lm loss: 4.491634E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.961 | TFLOPs: 11.85 |
7: iteration 149360/ 173500 | consumed samples: 38236160 | consumed tokens: 78307655680 | elapsed time per iteration (s): 0.08 | learning rate: 2.863E-05 | global batch size: 256 | lm loss: 4.501775E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.398 | TFLOPs: 11.93 |
7: iteration 149370/ 173500 | consumed samples: 38238720 | consumed tokens: 78312898560 | elapsed time per iteration (s): 0.08 | learning rate: 2.862E-05 | global batch size: 256 | lm loss: 4.503035E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.898 | TFLOPs: 11.89 |
7: iteration 149380/ 173500 | consumed samples: 38241280 | consumed tokens: 78318141440 | elapsed time per iteration (s): 0.08 | learning rate: 2.862E-05 | global batch size: 256 | lm loss: 4.512420E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.718 | TFLOPs: 11.85 |
7: iteration 149390/ 173500 | consumed samples: 38243840 | consumed tokens: 78323384320 | elapsed time per iteration (s): 0.08 | learning rate: 2.861E-05 | global batch size: 256 | lm loss: 4.516060E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.773 | TFLOPs: 11.72 |
7: iteration 149400/ 173500 | consumed samples: 38246400 | consumed tokens: 78328627200 | elapsed time per iteration (s): 0.08 | learning rate: 2.860E-05 | global batch size: 256 | lm loss: 4.505903E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.428 | TFLOPs: 11.93 |
7: iteration 149410/ 173500 | consumed samples: 38248960 | consumed tokens: 78333870080 | elapsed time per iteration (s): 0.08 | learning rate: 2.860E-05 | global batch size: 256 | lm loss: 4.517455E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.752 | TFLOPs: 11.62 |
7: iteration 149420/ 173500 | consumed samples: 38251520 | consumed tokens: 78339112960 | elapsed time per iteration (s): 0.08 | learning rate: 2.859E-05 | global batch size: 256 | lm loss: 4.513994E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.190 | TFLOPs: 11.91 |
7: iteration 149430/ 173500 | consumed samples: 38254080 | consumed tokens: 78344355840 | elapsed time per iteration (s): 0.08 | learning rate: 2.858E-05 | global batch size: 256 | lm loss: 4.509181E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.412 | TFLOPs: 11.91 |
7: iteration 149440/ 173500 | consumed samples: 38256640 | consumed tokens: 78349598720 | elapsed time per iteration (s): 0.08 | learning rate: 2.857E-05 | global batch size: 256 | lm loss: 4.501628E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.836 | TFLOPs: 11.89 |
7: iteration 149450/ 173500 | consumed samples: 38259200 | consumed tokens: 78354841600 | elapsed time per iteration (s): 0.08 | learning rate: 2.857E-05 | global batch size: 256 | lm loss: 4.503563E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.833 | TFLOPs: 11.91 |
7: iteration 149460/ 173500 | consumed samples: 38261760 | consumed tokens: 78360084480 | elapsed time per iteration (s): 0.08 | learning rate: 2.856E-05 | global batch size: 256 | lm loss: 4.496063E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.322 | TFLOPs: 11.44 |
7: iteration 149470/ 173500 | consumed samples: 38264320 | consumed tokens: 78365327360 | elapsed time per iteration (s): 0.08 | learning rate: 2.855E-05 | global batch size: 256 | lm loss: 4.492405E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.067 | TFLOPs: 11.93 |
7: iteration 149480/ 173500 | consumed samples: 38266880 | consumed tokens: 78370570240 | elapsed time per iteration (s): 0.08 | learning rate: 2.855E-05 | global batch size: 256 | lm loss: 4.507837E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.046 | TFLOPs: 11.84 |
7: iteration 149490/ 173500 | consumed samples: 38269440 | consumed tokens: 78375813120 | elapsed time per iteration (s): 0.08 | learning rate: 2.854E-05 | global batch size: 256 | lm loss: 4.498759E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.110 | TFLOPs: 11.84 |
7: iteration 149500/ 173500 | consumed samples: 38272000 | consumed tokens: 78381056000 | elapsed time per iteration (s): 0.08 | learning rate: 2.853E-05 | global batch size: 256 | lm loss: 4.502408E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.531 | TFLOPs: 11.60 |
7: iteration 149510/ 173500 | consumed samples: 38274560 | consumed tokens: 78386298880 | elapsed time per iteration (s): 0.08 | learning rate: 2.853E-05 | global batch size: 256 | lm loss: 4.506046E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.094 | TFLOPs: 11.78 |
7: iteration 149520/ 173500 | consumed samples: 38277120 | consumed tokens: 78391541760 | elapsed time per iteration (s): 0.08 | learning rate: 2.852E-05 | global batch size: 256 | lm loss: 4.512586E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.190 | TFLOPs: 11.81 |
7: iteration 149530/ 173500 | consumed samples: 38279680 | consumed tokens: 78396784640 | elapsed time per iteration (s): 0.08 | learning rate: 2.851E-05 | global batch size: 256 | lm loss: 4.515875E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.935 | TFLOPs: 11.69 |
7: iteration 149540/ 173500 | consumed samples: 38282240 | consumed tokens: 78402027520 | elapsed time per iteration (s): 0.08 | learning rate: 2.850E-05 | global batch size: 256 | lm loss: 4.504511E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.086 | TFLOPs: 11.65 |
7: iteration 149550/ 173500 | consumed samples: 38284800 | consumed tokens: 78407270400 | elapsed time per iteration (s): 0.08 | learning rate: 2.850E-05 | global batch size: 256 | lm loss: 4.484142E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.375 | TFLOPs: 11.78 |
7: iteration 149560/ 173500 | consumed samples: 38287360 | consumed tokens: 78412513280 | elapsed time per iteration (s): 0.08 | learning rate: 2.849E-05 | global batch size: 256 | lm loss: 4.499828E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.554 | TFLOPs: 11.80 |
7: iteration 149570/ 173500 | consumed samples: 38289920 | consumed tokens: 78417756160 | elapsed time per iteration (s): 0.08 | learning rate: 2.848E-05 | global batch size: 256 | lm loss: 4.504933E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.231 | TFLOPs: 11.82 |
7: iteration 149580/ 173500 | consumed samples: 38292480 | consumed tokens: 78422999040 | elapsed time per iteration (s): 0.08 | learning rate: 2.848E-05 | global batch size: 256 | lm loss: 4.503551E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.683 | TFLOPs: 11.76 |
7: iteration 149590/ 173500 | consumed samples: 38295040 | consumed tokens: 78428241920 | elapsed time per iteration (s): 0.08 | learning rate: 2.847E-05 | global batch size: 256 | lm loss: 4.510076E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.346 | TFLOPs: 11.78 |
7: iteration 149600/ 173500 | consumed samples: 38297600 | consumed tokens: 78433484800 | elapsed time per iteration (s): 0.08 | learning rate: 2.846E-05 | global batch size: 256 | lm loss: 4.512431E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.594 | TFLOPs: 11.80 |
7: iteration 149610/ 173500 | consumed samples: 38300160 | consumed tokens: 78438727680 | elapsed time per iteration (s): 0.08 | learning rate: 2.846E-05 | global batch size: 256 | lm loss: 4.500259E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.464 | TFLOPs: 11.81 |
7: iteration 149620/ 173500 | consumed samples: 38302720 | consumed tokens: 78443970560 | elapsed time per iteration (s): 0.08 | learning rate: 2.845E-05 | global batch size: 256 | lm loss: 4.508151E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.657 | TFLOPs: 11.83 |
7: iteration 149630/ 173500 | consumed samples: 38305280 | consumed tokens: 78449213440 | elapsed time per iteration (s): 0.10 | learning rate: 2.844E-05 | global batch size: 256 | lm loss: 4.514766E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2636.710 | TFLOPs: 9.81 |
7: iteration 149640/ 173500 | consumed samples: 38307840 | consumed tokens: 78454456320 | elapsed time per iteration (s): 0.08 | learning rate: 2.844E-05 | global batch size: 256 | lm loss: 4.501825E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.167 | TFLOPs: 11.78 |
7: iteration 149650/ 173500 | consumed samples: 38310400 | consumed tokens: 78459699200 | elapsed time per iteration (s): 0.08 | learning rate: 2.843E-05 | global batch size: 256 | lm loss: 4.512691E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.899 | TFLOPs: 11.82 |
7: iteration 149660/ 173500 | consumed samples: 38312960 | consumed tokens: 78464942080 | elapsed time per iteration (s): 0.08 | learning rate: 2.842E-05 | global batch size: 256 | lm loss: 4.500298E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.051 | TFLOPs: 11.82 |
7: iteration 149670/ 173500 | consumed samples: 38315520 | consumed tokens: 78470184960 | elapsed time per iteration (s): 0.08 | learning rate: 2.841E-05 | global batch size: 256 | lm loss: 4.519120E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.718 | TFLOPs: 11.78 |
7: iteration 149680/ 173500 | consumed samples: 38318080 | consumed tokens: 78475427840 | elapsed time per iteration (s): 0.08 | learning rate: 2.841E-05 | global batch size: 256 | lm loss: 4.504776E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.050 | TFLOPs: 11.84 |
7: iteration 149690/ 173500 | consumed samples: 38320640 | consumed tokens: 78480670720 | elapsed time per iteration (s): 0.08 | learning rate: 2.840E-05 | global batch size: 256 | lm loss: 4.504356E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.832 | TFLOPs: 11.79 |
7: iteration 149700/ 173500 | consumed samples: 38323200 | consumed tokens: 78485913600 | elapsed time per iteration (s): 0.08 | learning rate: 2.839E-05 | global batch size: 256 | lm loss: 4.509783E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.153 | TFLOPs: 11.78 |
7: iteration 149710/ 173500 | consumed samples: 38325760 | consumed tokens: 78491156480 | elapsed time per iteration (s): 0.08 | learning rate: 2.839E-05 | global batch size: 256 | lm loss: 4.503767E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.634 | TFLOPs: 11.85 |
7: iteration 149720/ 173500 | consumed samples: 38328320 | consumed tokens: 78496399360 | elapsed time per iteration (s): 0.08 | learning rate: 2.838E-05 | global batch size: 256 | lm loss: 4.513186E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.412 | TFLOPs: 11.84 |
7: iteration 149730/ 173500 | consumed samples: 38330880 | consumed tokens: 78501642240 | elapsed time per iteration (s): 0.08 | learning rate: 2.837E-05 | global batch size: 256 | lm loss: 4.507736E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.141 | TFLOPs: 11.76 |
7: iteration 149740/ 173500 | consumed samples: 38333440 | consumed tokens: 78506885120 | elapsed time per iteration (s): 0.08 | learning rate: 2.837E-05 | global batch size: 256 | lm loss: 4.496629E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.928 | TFLOPs: 11.78 |
7: iteration 149750/ 173500 | consumed samples: 38336000 | consumed tokens: 78512128000 | elapsed time per iteration (s): 0.08 | learning rate: 2.836E-05 | global batch size: 256 | lm loss: 4.502696E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.464 | TFLOPs: 11.82 |
7: iteration 149760/ 173500 | consumed samples: 38338560 | consumed tokens: 78517370880 | elapsed time per iteration (s): 0.08 | learning rate: 2.835E-05 | global batch size: 256 | lm loss: 4.515366E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.419 | TFLOPs: 11.47 |
7: iteration 149770/ 173500 | consumed samples: 38341120 | consumed tokens: 78522613760 | elapsed time per iteration (s): 0.08 | learning rate: 2.835E-05 | global batch size: 256 | lm loss: 4.497259E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.088 | TFLOPs: 11.84 |
7: iteration 149780/ 173500 | consumed samples: 38343680 | consumed tokens: 78527856640 | elapsed time per iteration (s): 0.08 | learning rate: 2.834E-05 | global batch size: 256 | lm loss: 4.508742E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.882 | TFLOPs: 11.81 |
7: iteration 149790/ 173500 | consumed samples: 38346240 | consumed tokens: 78533099520 | elapsed time per iteration (s): 0.09 | learning rate: 2.833E-05 | global batch size: 256 | lm loss: 4.495755E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.663 | TFLOPs: 10.78 |
7: iteration 149800/ 173500 | consumed samples: 38348800 | consumed tokens: 78538342400 | elapsed time per iteration (s): 0.08 | learning rate: 2.832E-05 | global batch size: 256 | lm loss: 4.501684E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.748 | TFLOPs: 11.83 |
7: iteration 149810/ 173500 | consumed samples: 38351360 | consumed tokens: 78543585280 | elapsed time per iteration (s): 0.08 | learning rate: 2.832E-05 | global batch size: 256 | lm loss: 4.507946E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.974 | TFLOPs: 11.84 |
7: iteration 149820/ 173500 | consumed samples: 38353920 | consumed tokens: 78548828160 | elapsed time per iteration (s): 0.08 | learning rate: 2.831E-05 | global batch size: 256 | lm loss: 4.511650E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.062 | TFLOPs: 11.84 |
7: iteration 149830/ 173500 | consumed samples: 38356480 | consumed tokens: 78554071040 | elapsed time per iteration (s): 0.08 | learning rate: 2.830E-05 | global batch size: 256 | lm loss: 4.506602E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.676 | TFLOPs: 11.85 |
7: iteration 149840/ 173500 | consumed samples: 38359040 | consumed tokens: 78559313920 | elapsed time per iteration (s): 0.08 | learning rate: 2.830E-05 | global batch size: 256 | lm loss: 4.504744E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.657 | TFLOPs: 11.89 |
7: iteration 149850/ 173500 | consumed samples: 38361600 | consumed tokens: 78564556800 | elapsed time per iteration (s): 0.08 | learning rate: 2.829E-05 | global batch size: 256 | lm loss: 4.495563E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.017 | TFLOPs: 11.90 |
7: iteration 149860/ 173500 | consumed samples: 38364160 | consumed tokens: 78569799680 | elapsed time per iteration (s): 0.08 | learning rate: 2.828E-05 | global batch size: 256 | lm loss: 4.511268E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.780 | TFLOPs: 11.90 |
7: iteration 149870/ 173500 | consumed samples: 38366720 | consumed tokens: 78575042560 | elapsed time per iteration (s): 0.08 | learning rate: 2.828E-05 | global batch size: 256 | lm loss: 4.501457E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.581 | TFLOPs: 11.89 |
7: iteration 149880/ 173500 | consumed samples: 38369280 | consumed tokens: 78580285440 | elapsed time per iteration (s): 0.08 | learning rate: 2.827E-05 | global batch size: 256 | lm loss: 4.501502E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.044 | TFLOPs: 11.82 |
7: iteration 149890/ 173500 | consumed samples: 38371840 | consumed tokens: 78585528320 | elapsed time per iteration (s): 0.08 | learning rate: 2.826E-05 | global batch size: 256 | lm loss: 4.505509E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.704 | TFLOPs: 11.96 |
7: iteration 149900/ 173500 | consumed samples: 38374400 | consumed tokens: 78590771200 | elapsed time per iteration (s): 0.08 | learning rate: 2.826E-05 | global batch size: 256 | lm loss: 4.500097E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.148 | TFLOPs: 11.96 |
7: iteration 149910/ 173500 | consumed samples: 38376960 | consumed tokens: 78596014080 | elapsed time per iteration (s): 0.08 | learning rate: 2.825E-05 | global batch size: 256 | lm loss: 4.508788E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.588 | TFLOPs: 11.67 |
7: iteration 149920/ 173500 | consumed samples: 38379520 | consumed tokens: 78601256960 | elapsed time per iteration (s): 0.08 | learning rate: 2.824E-05 | global batch size: 256 | lm loss: 4.512968E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.185 | TFLOPs: 11.93 |
7: iteration 149930/ 173500 | consumed samples: 38382080 | consumed tokens: 78606499840 | elapsed time per iteration (s): 0.08 | learning rate: 2.823E-05 | global batch size: 256 | lm loss: 4.499286E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.601 | TFLOPs: 11.86 |
7: iteration 149940/ 173500 | consumed samples: 38384640 | consumed tokens: 78611742720 | elapsed time per iteration (s): 0.08 | learning rate: 2.823E-05 | global batch size: 256 | lm loss: 4.522312E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.805 | TFLOPs: 11.59 |
7: iteration 149950/ 173500 | consumed samples: 38387200 | consumed tokens: 78616985600 | elapsed time per iteration (s): 0.08 | learning rate: 2.822E-05 | global batch size: 256 | lm loss: 4.517580E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.210 | TFLOPs: 11.92 |
7: iteration 149960/ 173500 | consumed samples: 38389760 | consumed tokens: 78622228480 | elapsed time per iteration (s): 0.08 | learning rate: 2.821E-05 | global batch size: 256 | lm loss: 4.508766E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.718 | TFLOPs: 11.91 |
7: iteration 149970/ 173500 | consumed samples: 38392320 | consumed tokens: 78627471360 | elapsed time per iteration (s): 0.08 | learning rate: 2.821E-05 | global batch size: 256 | lm loss: 4.502320E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.606 | TFLOPs: 11.95 |
7: iteration 149980/ 173500 | consumed samples: 38394880 | consumed tokens: 78632714240 | elapsed time per iteration (s): 0.08 | learning rate: 2.820E-05 | global batch size: 256 | lm loss: 4.504609E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.139 | TFLOPs: 11.95 |
7: iteration 149990/ 173500 | consumed samples: 38397440 | consumed tokens: 78637957120 | elapsed time per iteration (s): 0.08 | learning rate: 2.819E-05 | global batch size: 256 | lm loss: 4.511543E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.762 | TFLOPs: 11.92 |
0: [2023-03-17 03:56:08,737] [INFO] [logging.py:68:log_dist] [Rank 0] step=150000, skipped=0, lr=[2.8186529571359086e-05, 2.8186529571359086e-05, 2.8186529571359086e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 150000/ 173500 | consumed samples: 38400000 | consumed tokens: 78643200000 | elapsed time per iteration (s): 0.08 | learning rate: 2.819E-05 | global batch size: 256 | lm loss: 4.500436E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.743 | TFLOPs: 11.63 |
0: steps: 150000 loss: 4.5172 iter time (s): 0.081 samples/sec: 3166.427
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 150000 | lm loss value: 4.399611E+00 | lm loss PPL: 8.141919E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 150000 to checkpoints_14m91b100m
0: [2023-03-17 03:56:08,796] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step150000 is begin to save!
0: [2023-03-17 03:56:08,800] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/layer_01-model_00-model_states.pt...
0: [2023-03-17 03:56:08,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/layer_01-model_00-model_states.pt.
0: [2023-03-17 03:56:08,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/layer_03-model_00-model_states.pt...
0: [2023-03-17 03:56:08,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/layer_03-model_00-model_states.pt.
0: [2023-03-17 03:56:08,830] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/layer_04-model_00-model_states.pt...
0: [2023-03-17 03:56:08,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/layer_04-model_00-model_states.pt.
0: [2023-03-17 03:56:08,833] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/layer_05-model_00-model_states.pt...
0: [2023-03-17 03:56:08,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/layer_05-model_00-model_states.pt.
0: [2023-03-17 03:56:08,836] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/layer_06-model_00-model_states.pt...
0: [2023-03-17 03:56:08,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/layer_06-model_00-model_states.pt.
0: [2023-03-17 03:56:08,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/layer_08-model_00-model_states.pt...
0: [2023-03-17 03:56:08,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/layer_08-model_00-model_states.pt.
0: [2023-03-17 03:56:08,840] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step150000/mp_rank_00_model_states.pt
0: [2023-03-17 03:56:08,840] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/mp_rank_00_model_states.pt...
0: [2023-03-17 03:56:08,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/mp_rank_00_model_states.pt.
0: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
5: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt...
5: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt...
5: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt...
5: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt...
5: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt...
3: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt...
3: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt...
3: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt...
3: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt...
3: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt...
1: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt...
1: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt...
1: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt...
1: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt...
1: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt...
5: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt...
5: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt...
5: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
4: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
3: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt...
3: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt...
3: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt...
1: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt...
1: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt...
1: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt...
2: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt...
7: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt...
6: [2023-03-17 03:56:08,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt...
0: [2023-03-17 03:56:08,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt.
6: [2023-03-17 03:56:08,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt.
6: [2023-03-17 03:56:08,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt
6: [2023-03-17 03:56:08,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now!
3: [2023-03-17 03:56:08,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt.
3: [2023-03-17 03:56:08,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt
3: [2023-03-17 03:56:08,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now!
2: [2023-03-17 03:56:08,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt.
7: [2023-03-17 03:56:08,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt.
2: [2023-03-17 03:56:08,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 4: [2023-03-17 03:56:08,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 7: [2023-03-17 03:56:08,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 6: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:56:08,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 
6: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:56:08,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 2: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 7: [2023-03-17 03:56:08,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:56:08,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 0: [2023-03-17 03:56:08,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 
0: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:56:08,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 4: [2023-03-17 03:56:08,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 6: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:56:08,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 3: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:56:08,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 
5: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:56:08,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 0: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:56:08,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:56:08,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 7: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:56:08,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 
2: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 4: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:56:08,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 6: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:56:08,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 3: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:56:08,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 5: [2023-03-17 03:56:08,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 7: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:56:08,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:56:08,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 
2: [2023-03-17 03:56:08,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 4: [2023-03-17 03:56:08,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 0: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:56:08,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 6: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:56:08,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 03:56:08,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 3: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 5: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:56:08,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 2: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 7: [2023-03-17 03:56:08,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:56:08,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:56:08,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 4: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:56:08,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 03:56:08,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 0: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 6: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:56:08,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 6: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 5: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 3: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:56:08,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 03:56:08,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 2: [2023-03-17 03:56:08,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 03:56:08,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 7: [2023-03-17 03:56:08,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:56:08,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 1: [2023-03-17 03:56:08,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:56:08,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 03:56:08,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 0: [2023-03-17 03:56:08,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:56:08,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:56:08,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 03:56:08,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 4: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 6: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:56:08,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 3: [2023-03-17 03:56:08,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 5: [2023-03-17 03:56:08,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 5: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 2: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:56:08,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 7: [2023-03-17 03:56:08,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:56:08,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 4: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 0: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 0: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 4: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 3: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 6: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 1: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 5: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 
5: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 7: [2023-03-17 03:56:08,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 0: [2023-03-17 03:56:08,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:56:08,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:56:08,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 2: [2023-03-17 03:56:08,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:56:08,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step150000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:56:08,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step150000 is ready now! 
0: successfully saved checkpoint at iteration 150000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.05 7: iteration 150010/ 173500 | consumed samples: 38402560 | consumed tokens: 78648442880 | elapsed time per iteration (s): 0.09 | learning rate: 2.818E-05 | global batch size: 256 | lm loss: 4.506734E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2739.998 | TFLOPs: 10.19 | 7: iteration 150020/ 173500 | consumed samples: 38405120 | consumed tokens: 78653685760 | elapsed time per iteration (s): 0.08 | learning rate: 2.817E-05 | global batch size: 256 | lm loss: 4.492947E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.192 | TFLOPs: 11.93 | 7: iteration 150030/ 173500 | consumed samples: 38407680 | consumed tokens: 78658928640 | elapsed time per iteration (s): 0.08 | learning rate: 2.817E-05 | global batch size: 256 | lm loss: 4.512586E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.132 | TFLOPs: 11.90 | 7: iteration 150040/ 173500 | consumed samples: 38410240 | consumed tokens: 78664171520 | elapsed time per iteration (s): 0.08 | learning rate: 2.816E-05 | global batch size: 256 | lm loss: 4.511769E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.461 | TFLOPs: 11.86 | 7: iteration 150050/ 173500 | consumed samples: 38412800 | consumed tokens: 78669414400 | elapsed time per iteration (s): 0.08 | learning rate: 2.815E-05 | global batch size: 256 | lm loss: 4.493882E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.146 | TFLOPs: 11.48 | 7: iteration 150060/ 173500 | consumed samples: 38415360 | consumed tokens: 78674657280 | elapsed time per iteration (s): 
0.10 | learning rate: 2.815E-05 | global batch size: 256 | lm loss: 4.501772E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2672.861 | TFLOPs: 9.94 | 7: iteration 150070/ 173500 | consumed samples: 38417920 | consumed tokens: 78679900160 | elapsed time per iteration (s): 0.08 | learning rate: 2.814E-05 | global batch size: 256 | lm loss: 4.501202E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.628 | TFLOPs: 11.65 | 7: iteration 150080/ 173500 | consumed samples: 38420480 | consumed tokens: 78685143040 | elapsed time per iteration (s): 0.08 | learning rate: 2.813E-05 | global batch size: 256 | lm loss: 4.501330E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.068 | TFLOPs: 11.73 | 7: iteration 150090/ 173500 | consumed samples: 38423040 | consumed tokens: 78690385920 | elapsed time per iteration (s): 0.08 | learning rate: 2.812E-05 | global batch size: 256 | lm loss: 4.501749E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.572 | TFLOPs: 11.74 | 7: iteration 150100/ 173500 | consumed samples: 38425600 | consumed tokens: 78695628800 | elapsed time per iteration (s): 0.08 | learning rate: 2.812E-05 | global batch size: 256 | lm loss: 4.500389E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.666 | TFLOPs: 11.83 | 7: iteration 150110/ 173500 | consumed samples: 38428160 | consumed tokens: 78700871680 | elapsed time per iteration (s): 0.08 | learning rate: 2.811E-05 | global batch size: 256 | lm loss: 4.503699E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.617 | TFLOPs: 11.76 | 7: 
iteration 150120/ 173500 | consumed samples: 38430720 | consumed tokens: 78706114560 | elapsed time per iteration (s): 0.08 | learning rate: 2.810E-05 | global batch size: 256 | lm loss: 4.515449E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.089 | TFLOPs: 11.72 | 7: iteration 150130/ 173500 | consumed samples: 38433280 | consumed tokens: 78711357440 | elapsed time per iteration (s): 0.08 | learning rate: 2.810E-05 | global batch size: 256 | lm loss: 4.514256E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3050.709 | TFLOPs: 11.35 | 7: iteration 150140/ 173500 | consumed samples: 38435840 | consumed tokens: 78716600320 | elapsed time per iteration (s): 0.10 | learning rate: 2.809E-05 | global batch size: 256 | lm loss: 4.507566E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2643.532 | TFLOPs: 9.83 | 7: iteration 150150/ 173500 | consumed samples: 38438400 | consumed tokens: 78721843200 | elapsed time per iteration (s): 0.08 | learning rate: 2.808E-05 | global batch size: 256 | lm loss: 4.513445E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.191 | TFLOPs: 11.55 | 7: iteration 150160/ 173500 | consumed samples: 38440960 | consumed tokens: 78727086080 | elapsed time per iteration (s): 0.08 | learning rate: 2.808E-05 | global batch size: 256 | lm loss: 4.509236E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.585 | TFLOPs: 11.86 | 7: iteration 150170/ 173500 | consumed samples: 38443520 | consumed tokens: 78732328960 | elapsed time per iteration (s): 0.09 | learning rate: 2.807E-05 | global batch size: 256 | lm loss: 4.495470E+00 | grad norm: 0.357 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2882.234 | TFLOPs: 10.72 | 7: iteration 150180/ 173500 | consumed samples: 38446080 | consumed tokens: 78737571840 | elapsed time per iteration (s): 0.09 | learning rate: 2.806E-05 | global batch size: 256 | lm loss: 4.507652E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2846.156 | TFLOPs: 10.59 | 7: iteration 150190/ 173500 | consumed samples: 38448640 | consumed tokens: 78742814720 | elapsed time per iteration (s): 0.08 | learning rate: 2.806E-05 | global batch size: 256 | lm loss: 4.505654E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3038.915 | TFLOPs: 11.30 | 7: iteration 150200/ 173500 | consumed samples: 38451200 | consumed tokens: 78748057600 | elapsed time per iteration (s): 0.08 | learning rate: 2.805E-05 | global batch size: 256 | lm loss: 4.502280E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.677 | TFLOPs: 11.56 | 7: iteration 150210/ 173500 | consumed samples: 38453760 | consumed tokens: 78753300480 | elapsed time per iteration (s): 0.08 | learning rate: 2.804E-05 | global batch size: 256 | lm loss: 4.509826E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.583 | TFLOPs: 11.56 | 7: iteration 150220/ 173500 | consumed samples: 38456320 | consumed tokens: 78758543360 | elapsed time per iteration (s): 0.08 | learning rate: 2.804E-05 | global batch size: 256 | lm loss: 4.511476E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.694 | TFLOPs: 11.84 | 7: iteration 150230/ 173500 | consumed samples: 38458880 | consumed tokens: 78763786240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.803E-05 | global batch size: 256 | lm loss: 4.507483E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.906 | TFLOPs: 11.25 | 7: iteration 150240/ 173500 | consumed samples: 38461440 | consumed tokens: 78769029120 | elapsed time per iteration (s): 0.08 | learning rate: 2.802E-05 | global batch size: 256 | lm loss: 4.503419E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.130 | TFLOPs: 11.75 | 7: iteration 150250/ 173500 | consumed samples: 38464000 | consumed tokens: 78774272000 | elapsed time per iteration (s): 0.08 | learning rate: 2.802E-05 | global batch size: 256 | lm loss: 4.533145E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.546 | TFLOPs: 11.53 | 7: iteration 150260/ 173500 | consumed samples: 38466560 | consumed tokens: 78779514880 | elapsed time per iteration (s): 0.08 | learning rate: 2.801E-05 | global batch size: 256 | lm loss: 4.511800E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.852 | TFLOPs: 11.79 | 7: iteration 150270/ 173500 | consumed samples: 38469120 | consumed tokens: 78784757760 | elapsed time per iteration (s): 0.08 | learning rate: 2.800E-05 | global batch size: 256 | lm loss: 4.498243E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.171 | TFLOPs: 11.84 | 7: iteration 150280/ 173500 | consumed samples: 38471680 | consumed tokens: 78790000640 | elapsed time per iteration (s): 0.08 | learning rate: 2.800E-05 | global batch size: 256 | lm loss: 4.503444E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.844 | TFLOPs: 11.86 | 7: iteration 
150290/ 173500 | consumed samples: 38474240 | consumed tokens: 78795243520 | elapsed time per iteration (s): 0.08 | learning rate: 2.799E-05 | global batch size: 256 | lm loss: 4.513876E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.934 | TFLOPs: 11.53 | 7: iteration 150300/ 173500 | consumed samples: 38476800 | consumed tokens: 78800486400 | elapsed time per iteration (s): 0.08 | learning rate: 2.798E-05 | global batch size: 256 | lm loss: 4.503778E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.436 | TFLOPs: 11.89 | 7: iteration 150310/ 173500 | consumed samples: 38479360 | consumed tokens: 78805729280 | elapsed time per iteration (s): 0.08 | learning rate: 2.798E-05 | global batch size: 256 | lm loss: 4.519224E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.261 | TFLOPs: 11.86 | 7: iteration 150320/ 173500 | consumed samples: 38481920 | consumed tokens: 78810972160 | elapsed time per iteration (s): 0.08 | learning rate: 2.797E-05 | global batch size: 256 | lm loss: 4.495675E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.248 | TFLOPs: 11.88 | 7: iteration 150330/ 173500 | consumed samples: 38484480 | consumed tokens: 78816215040 | elapsed time per iteration (s): 0.08 | learning rate: 2.796E-05 | global batch size: 256 | lm loss: 4.507097E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.098 | TFLOPs: 11.90 | 7: iteration 150340/ 173500 | consumed samples: 38487040 | consumed tokens: 78821457920 | elapsed time per iteration (s): 0.08 | learning rate: 2.795E-05 | global batch size: 256 | lm loss: 4.509937E+00 | grad norm: 0.382 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.690 | TFLOPs: 11.89 | 7: iteration 150350/ 173500 | consumed samples: 38489600 | consumed tokens: 78826700800 | elapsed time per iteration (s): 0.08 | learning rate: 2.795E-05 | global batch size: 256 | lm loss: 4.499102E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.806 | TFLOPs: 11.89 | 7: iteration 150360/ 173500 | consumed samples: 38492160 | consumed tokens: 78831943680 | elapsed time per iteration (s): 0.08 | learning rate: 2.794E-05 | global batch size: 256 | lm loss: 4.505433E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.265 | TFLOPs: 11.85 | 7: iteration 150370/ 173500 | consumed samples: 38494720 | consumed tokens: 78837186560 | elapsed time per iteration (s): 0.08 | learning rate: 2.793E-05 | global batch size: 256 | lm loss: 4.495736E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.436 | TFLOPs: 11.88 | 7: iteration 150380/ 173500 | consumed samples: 38497280 | consumed tokens: 78842429440 | elapsed time per iteration (s): 0.11 | learning rate: 2.793E-05 | global batch size: 256 | lm loss: 4.504427E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2384.300 | TFLOPs: 8.87 | 7: iteration 150390/ 173500 | consumed samples: 38499840 | consumed tokens: 78847672320 | elapsed time per iteration (s): 0.08 | learning rate: 2.792E-05 | global batch size: 256 | lm loss: 4.499839E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.898 | TFLOPs: 11.82 | 7: iteration 150400/ 173500 | consumed samples: 38502400 | consumed tokens: 78852915200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.791E-05 | global batch size: 256 | lm loss: 4.504656E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.724 | TFLOPs: 11.80 | 7: iteration 150410/ 173500 | consumed samples: 38504960 | consumed tokens: 78858158080 | elapsed time per iteration (s): 0.08 | learning rate: 2.791E-05 | global batch size: 256 | lm loss: 4.512323E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.827 | TFLOPs: 11.78 | 7: iteration 150420/ 173500 | consumed samples: 38507520 | consumed tokens: 78863400960 | elapsed time per iteration (s): 0.08 | learning rate: 2.790E-05 | global batch size: 256 | lm loss: 4.511012E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.615 | TFLOPs: 11.90 | 7: iteration 150430/ 173500 | consumed samples: 38510080 | consumed tokens: 78868643840 | elapsed time per iteration (s): 0.08 | learning rate: 2.789E-05 | global batch size: 256 | lm loss: 4.505627E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.355 | TFLOPs: 11.90 | 7: iteration 150440/ 173500 | consumed samples: 38512640 | consumed tokens: 78873886720 | elapsed time per iteration (s): 0.08 | learning rate: 2.789E-05 | global batch size: 256 | lm loss: 4.506152E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.898 | TFLOPs: 11.62 | 7: iteration 150450/ 173500 | consumed samples: 38515200 | consumed tokens: 78879129600 | elapsed time per iteration (s): 0.08 | learning rate: 2.788E-05 | global batch size: 256 | lm loss: 4.497612E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.442 | TFLOPs: 11.81 | 7: iteration 150460/ 
173500 | consumed samples: 38517760 | consumed tokens: 78884372480 | elapsed time per iteration (s): 0.08 | learning rate: 2.787E-05 | global batch size: 256 | lm loss: 4.503152E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.101 | TFLOPs: 11.91 | 7: iteration 150470/ 173500 | consumed samples: 38520320 | consumed tokens: 78889615360 | elapsed time per iteration (s): 0.08 | learning rate: 2.787E-05 | global batch size: 256 | lm loss: 4.498627E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.943 | TFLOPs: 11.81 | 7: iteration 150480/ 173500 | consumed samples: 38522880 | consumed tokens: 78894858240 | elapsed time per iteration (s): 0.08 | learning rate: 2.786E-05 | global batch size: 256 | lm loss: 4.507084E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.917 | TFLOPs: 11.82 | 7: iteration 150490/ 173500 | consumed samples: 38525440 | consumed tokens: 78900101120 | elapsed time per iteration (s): 0.08 | learning rate: 2.785E-05 | global batch size: 256 | lm loss: 4.496534E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.952 | TFLOPs: 11.81 | 7: iteration 150500/ 173500 | consumed samples: 38528000 | consumed tokens: 78905344000 | elapsed time per iteration (s): 0.08 | learning rate: 2.785E-05 | global batch size: 256 | lm loss: 4.511438E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.987 | TFLOPs: 11.81 | 7: iteration 150510/ 173500 | consumed samples: 38530560 | consumed tokens: 78910586880 | elapsed time per iteration (s): 0.08 | learning rate: 2.784E-05 | global batch size: 256 | lm loss: 4.513887E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3164.252 | TFLOPs: 11.77 | 7: iteration 150520/ 173500 | consumed samples: 38533120 | consumed tokens: 78915829760 | elapsed time per iteration (s): 0.08 | learning rate: 2.783E-05 | global batch size: 256 | lm loss: 4.511465E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.626 | TFLOPs: 11.84 | 7: iteration 150530/ 173500 | consumed samples: 38535680 | consumed tokens: 78921072640 | elapsed time per iteration (s): 0.08 | learning rate: 2.783E-05 | global batch size: 256 | lm loss: 4.504509E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.825 | TFLOPs: 11.84 | 7: iteration 150540/ 173500 | consumed samples: 38538240 | consumed tokens: 78926315520 | elapsed time per iteration (s): 0.08 | learning rate: 2.782E-05 | global batch size: 256 | lm loss: 4.507707E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.790 | TFLOPs: 11.82 | 7: iteration 150550/ 173500 | consumed samples: 38540800 | consumed tokens: 78931558400 | elapsed time per iteration (s): 0.09 | learning rate: 2.781E-05 | global batch size: 256 | lm loss: 4.511209E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2828.981 | TFLOPs: 10.52 | 7: iteration 150560/ 173500 | consumed samples: 38543360 | consumed tokens: 78936801280 | elapsed time per iteration (s): 0.08 | learning rate: 2.781E-05 | global batch size: 256 | lm loss: 4.525749E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.514 | TFLOPs: 11.80 | 7: iteration 150570/ 173500 | consumed samples: 38545920 | consumed tokens: 78942044160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.780E-05 | global batch size: 256 | lm loss: 4.507689E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.770 | TFLOPs: 11.82 | 7: iteration 150580/ 173500 | consumed samples: 38548480 | consumed tokens: 78947287040 | elapsed time per iteration (s): 0.08 | learning rate: 2.779E-05 | global batch size: 256 | lm loss: 4.511811E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.734 | TFLOPs: 11.85 | 7: iteration 150590/ 173500 | consumed samples: 38551040 | consumed tokens: 78952529920 | elapsed time per iteration (s): 0.08 | learning rate: 2.779E-05 | global batch size: 256 | lm loss: 4.511188E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.157 | TFLOPs: 11.90 | 7: iteration 150600/ 173500 | consumed samples: 38553600 | consumed tokens: 78957772800 | elapsed time per iteration (s): 0.08 | learning rate: 2.778E-05 | global batch size: 256 | lm loss: 4.506738E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.279 | TFLOPs: 11.90 | 7: iteration 150610/ 173500 | consumed samples: 38556160 | consumed tokens: 78963015680 | elapsed time per iteration (s): 0.08 | learning rate: 2.777E-05 | global batch size: 256 | lm loss: 4.509314E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.683 | TFLOPs: 11.64 | 7: iteration 150620/ 173500 | consumed samples: 38558720 | consumed tokens: 78968258560 | elapsed time per iteration (s): 0.08 | learning rate: 2.777E-05 | global batch size: 256 | lm loss: 4.501324E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3064.648 | TFLOPs: 11.40 | 7: iteration 150630/ 173500 | 
consumed samples: 38561280 | consumed tokens: 78973501440 | elapsed time per iteration (s): 0.08 | learning rate: 2.776E-05 | global batch size: 256 | lm loss: 4.503131E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.875 | TFLOPs: 11.69 | 7: iteration 150640/ 173500 | consumed samples: 38563840 | consumed tokens: 78978744320 | elapsed time per iteration (s): 0.08 | learning rate: 2.775E-05 | global batch size: 256 | lm loss: 4.496600E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.008 | TFLOPs: 11.65 | 7: iteration 150650/ 173500 | consumed samples: 38566400 | consumed tokens: 78983987200 | elapsed time per iteration (s): 0.08 | learning rate: 2.775E-05 | global batch size: 256 | lm loss: 4.510205E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.364 | TFLOPs: 11.68 | 7: iteration 150660/ 173500 | consumed samples: 38568960 | consumed tokens: 78989230080 | elapsed time per iteration (s): 0.08 | learning rate: 2.774E-05 | global batch size: 256 | lm loss: 4.506016E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.991 | TFLOPs: 11.93 | 7: iteration 150670/ 173500 | consumed samples: 38571520 | consumed tokens: 78994472960 | elapsed time per iteration (s): 0.08 | learning rate: 2.773E-05 | global batch size: 256 | lm loss: 4.505897E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3135.010 | TFLOPs: 11.66 | 7: iteration 150680/ 173500 | consumed samples: 38574080 | consumed tokens: 78999715840 | elapsed time per iteration (s): 0.08 | learning rate: 2.773E-05 | global batch size: 256 | lm loss: 4.503393E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3139.658 | TFLOPs: 11.68 | 7: iteration 150690/ 173500 | consumed samples: 38576640 | consumed tokens: 79004958720 | elapsed time per iteration (s): 0.08 | learning rate: 2.772E-05 | global batch size: 256 | lm loss: 4.515448E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.409 | TFLOPs: 11.55 | 7: iteration 150700/ 173500 | consumed samples: 38579200 | consumed tokens: 79010201600 | elapsed time per iteration (s): 0.08 | learning rate: 2.771E-05 | global batch size: 256 | lm loss: 4.513775E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.049 | TFLOPs: 11.78 | 7: iteration 150710/ 173500 | consumed samples: 38581760 | consumed tokens: 79015444480 | elapsed time per iteration (s): 0.08 | learning rate: 2.771E-05 | global batch size: 256 | lm loss: 4.495809E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.684 | TFLOPs: 11.83 | 7: iteration 150720/ 173500 | consumed samples: 38584320 | consumed tokens: 79020687360 | elapsed time per iteration (s): 0.08 | learning rate: 2.770E-05 | global batch size: 256 | lm loss: 4.508825E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.237 | TFLOPs: 11.85 | 7: iteration 150730/ 173500 | consumed samples: 38586880 | consumed tokens: 79025930240 | elapsed time per iteration (s): 0.09 | learning rate: 2.769E-05 | global batch size: 256 | lm loss: 4.509982E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3006.709 | TFLOPs: 11.18 | 7: iteration 150740/ 173500 | consumed samples: 38589440 | consumed tokens: 79031173120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.769E-05 | global batch size: 256 | lm loss: 4.490910E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.603 | TFLOPs: 11.85 | 7: iteration 150750/ 173500 | consumed samples: 38592000 | consumed tokens: 79036416000 | elapsed time per iteration (s): 0.08 | learning rate: 2.768E-05 | global batch size: 256 | lm loss: 4.498510E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.989 | TFLOPs: 11.80 | 7: iteration 150760/ 173500 | consumed samples: 38594560 | consumed tokens: 79041658880 | elapsed time per iteration (s): 0.08 | learning rate: 2.767E-05 | global batch size: 256 | lm loss: 4.506709E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3080.557 | TFLOPs: 11.46 | 7: iteration 150770/ 173500 | consumed samples: 38597120 | consumed tokens: 79046901760 | elapsed time per iteration (s): 0.08 | learning rate: 2.767E-05 | global batch size: 256 | lm loss: 4.507396E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.610 | TFLOPs: 11.58 | 7: iteration 150780/ 173500 | consumed samples: 38599680 | consumed tokens: 79052144640 | elapsed time per iteration (s): 0.08 | learning rate: 2.766E-05 | global batch size: 256 | lm loss: 4.491789E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.108 | TFLOPs: 11.75 | 7: iteration 150790/ 173500 | consumed samples: 38602240 | consumed tokens: 79057387520 | elapsed time per iteration (s): 0.08 | learning rate: 2.765E-05 | global batch size: 256 | lm loss: 4.507642E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3083.486 | TFLOPs: 11.47 | 7: iteration 150800/ 173500 | 
consumed samples: 38604800 | consumed tokens: 79062630400 | elapsed time per iteration (s): 0.08 | learning rate: 2.765E-05 | global batch size: 256 | lm loss: 4.514539E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.758 | TFLOPs: 11.78 | 7: iteration 150810/ 173500 | consumed samples: 38607360 | consumed tokens: 79067873280 | elapsed time per iteration (s): 0.08 | learning rate: 2.764E-05 | global batch size: 256 | lm loss: 4.508781E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.300 | TFLOPs: 11.79 | 7: iteration 150820/ 173500 | consumed samples: 38609920 | consumed tokens: 79073116160 | elapsed time per iteration (s): 0.08 | learning rate: 2.763E-05 | global batch size: 256 | lm loss: 4.498360E+00 | grad norm: 0.478 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.821 | TFLOPs: 11.73 | 7: iteration 150830/ 173500 | consumed samples: 38612480 | consumed tokens: 79078359040 | elapsed time per iteration (s): 0.08 | learning rate: 2.763E-05 | global batch size: 256 | lm loss: 4.504068E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.590 | TFLOPs: 11.83 | 7: iteration 150840/ 173500 | consumed samples: 38615040 | consumed tokens: 79083601920 | elapsed time per iteration (s): 0.08 | learning rate: 2.762E-05 | global batch size: 256 | lm loss: 4.503155E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.659 | TFLOPs: 11.84 | 7: iteration 150850/ 173500 | consumed samples: 38617600 | consumed tokens: 79088844800 | elapsed time per iteration (s): 0.08 | learning rate: 2.761E-05 | global batch size: 256 | lm loss: 4.516864E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3183.529 | TFLOPs: 11.84 | 7: iteration 150860/ 173500 | consumed samples: 38620160 | consumed tokens: 79094087680 | elapsed time per iteration (s): 0.08 | learning rate: 2.761E-05 | global batch size: 256 | lm loss: 4.517093E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.428 | TFLOPs: 11.86 | 7: iteration 150870/ 173500 | consumed samples: 38622720 | consumed tokens: 79099330560 | elapsed time per iteration (s): 0.08 | learning rate: 2.760E-05 | global batch size: 256 | lm loss: 4.518038E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.862 | TFLOPs: 11.79 | 7: iteration 150880/ 173500 | consumed samples: 38625280 | consumed tokens: 79104573440 | elapsed time per iteration (s): 0.08 | learning rate: 2.759E-05 | global batch size: 256 | lm loss: 4.523689E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.295 | TFLOPs: 11.86 | 7: iteration 150890/ 173500 | consumed samples: 38627840 | consumed tokens: 79109816320 | elapsed time per iteration (s): 0.08 | learning rate: 2.759E-05 | global batch size: 256 | lm loss: 4.513185E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.916 | TFLOPs: 11.82 | 7: iteration 150900/ 173500 | consumed samples: 38630400 | consumed tokens: 79115059200 | elapsed time per iteration (s): 0.08 | learning rate: 2.758E-05 | global batch size: 256 | lm loss: 4.506289E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.472 | TFLOPs: 11.85 | 7: iteration 150910/ 173500 | consumed samples: 38632960 | consumed tokens: 79120302080 | elapsed time per iteration (s): 0.08 | learning rate: 
2.757E-05 | global batch size: 256 | lm loss: 4.514473E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.315 | TFLOPs: 11.81 | 7: iteration 150920/ 173500 | consumed samples: 38635520 | consumed tokens: 79125544960 | elapsed time per iteration (s): 0.10 | learning rate: 2.757E-05 | global batch size: 256 | lm loss: 4.509615E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2526.273 | TFLOPs: 9.40 | 7: iteration 150930/ 173500 | consumed samples: 38638080 | consumed tokens: 79130787840 | elapsed time per iteration (s): 0.08 | learning rate: 2.756E-05 | global batch size: 256 | lm loss: 4.508189E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.037 | TFLOPs: 11.79 | 7: iteration 150940/ 173500 | consumed samples: 38640640 | consumed tokens: 79136030720 | elapsed time per iteration (s): 0.08 | learning rate: 2.755E-05 | global batch size: 256 | lm loss: 4.494210E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.566 | TFLOPs: 11.81 | 7: iteration 150950/ 173500 | consumed samples: 38643200 | consumed tokens: 79141273600 | elapsed time per iteration (s): 0.08 | learning rate: 2.755E-05 | global batch size: 256 | lm loss: 4.501012E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.396 | TFLOPs: 11.81 | 7: iteration 150960/ 173500 | consumed samples: 38645760 | consumed tokens: 79146516480 | elapsed time per iteration (s): 0.08 | learning rate: 2.754E-05 | global batch size: 256 | lm loss: 4.496621E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.951 | TFLOPs: 11.84 | 7: iteration 150970/ 173500 | 
consumed samples: 38648320 | consumed tokens: 79151759360 | elapsed time per iteration (s): 0.08 | learning rate: 2.753E-05 | global batch size: 256 | lm loss: 4.501281E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.283 | TFLOPs: 11.85 | 7: iteration 150980/ 173500 | consumed samples: 38650880 | consumed tokens: 79157002240 | elapsed time per iteration (s): 0.08 | learning rate: 2.753E-05 | global batch size: 256 | lm loss: 4.512563E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.550 | TFLOPs: 11.82 | 7: iteration 150990/ 173500 | consumed samples: 38653440 | consumed tokens: 79162245120 | elapsed time per iteration (s): 0.08 | learning rate: 2.752E-05 | global batch size: 256 | lm loss: 4.509720E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.876 | TFLOPs: 11.83 | 7: iteration 151000/ 173500 | consumed samples: 38656000 | consumed tokens: 79167488000 | elapsed time per iteration (s): 0.08 | learning rate: 2.751E-05 | global batch size: 256 | lm loss: 4.498223E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.174 | TFLOPs: 11.80 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 151000 | lm loss value: 4.401951E+00 | lm loss PPL: 8.160996E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 151000 to checkpoints_14m91b100m 0: [2023-03-17 03:57:30,970] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step151000 is begin to save! 
0: [2023-03-17 03:57:30,973] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:57:30,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:57:31,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:57:31,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:57:31,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:57:31,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:57:31,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:57:31,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:57:31,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:57:31,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:57:31,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/layer_08-model_00-model_states.pt... 0: [2023-03-17 03:57:31,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 03:57:31,013] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step151000/mp_rank_00_model_states.pt 0: [2023-03-17 03:57:31,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:57:31,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
2: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:57:31,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:57:31,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:57:31,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:57:31,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 4: [2023-03-17 03:57:31,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 1: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 5: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 6: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 3: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 4: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 1: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 7: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 6: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:57:31,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 3: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:57:31,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:57:31,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 1: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:57:31,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:57:31,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 2: [2023-03-17 03:57:31,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:57:31,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 4: [2023-03-17 03:57:31,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:57:31,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:57:31,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 3: [2023-03-17 03:57:31,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:57:31,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 7: [2023-03-17 03:57:31,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:57:31,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 03:57:31,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:57:31,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 6: [2023-03-17 03:57:31,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 5: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:57:31,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 1: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:57:31,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2023-03-17 03:57:31,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 4: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 2: [2023-03-17 03:57:31,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:57:31,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 3: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:57:31,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 6: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:57:31,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 7: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:57:31,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:57:31,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 0: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 5: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 1: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:57:31,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:57:31,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 2: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:57:31,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 3: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:57:31,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 4: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
4: [2023-03-17 03:57:31,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 4: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:57:31,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 6: [2023-03-17 03:57:31,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 7: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:57:31,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:57:31,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 1: [2023-03-17 03:57:31,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:57:31,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:57:31,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 03:57:31,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 2: [2023-03-17 03:57:31,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 4: [2023-03-17 03:57:31,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:57:31,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 03:57:31,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 3: [2023-03-17 03:57:31,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:57:31,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 6: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:57:31,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 5: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:57:31,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:57:31,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 7: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:57:31,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 1: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:57:31,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 2: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:57:31,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 03:57:31,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 2: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 3: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:57:31,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 6: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:57:31,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 7: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:57:31,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 6: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:57:31,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 0: [2023-03-17 03:57:31,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 6: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 0: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 1: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:57:31,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 
2: [2023-03-17 03:57:31,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2023-03-17 03:57:31,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:57:31,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 7: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:57:31,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 2: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 7: [2023-03-17 03:57:31,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 03:57:31,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 
2: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 5: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:57:31,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 5: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:57:31,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 03:57:31,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 3: [2023-03-17 03:57:31,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:57:31,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step151000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 03:57:31,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step151000 is ready now! 
0: successfully saved checkpoint at iteration 151000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 80.22
7: iteration 151010/ 173500 | consumed samples: 38658560 | consumed tokens: 79172730880 | elapsed time per iteration (s): 0.09 | learning rate: 2.751E-05 | global batch size: 256 | lm loss: 4.505497E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2785.481 | TFLOPs: 10.36 |
7: iteration 151020/ 173500 | consumed samples: 38661120 | consumed tokens: 79177973760 | elapsed time per iteration (s): 0.08 | learning rate: 2.750E-05 | global batch size: 256 | lm loss: 4.512369E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.325 | TFLOPs: 11.81 |
7: iteration 151030/ 173500 | consumed samples: 38663680 | consumed tokens: 79183216640 | elapsed time per iteration (s): 0.08 | learning rate: 2.749E-05 | global batch size: 256 | lm loss: 4.510166E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.579 | TFLOPs: 11.83 |
7: iteration 151040/ 173500 | consumed samples: 38666240 | consumed tokens: 79188459520 | elapsed time per iteration (s): 0.08 | learning rate: 2.749E-05 | global batch size: 256 | lm loss: 4.504107E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.618 | TFLOPs: 11.79 |
7: iteration 151050/ 173500 | consumed samples: 38668800 | consumed tokens: 79193702400 | elapsed time per iteration (s): 0.08 | learning rate: 2.748E-05 | global batch size: 256 | lm loss: 4.504941E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.867 | TFLOPs: 11.84 |
7: iteration 151060/ 173500 | consumed samples: 38671360 | consumed tokens: 79198945280 | elapsed time per iteration (s): 0.08 | learning rate: 2.747E-05 | global batch size: 256 | lm loss: 4.502976E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.593 | TFLOPs: 11.79 |
7: iteration 151070/ 173500 | consumed samples: 38673920 | consumed tokens: 79204188160 | elapsed time per iteration (s): 0.08 | learning rate: 2.747E-05 | global batch size: 256 | lm loss: 4.509101E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.100 | TFLOPs: 11.79 |
7: iteration 151080/ 173500 | consumed samples: 38676480 | consumed tokens: 79209431040 | elapsed time per iteration (s): 0.08 | learning rate: 2.746E-05 | global batch size: 256 | lm loss: 4.497551E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.340 | TFLOPs: 11.90 |
7: iteration 151090/ 173500 | consumed samples: 38679040 | consumed tokens: 79214673920 | elapsed time per iteration (s): 0.08 | learning rate: 2.746E-05 | global batch size: 256 | lm loss: 4.509439E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.493 | TFLOPs: 11.84 |
7: iteration 151100/ 173500 | consumed samples: 38681600 | consumed tokens: 79219916800 | elapsed time per iteration (s): 0.08 | learning rate: 2.745E-05 | global batch size: 256 | lm loss: 4.499100E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.352 | TFLOPs: 11.95 |
7: iteration 151110/ 173500 | consumed samples: 38684160 | consumed tokens: 79225159680 | elapsed time per iteration (s): 0.08 | learning rate: 2.744E-05 | global batch size: 256 | lm loss: 4.497371E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.157 | TFLOPs: 11.84 |
7: iteration 151120/ 173500 | consumed samples: 38686720 | consumed tokens: 79230402560 | elapsed time per iteration (s): 0.08 | learning rate: 2.744E-05 | global batch size: 256 | lm loss: 4.508843E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.443 | TFLOPs: 11.91 |
7: iteration 151130/ 173500 | consumed samples: 38689280 | consumed tokens: 79235645440 | elapsed time per iteration (s): 0.08 | learning rate: 2.743E-05 | global batch size: 256 | lm loss: 4.508472E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.291 | TFLOPs: 11.92 |
7: iteration 151140/ 173500 | consumed samples: 38691840 | consumed tokens: 79240888320 | elapsed time per iteration (s): 0.08 | learning rate: 2.742E-05 | global batch size: 256 | lm loss: 4.493167E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.567 | TFLOPs: 11.92 |
7: iteration 151150/ 173500 | consumed samples: 38694400 | consumed tokens: 79246131200 | elapsed time per iteration (s): 0.08 | learning rate: 2.742E-05 | global batch size: 256 | lm loss: 4.516959E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.512 | TFLOPs: 11.86 |
7: iteration 151160/ 173500 | consumed samples: 38696960 | consumed tokens: 79251374080 | elapsed time per iteration (s): 0.08 | learning rate: 2.741E-05 | global batch size: 256 | lm loss: 4.507728E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.237 | TFLOPs: 11.94 |
7: iteration 151170/ 173500 | consumed samples: 38699520 | consumed tokens: 79256616960 | elapsed time per iteration (s): 0.08 | learning rate: 2.740E-05 | global batch size: 256 | lm loss: 4.503824E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.736 | TFLOPs: 11.78 |
7: iteration 151180/ 173500 | consumed samples: 38702080 | consumed tokens: 79261859840 | elapsed time per iteration (s): 0.08 | learning rate: 2.740E-05 | global batch size: 256 | lm loss: 4.505827E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.304 | TFLOPs: 11.87 |
7: iteration 151190/ 173500 | consumed samples: 38704640 | consumed tokens: 79267102720 | elapsed time per iteration (s): 0.08 | learning rate: 2.739E-05 | global batch size: 256 | lm loss: 4.505201E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.094 | TFLOPs: 11.87 |
7: iteration 151200/ 173500 | consumed samples: 38707200 | consumed tokens: 79272345600 | elapsed time per iteration (s): 0.08 | learning rate: 2.738E-05 | global batch size: 256 | lm loss: 4.491036E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.713 | TFLOPs: 11.88 |
7: iteration 151210/ 173500 | consumed samples: 38709760 | consumed tokens: 79277588480 | elapsed time per iteration (s): 0.08 | learning rate: 2.738E-05 | global batch size: 256 | lm loss: 4.502481E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.549 | TFLOPs: 11.86 |
7: iteration 151220/ 173500 | consumed samples: 38712320 | consumed tokens: 79282831360 | elapsed time per iteration (s): 0.08 | learning rate: 2.737E-05 | global batch size: 256 | lm loss: 4.509886E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.397 | TFLOPs: 11.86 |
7: iteration 151230/ 173500 | consumed samples: 38714880 | consumed tokens: 79288074240 | elapsed time per iteration (s): 0.08 | learning rate: 2.736E-05 | global batch size: 256 | lm loss: 4.498189E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.644 | TFLOPs: 11.99 |
7: iteration 151240/ 173500 | consumed samples: 38717440 | consumed tokens: 79293317120 | elapsed time per iteration (s): 0.08 | learning rate: 2.736E-05 | global batch size: 256 | lm loss: 4.520302E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.702 | TFLOPs: 11.99 |
7: iteration 151250/ 173500 | consumed samples: 38720000 | consumed tokens: 79298560000 | elapsed time per iteration (s): 0.08 | learning rate: 2.735E-05 | global batch size: 256 | lm loss: 4.495930E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.665 | TFLOPs: 11.98 |
7: iteration 151260/ 173500 | consumed samples: 38722560 | consumed tokens: 79303802880 | elapsed time per iteration (s): 0.08 | learning rate: 2.734E-05 | global batch size: 256 | lm loss: 4.511121E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.178 | TFLOPs: 11.97 |
7: iteration 151270/ 173500 | consumed samples: 38725120 | consumed tokens: 79309045760 | elapsed time per iteration (s): 0.08 | learning rate: 2.734E-05 | global batch size: 256 | lm loss: 4.513039E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.611 | TFLOPs: 11.97 |
7: iteration 151280/ 173500 | consumed samples: 38727680 | consumed tokens: 79314288640 | elapsed time per iteration (s): 0.08 | learning rate: 2.733E-05 | global batch size: 256 | lm loss: 4.507528E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.141 | TFLOPs: 11.93 |
7: iteration 151290/ 173500 | consumed samples: 38730240 | consumed tokens: 79319531520 | elapsed time per iteration (s): 0.08 | learning rate: 2.732E-05 | global batch size: 256 | lm loss: 4.511841E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.673 | TFLOPs: 11.89 |
7: iteration 151300/ 173500 | consumed samples: 38732800 | consumed tokens: 79324774400 | elapsed time per iteration (s): 0.08 | learning rate: 2.732E-05 | global batch size: 256 | lm loss: 4.501810E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.805 | TFLOPs: 11.90 |
7: iteration 151310/ 173500 | consumed samples: 38735360 | consumed tokens: 79330017280 | elapsed time per iteration (s): 0.08 | learning rate: 2.731E-05 | global batch size: 256 | lm loss: 4.497941E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.891 | TFLOPs: 11.94 |
7: iteration 151320/ 173500 | consumed samples: 38737920 | consumed tokens: 79335260160 | elapsed time per iteration (s): 0.08 | learning rate: 2.731E-05 | global batch size: 256 | lm loss: 4.500653E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.123 | TFLOPs: 11.96 |
7: iteration 151330/ 173500 | consumed samples: 38740480 | consumed tokens: 79340503040 | elapsed time per iteration (s): 0.08 | learning rate: 2.730E-05 | global batch size: 256 | lm loss: 4.506202E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.136 | TFLOPs: 11.84 |
7: iteration 151340/ 173500 | consumed samples: 38743040 | consumed tokens: 79345745920 | elapsed time per iteration (s): 0.08 | learning rate: 2.729E-05 | global batch size: 256 | lm loss: 4.503387E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.860 | TFLOPs: 11.92 |
7: iteration 151350/ 173500 | consumed samples: 38745600 | consumed tokens: 79350988800 | elapsed time per iteration (s): 0.08 | learning rate: 2.729E-05 | global batch size: 256 | lm loss: 4.493156E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.304 | TFLOPs: 11.90 |
7: iteration 151360/ 173500 | consumed samples: 38748160 | consumed tokens: 79356231680 | elapsed time per iteration (s): 0.08 | learning rate: 2.728E-05 | global batch size: 256 | lm loss: 4.505215E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.548 | TFLOPs: 11.85 |
7: iteration 151370/ 173500 | consumed samples: 38750720 | consumed tokens: 79361474560 | elapsed time per iteration (s): 0.08 | learning rate: 2.727E-05 | global batch size: 256 | lm loss: 4.505075E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.976 | TFLOPs: 11.98 |
7: iteration 151380/ 173500 | consumed samples: 38753280 | consumed tokens: 79366717440 | elapsed time per iteration (s): 0.08 | learning rate: 2.727E-05 | global batch size: 256 | lm loss: 4.505214E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.854 | TFLOPs: 11.90 |
7: iteration 151390/ 173500 | consumed samples: 38755840 | consumed tokens: 79371960320 | elapsed time per iteration (s): 0.09 | learning rate: 2.726E-05 | global batch size: 256 | lm loss: 4.501498E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2695.666 | TFLOPs: 10.03 |
7: iteration 151400/ 173500 | consumed samples: 38758400 | consumed tokens: 79377203200 | elapsed time per iteration (s): 0.09 | learning rate: 2.725E-05 | global batch size: 256 | lm loss: 4.510207E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2738.185 | TFLOPs: 10.18 |
7: iteration 151410/ 173500 | consumed samples: 38760960 | consumed tokens: 79382446080 | elapsed time per iteration (s): 0.08 | learning rate: 2.725E-05 | global batch size: 256 | lm loss: 4.504063E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.181 | TFLOPs: 11.94 |
7: iteration 151420/ 173500 | consumed samples: 38763520 | consumed tokens: 79387688960 | elapsed time per iteration (s): 0.08 | learning rate: 2.724E-05 | global batch size: 256 | lm loss: 4.493997E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.763 | TFLOPs: 11.91 |
7: iteration 151430/ 173500 | consumed samples: 38766080 | consumed tokens: 79392931840 | elapsed time per iteration (s): 0.08 | learning rate: 2.723E-05 | global batch size: 256 | lm loss: 4.505981E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.587 | TFLOPs: 11.89 |
7: iteration 151440/ 173500 | consumed samples: 38768640 | consumed tokens: 79398174720 | elapsed time per iteration (s): 0.08 | learning rate: 2.723E-05 | global batch size: 256 | lm loss: 4.506464E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.441 | TFLOPs: 11.90 |
7: iteration 151450/ 173500 | consumed samples: 38771200 | consumed tokens: 79403417600 | elapsed time per iteration (s): 0.08 | learning rate: 2.722E-05 | global batch size: 256 | lm loss: 4.502093E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.775 | TFLOPs: 11.87 |
7: iteration 151460/ 173500 | consumed samples: 38773760 | consumed tokens: 79408660480 | elapsed time per iteration (s): 0.08 | learning rate: 2.721E-05 | global batch size: 256 | lm loss: 4.492883E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.912 | TFLOPs: 11.82 |
7: iteration 151470/ 173500 | consumed samples: 38776320 | consumed tokens: 79413903360 | elapsed time per iteration (s): 0.08 | learning rate: 2.721E-05 | global batch size: 256 | lm loss: 4.497243E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.405 | TFLOPs: 11.81 |
7: iteration 151480/ 173500 | consumed samples: 38778880 | consumed tokens: 79419146240 | elapsed time per iteration (s): 0.08 | learning rate: 2.720E-05 | global batch size: 256 | lm loss: 4.500389E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.761 | TFLOPs: 11.79 |
7: iteration 151490/ 173500 | consumed samples: 38781440 | consumed tokens: 79424389120 | elapsed time per iteration (s): 0.08 | learning rate: 2.719E-05 | global batch size: 256 | lm loss: 4.505346E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.407 | TFLOPs: 11.90 |
7: iteration 151500/ 173500 | consumed samples: 38784000 | consumed tokens: 79429632000 | elapsed time per iteration (s): 0.08 | learning rate: 2.719E-05 | global batch size: 256 | lm loss: 4.507980E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.212 | TFLOPs: 11.86 |
7: iteration 151510/ 173500 | consumed samples: 38786560 | consumed tokens: 79434874880 | elapsed time per iteration (s): 0.08 | learning rate: 2.718E-05 | global batch size: 256 | lm loss: 4.511966E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.549 | TFLOPs: 11.96 |
7: iteration 151520/ 173500 | consumed samples: 38789120 | consumed tokens: 79440117760 | elapsed time per iteration (s): 0.08 | learning rate: 2.718E-05 | global batch size: 256 | lm loss: 4.495019E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.828 | TFLOPs: 11.65 |
7: iteration 151530/ 173500 | consumed samples: 38791680 | consumed tokens: 79445360640 | elapsed time per iteration (s): 0.08 | learning rate: 2.717E-05 | global batch size: 256 | lm loss: 4.510049E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.433 | TFLOPs: 11.88 |
7: iteration 151540/ 173500 | consumed samples: 38794240 | consumed tokens: 79450603520 | elapsed time per iteration (s): 0.08 | learning rate: 2.716E-05 | global batch size: 256 | lm loss: 4.497727E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.086 | TFLOPs: 11.89 |
7: iteration 151550/ 173500 | consumed samples: 38796800 | consumed tokens: 79455846400 | elapsed time per iteration (s): 0.08 | learning rate: 2.716E-05 | global batch size: 256 | lm loss: 4.510474E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.030 | TFLOPs: 11.92 |
7: iteration 151560/ 173500 | consumed samples: 38799360 | consumed tokens: 79461089280 | elapsed time per iteration (s): 0.08 | learning rate: 2.715E-05 | global batch size: 256 | lm loss: 4.502509E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.128 | TFLOPs: 11.84 |
7: iteration 151570/ 173500 | consumed samples: 38801920 | consumed tokens: 79466332160 | elapsed time per iteration (s): 0.08 | learning rate:
2.714E-05 | global batch size: 256 | lm loss: 4.503096E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.262 | TFLOPs: 11.90 | 7: iteration 151580/ 173500 | consumed samples: 38804480 | consumed tokens: 79471575040 | elapsed time per iteration (s): 0.08 | learning rate: 2.714E-05 | global batch size: 256 | lm loss: 4.515083E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.895 | TFLOPs: 11.92 | 7: iteration 151590/ 173500 | consumed samples: 38807040 | consumed tokens: 79476817920 | elapsed time per iteration (s): 0.08 | learning rate: 2.713E-05 | global batch size: 256 | lm loss: 4.516830E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.955 | TFLOPs: 11.87 | 7: iteration 151600/ 173500 | consumed samples: 38809600 | consumed tokens: 79482060800 | elapsed time per iteration (s): 0.08 | learning rate: 2.712E-05 | global batch size: 256 | lm loss: 4.514047E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.233 | TFLOPs: 11.96 | 7: iteration 151610/ 173500 | consumed samples: 38812160 | consumed tokens: 79487303680 | elapsed time per iteration (s): 0.08 | learning rate: 2.712E-05 | global batch size: 256 | lm loss: 4.509630E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.174 | TFLOPs: 11.93 | 7: iteration 151620/ 173500 | consumed samples: 38814720 | consumed tokens: 79492546560 | elapsed time per iteration (s): 0.08 | learning rate: 2.711E-05 | global batch size: 256 | lm loss: 4.498261E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.445 | TFLOPs: 11.89 | 7: iteration 151630/ 173500 | 
consumed samples: 38817280 | consumed tokens: 79497789440 | elapsed time per iteration (s): 0.08 | learning rate: 2.710E-05 | global batch size: 256 | lm loss: 4.513271E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.642 | TFLOPs: 11.89 | 7: iteration 151640/ 173500 | consumed samples: 38819840 | consumed tokens: 79503032320 | elapsed time per iteration (s): 0.08 | learning rate: 2.710E-05 | global batch size: 256 | lm loss: 4.501081E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.681 | TFLOPs: 11.95 | 7: iteration 151650/ 173500 | consumed samples: 38822400 | consumed tokens: 79508275200 | elapsed time per iteration (s): 0.08 | learning rate: 2.709E-05 | global batch size: 256 | lm loss: 4.516354E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.500 | TFLOPs: 11.92 | 7: iteration 151660/ 173500 | consumed samples: 38824960 | consumed tokens: 79513518080 | elapsed time per iteration (s): 0.08 | learning rate: 2.709E-05 | global batch size: 256 | lm loss: 4.511324E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.872 | TFLOPs: 11.93 | 7: iteration 151670/ 173500 | consumed samples: 38827520 | consumed tokens: 79518760960 | elapsed time per iteration (s): 0.08 | learning rate: 2.708E-05 | global batch size: 256 | lm loss: 4.505135E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.653 | TFLOPs: 11.91 | 7: iteration 151680/ 173500 | consumed samples: 38830080 | consumed tokens: 79524003840 | elapsed time per iteration (s): 0.08 | learning rate: 2.707E-05 | global batch size: 256 | lm loss: 4.493085E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3207.573 | TFLOPs: 11.93 | 7: iteration 151690/ 173500 | consumed samples: 38832640 | consumed tokens: 79529246720 | elapsed time per iteration (s): 0.08 | learning rate: 2.707E-05 | global batch size: 256 | lm loss: 4.501081E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.118 | TFLOPs: 11.92 | 7: iteration 151700/ 173500 | consumed samples: 38835200 | consumed tokens: 79534489600 | elapsed time per iteration (s): 0.08 | learning rate: 2.706E-05 | global batch size: 256 | lm loss: 4.509937E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.216 | TFLOPs: 11.97 | 7: iteration 151710/ 173500 | consumed samples: 38837760 | consumed tokens: 79539732480 | elapsed time per iteration (s): 0.08 | learning rate: 2.705E-05 | global batch size: 256 | lm loss: 4.513428E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.695 | TFLOPs: 11.93 | 7: iteration 151720/ 173500 | consumed samples: 38840320 | consumed tokens: 79544975360 | elapsed time per iteration (s): 0.08 | learning rate: 2.705E-05 | global batch size: 256 | lm loss: 4.498571E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.969 | TFLOPs: 11.95 | 7: iteration 151730/ 173500 | consumed samples: 38842880 | consumed tokens: 79550218240 | elapsed time per iteration (s): 0.08 | learning rate: 2.704E-05 | global batch size: 256 | lm loss: 4.520054E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.312 | TFLOPs: 11.89 | 7: iteration 151740/ 173500 | consumed samples: 38845440 | consumed tokens: 79555461120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.703E-05 | global batch size: 256 | lm loss: 4.502774E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.613 | TFLOPs: 11.96 | 7: iteration 151750/ 173500 | consumed samples: 38848000 | consumed tokens: 79560704000 | elapsed time per iteration (s): 0.08 | learning rate: 2.703E-05 | global batch size: 256 | lm loss: 4.508866E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.278 | TFLOPs: 11.84 | 7: iteration 151760/ 173500 | consumed samples: 38850560 | consumed tokens: 79565946880 | elapsed time per iteration (s): 0.08 | learning rate: 2.702E-05 | global batch size: 256 | lm loss: 4.495673E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.038 | TFLOPs: 11.91 | 7: iteration 151770/ 173500 | consumed samples: 38853120 | consumed tokens: 79571189760 | elapsed time per iteration (s): 0.08 | learning rate: 2.702E-05 | global batch size: 256 | lm loss: 4.510577E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.972 | TFLOPs: 11.84 | 7: iteration 151780/ 173500 | consumed samples: 38855680 | consumed tokens: 79576432640 | elapsed time per iteration (s): 0.08 | learning rate: 2.701E-05 | global batch size: 256 | lm loss: 4.502789E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.903 | TFLOPs: 11.88 | 7: iteration 151790/ 173500 | consumed samples: 38858240 | consumed tokens: 79581675520 | elapsed time per iteration (s): 0.08 | learning rate: 2.700E-05 | global batch size: 256 | lm loss: 4.502900E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.636 | TFLOPs: 11.80 | 7: iteration 151800/ 173500 | 
consumed samples: 38860800 | consumed tokens: 79586918400 | elapsed time per iteration (s): 0.08 | learning rate: 2.700E-05 | global batch size: 256 | lm loss: 4.511337E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.938 | TFLOPs: 11.78 | 7: iteration 151810/ 173500 | consumed samples: 38863360 | consumed tokens: 79592161280 | elapsed time per iteration (s): 0.08 | learning rate: 2.699E-05 | global batch size: 256 | lm loss: 4.508411E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.012 | TFLOPs: 11.81 | 7: iteration 151820/ 173500 | consumed samples: 38865920 | consumed tokens: 79597404160 | elapsed time per iteration (s): 0.08 | learning rate: 2.698E-05 | global batch size: 256 | lm loss: 4.505721E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.027 | TFLOPs: 11.81 | 7: iteration 151830/ 173500 | consumed samples: 38868480 | consumed tokens: 79602647040 | elapsed time per iteration (s): 0.08 | learning rate: 2.698E-05 | global batch size: 256 | lm loss: 4.505870E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.540 | TFLOPs: 11.82 | 7: iteration 151840/ 173500 | consumed samples: 38871040 | consumed tokens: 79607889920 | elapsed time per iteration (s): 0.08 | learning rate: 2.697E-05 | global batch size: 256 | lm loss: 4.512537E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.198 | TFLOPs: 11.77 | 7: iteration 151850/ 173500 | consumed samples: 38873600 | consumed tokens: 79613132800 | elapsed time per iteration (s): 0.08 | learning rate: 2.696E-05 | global batch size: 256 | lm loss: 4.508901E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3178.260 | TFLOPs: 11.82 | 7: iteration 151860/ 173500 | consumed samples: 38876160 | consumed tokens: 79618375680 | elapsed time per iteration (s): 0.08 | learning rate: 2.696E-05 | global batch size: 256 | lm loss: 4.503487E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.863 | TFLOPs: 11.46 | 7: iteration 151870/ 173500 | consumed samples: 38878720 | consumed tokens: 79623618560 | elapsed time per iteration (s): 0.08 | learning rate: 2.695E-05 | global batch size: 256 | lm loss: 4.509507E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.302 | TFLOPs: 11.87 | 7: iteration 151880/ 173500 | consumed samples: 38881280 | consumed tokens: 79628861440 | elapsed time per iteration (s): 0.08 | learning rate: 2.695E-05 | global batch size: 256 | lm loss: 4.510532E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.159 | TFLOPs: 11.82 | 7: iteration 151890/ 173500 | consumed samples: 38883840 | consumed tokens: 79634104320 | elapsed time per iteration (s): 0.09 | learning rate: 2.694E-05 | global batch size: 256 | lm loss: 4.509337E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2730.573 | TFLOPs: 10.16 | 7: iteration 151900/ 173500 | consumed samples: 38886400 | consumed tokens: 79639347200 | elapsed time per iteration (s): 0.08 | learning rate: 2.693E-05 | global batch size: 256 | lm loss: 4.505660E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.932 | TFLOPs: 11.90 | 7: iteration 151910/ 173500 | consumed samples: 38888960 | consumed tokens: 79644590080 | elapsed time per iteration (s): 0.08 | learning rate: 
2.693E-05 | global batch size: 256 | lm loss: 4.518329E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.745 | TFLOPs: 11.90 | 7: iteration 151920/ 173500 | consumed samples: 38891520 | consumed tokens: 79649832960 | elapsed time per iteration (s): 0.08 | learning rate: 2.692E-05 | global batch size: 256 | lm loss: 4.495972E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.940 | TFLOPs: 11.97 | 7: iteration 151930/ 173500 | consumed samples: 38894080 | consumed tokens: 79655075840 | elapsed time per iteration (s): 0.08 | learning rate: 2.691E-05 | global batch size: 256 | lm loss: 4.486243E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.752 | TFLOPs: 11.95 | 7: iteration 151940/ 173500 | consumed samples: 38896640 | consumed tokens: 79660318720 | elapsed time per iteration (s): 0.08 | learning rate: 2.691E-05 | global batch size: 256 | lm loss: 4.501217E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.464 | TFLOPs: 11.93 | 7: iteration 151950/ 173500 | consumed samples: 38899200 | consumed tokens: 79665561600 | elapsed time per iteration (s): 0.08 | learning rate: 2.690E-05 | global batch size: 256 | lm loss: 4.500317E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.024 | TFLOPs: 11.92 | 7: iteration 151960/ 173500 | consumed samples: 38901760 | consumed tokens: 79670804480 | elapsed time per iteration (s): 0.08 | learning rate: 2.689E-05 | global batch size: 256 | lm loss: 4.504324E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.031 | TFLOPs: 11.96 | 7: iteration 151970/ 173500 | 
consumed samples: 38904320 | consumed tokens: 79676047360 | elapsed time per iteration (s): 0.08 | learning rate: 2.689E-05 | global batch size: 256 | lm loss: 4.499749E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.785 | TFLOPs: 11.87 | 7: iteration 151980/ 173500 | consumed samples: 38906880 | consumed tokens: 79681290240 | elapsed time per iteration (s): 0.08 | learning rate: 2.688E-05 | global batch size: 256 | lm loss: 4.516420E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.984 | TFLOPs: 11.94 | 7: iteration 151990/ 173500 | consumed samples: 38909440 | consumed tokens: 79686533120 | elapsed time per iteration (s): 0.08 | learning rate: 2.688E-05 | global batch size: 256 | lm loss: 4.506338E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.706 | TFLOPs: 11.91 | 0: [2023-03-17 03:58:51,596] [INFO] [logging.py:68:log_dist] [Rank 0] step=152000, skipped=0, lr=[2.6869667028068037e-05, 2.6869667028068037e-05, 2.6869667028068037e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 152000/ 173500 | consumed samples: 38912000 | consumed tokens: 79691776000 | elapsed time per iteration (s): 0.08 | learning rate: 2.687E-05 | global batch size: 256 | lm loss: 4.510111E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.436 | TFLOPs: 11.94 | 0: steps: 152000 loss: 4.5146 iter time (s): 0.081 samples/sec: 3173.987 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 152000 | lm loss value: 4.405458E+00 | lm loss PPL: 8.189664E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at 
iteration 152000 to checkpoints_14m91b100m 0: [2023-03-17 03:58:51,656] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step152000 is begin to save! 0: [2023-03-17 03:58:51,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/layer_01-model_00-model_states.pt... 0: [2023-03-17 03:58:51,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/layer_01-model_00-model_states.pt. 0: [2023-03-17 03:58:51,776] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/layer_03-model_00-model_states.pt... 0: [2023-03-17 03:58:51,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/layer_03-model_00-model_states.pt. 0: [2023-03-17 03:58:51,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/layer_04-model_00-model_states.pt... 0: [2023-03-17 03:58:51,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/layer_04-model_00-model_states.pt. 0: [2023-03-17 03:58:51,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/layer_05-model_00-model_states.pt... 0: [2023-03-17 03:58:51,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/layer_05-model_00-model_states.pt. 0: [2023-03-17 03:58:51,785] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/layer_06-model_00-model_states.pt... 0: [2023-03-17 03:58:51,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/layer_06-model_00-model_states.pt. 0: [2023-03-17 03:58:51,788] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 03:58:51,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/layer_08-model_00-model_states.pt. 0: [2023-03-17 03:58:51,789] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step152000/mp_rank_00_model_states.pt 0: [2023-03-17 03:58:51,789] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/mp_rank_00_model_states.pt... 0: [2023-03-17 03:58:51,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/mp_rank_00_model_states.pt. 0: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
3: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 03:58:51,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2023-03-17 03:58:51,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 03:58:51,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 6: [2023-03-17 03:58:51,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:58:51,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:58:51,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
6: [2023-03-17 03:58:51,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 5: [2023-03-17 03:58:51,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 03:58:51,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 0: [2023-03-17 03:58:51,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:58:51,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 7: [2023-03-17 03:58:51,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:58:51,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:58:51,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 4: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 7: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 6: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 3: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
3: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 2: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
0: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 5: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 03:58:51,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 6: [2023-03-17 03:58:51,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:58:51,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:58:51,815] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,815] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 6: [2023-03-17 03:58:51,815] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 4: [2023-03-17 03:58:51,815] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
0: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:58:51,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 03:58:51,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 2: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 7: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:58:51,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 5: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
6: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:58:51,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 03:58:51,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 3: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 4: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:58:51,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
2: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:58:51,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 0: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:58:51,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 7: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:58:51,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 5: [2023-03-17 03:58:51,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 03:58:51,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
6: [2023-03-17 03:58:51,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:58:51,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 03:58:51,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 4: [2023-03-17 03:58:51,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:58:51,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 03:58:51,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 3: [2023-03-17 03:58:51,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:58:51,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2023-03-17 03:58:51,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 0: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 2: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:58:51,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 7: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:58:51,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 5: [2023-03-17 03:58:51,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
5: [2023-03-17 03:58:51,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 3: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 6: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:58:51,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 4: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:58:51,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 03:58:51,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 0: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:58:51,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 2: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:58:51,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 03:58:51,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 5: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 7: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 03:58:51,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 6: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 03:58:51,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 4: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:58:51,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 3: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 0: [2023-03-17 03:58:51,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 2: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 7: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 6: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 2: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 0: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
2: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 2: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 3: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 3: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 4: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
3: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 1: [2023-03-17 03:58:51,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 03:58:51,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 5: [2023-03-17 03:58:51,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 03:58:51,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step152000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 03:58:51,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step152000 is ready now! 
0: successfully saved checkpoint at iteration 152000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 172.01
7: iteration 152010/ 173500 | consumed samples: 38914560 | consumed tokens: 79697018880 | elapsed time per iteration (s): 0.10 | learning rate: 2.686E-05 | global batch size: 256 | lm loss: 4.501013E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2508.948 | TFLOPs: 9.33 |
7: iteration 152020/ 173500 | consumed samples: 38917120 | consumed tokens: 79702261760 | elapsed time per iteration (s): 0.08 | learning rate: 2.686E-05 | global batch size: 256 | lm loss: 4.500344E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.061 | TFLOPs: 11.99 |
7: iteration 152030/ 173500 | consumed samples: 38919680 | consumed tokens: 79707504640 | elapsed time per iteration (s): 0.08 | learning rate: 2.685E-05 | global batch size: 256 | lm loss: 4.505178E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.718 | TFLOPs: 11.96 |
7: iteration 152040/ 173500 | consumed samples: 38922240 | consumed tokens: 79712747520 | elapsed time per iteration (s): 0.08 | learning rate: 2.684E-05 | global batch size: 256 | lm loss: 4.501928E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.525 | TFLOPs: 11.92 |
7: iteration 152050/ 173500 | consumed samples: 38924800 | consumed tokens: 79717990400 | elapsed time per iteration (s): 0.08 | learning rate: 2.684E-05 | global batch size: 256 | lm loss: 4.517590E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.988 | TFLOPs: 11.97 |
7: iteration 152060/ 173500 | consumed samples: 38927360 | consumed tokens: 79723233280 | elapsed time per iteration (s): 0.08 | learning rate: 2.683E-05 | global batch size: 256 | lm loss: 4.521519E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.663 | TFLOPs: 11.96 |
7: iteration 152070/ 173500 | consumed samples: 38929920 | consumed tokens: 79728476160 | elapsed time per iteration (s): 0.08 | learning rate: 2.683E-05 | global batch size: 256 | lm loss: 4.508582E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.113 | TFLOPs: 12.00 |
7: iteration 152080/ 173500 | consumed samples: 38932480 | consumed tokens: 79733719040 | elapsed time per iteration (s): 0.08 | learning rate: 2.682E-05 | global batch size: 256 | lm loss: 4.518900E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.717 | TFLOPs: 11.88 |
7: iteration 152090/ 173500 | consumed samples: 38935040 | consumed tokens: 79738961920 | elapsed time per iteration (s): 0.08 | learning rate: 2.681E-05 | global batch size: 256 | lm loss: 4.505608E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.084 | TFLOPs: 11.98 |
7: iteration 152100/ 173500 | consumed samples: 38937600 | consumed tokens: 79744204800 | elapsed time per iteration (s): 0.08 | learning rate: 2.681E-05 | global batch size: 256 | lm loss: 4.506029E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.037 | TFLOPs: 11.99 |
7: iteration 152110/ 173500 | consumed samples: 38940160 | consumed tokens: 79749447680 | elapsed time per iteration (s): 0.08 | learning rate: 2.680E-05 | global batch size: 256 | lm loss: 4.516172E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.218 | TFLOPs: 11.96 |
7: iteration 152120/ 173500 | consumed samples: 38942720 | consumed tokens: 79754690560 | elapsed time per iteration (s): 0.08 | learning rate: 2.679E-05 | global batch size: 256 | lm loss: 4.509493E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.594 | TFLOPs: 12.05 |
7: iteration 152130/ 173500 | consumed samples: 38945280 | consumed tokens: 79759933440 | elapsed time per iteration (s): 0.08 | learning rate: 2.679E-05 | global batch size: 256 | lm loss: 4.502931E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3246.027 | TFLOPs: 12.07 |
7: iteration 152140/ 173500 | consumed samples: 38947840 | consumed tokens: 79765176320 | elapsed time per iteration (s): 0.08 | learning rate: 2.678E-05 | global batch size: 256 | lm loss: 4.506573E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.063 | TFLOPs: 11.98 |
7: iteration 152150/ 173500 | consumed samples: 38950400 | consumed tokens: 79770419200 | elapsed time per iteration (s): 0.08 | learning rate: 2.678E-05 | global batch size: 256 | lm loss: 4.493337E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.560 | TFLOPs: 11.99 |
7: iteration 152160/ 173500 | consumed samples: 38952960 | consumed tokens: 79775662080 | elapsed time per iteration (s): 0.08 | learning rate: 2.677E-05 | global batch size: 256 | lm loss: 4.497055E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.901 | TFLOPs: 11.95 |
7: iteration 152170/ 173500 | consumed samples: 38955520 | consumed tokens: 79780904960 | elapsed time per iteration (s): 0.08 | learning rate: 2.676E-05 | global batch size: 256 | lm loss: 4.512734E+00 | grad norm: 0.348 | num zeros: 0.0 |
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.205 | TFLOPs: 11.96 | 7: iteration 152180/ 173500 | consumed samples: 38958080 | consumed tokens: 79786147840 | elapsed time per iteration (s): 0.08 | learning rate: 2.676E-05 | global batch size: 256 | lm loss: 4.501480E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.556 | TFLOPs: 11.96 | 7: iteration 152190/ 173500 | consumed samples: 38960640 | consumed tokens: 79791390720 | elapsed time per iteration (s): 0.08 | learning rate: 2.675E-05 | global batch size: 256 | lm loss: 4.510872E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.840 | TFLOPs: 11.95 | 7: iteration 152200/ 173500 | consumed samples: 38963200 | consumed tokens: 79796633600 | elapsed time per iteration (s): 0.08 | learning rate: 2.674E-05 | global batch size: 256 | lm loss: 4.502519E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.430 | TFLOPs: 11.94 | 7: iteration 152210/ 173500 | consumed samples: 38965760 | consumed tokens: 79801876480 | elapsed time per iteration (s): 0.08 | learning rate: 2.674E-05 | global batch size: 256 | lm loss: 4.498834E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.422 | TFLOPs: 11.92 | 7: iteration 152220/ 173500 | consumed samples: 38968320 | consumed tokens: 79807119360 | elapsed time per iteration (s): 0.08 | learning rate: 2.673E-05 | global batch size: 256 | lm loss: 4.497113E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.157 | TFLOPs: 11.79 | 7: iteration 152230/ 173500 | consumed samples: 38970880 | consumed tokens: 79812362240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.673E-05 | global batch size: 256 | lm loss: 4.521679E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.133 | TFLOPs: 11.93 | 7: iteration 152240/ 173500 | consumed samples: 38973440 | consumed tokens: 79817605120 | elapsed time per iteration (s): 0.08 | learning rate: 2.672E-05 | global batch size: 256 | lm loss: 4.509917E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.901 | TFLOPs: 11.94 | 7: iteration 152250/ 173500 | consumed samples: 38976000 | consumed tokens: 79822848000 | elapsed time per iteration (s): 0.08 | learning rate: 2.671E-05 | global batch size: 256 | lm loss: 4.496333E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.231 | TFLOPs: 11.92 | 7: iteration 152260/ 173500 | consumed samples: 38978560 | consumed tokens: 79828090880 | elapsed time per iteration (s): 0.08 | learning rate: 2.671E-05 | global batch size: 256 | lm loss: 4.499780E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.400 | TFLOPs: 11.77 | 7: iteration 152270/ 173500 | consumed samples: 38981120 | consumed tokens: 79833333760 | elapsed time per iteration (s): 0.08 | learning rate: 2.670E-05 | global batch size: 256 | lm loss: 4.505893E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.329 | TFLOPs: 11.87 | 7: iteration 152280/ 173500 | consumed samples: 38983680 | consumed tokens: 79838576640 | elapsed time per iteration (s): 0.08 | learning rate: 2.669E-05 | global batch size: 256 | lm loss: 4.498061E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.663 | TFLOPs: 11.86 | 7: iteration 
152290/ 173500 | consumed samples: 38986240 | consumed tokens: 79843819520 | elapsed time per iteration (s): 0.08 | learning rate: 2.669E-05 | global batch size: 256 | lm loss: 4.517237E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.614 | TFLOPs: 11.90 | 7: iteration 152300/ 173500 | consumed samples: 38988800 | consumed tokens: 79849062400 | elapsed time per iteration (s): 0.08 | learning rate: 2.668E-05 | global batch size: 256 | lm loss: 4.515043E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.513 | TFLOPs: 11.87 | 7: iteration 152310/ 173500 | consumed samples: 38991360 | consumed tokens: 79854305280 | elapsed time per iteration (s): 0.08 | learning rate: 2.668E-05 | global batch size: 256 | lm loss: 4.517156E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.805 | TFLOPs: 11.90 | 7: iteration 152320/ 173500 | consumed samples: 38993920 | consumed tokens: 79859548160 | elapsed time per iteration (s): 0.08 | learning rate: 2.667E-05 | global batch size: 256 | lm loss: 4.515259E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.205 | TFLOPs: 11.88 | 7: iteration 152330/ 173500 | consumed samples: 38996480 | consumed tokens: 79864791040 | elapsed time per iteration (s): 0.08 | learning rate: 2.666E-05 | global batch size: 256 | lm loss: 4.501101E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.170 | TFLOPs: 11.92 | 7: iteration 152340/ 173500 | consumed samples: 38999040 | consumed tokens: 79870033920 | elapsed time per iteration (s): 0.08 | learning rate: 2.666E-05 | global batch size: 256 | lm loss: 4.496413E+00 | grad norm: 0.364 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.858 | TFLOPs: 11.88 | 7: iteration 152350/ 173500 | consumed samples: 39001600 | consumed tokens: 79875276800 | elapsed time per iteration (s): 0.08 | learning rate: 2.665E-05 | global batch size: 256 | lm loss: 4.512354E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.872 | TFLOPs: 11.88 | 7: iteration 152360/ 173500 | consumed samples: 39004160 | consumed tokens: 79880519680 | elapsed time per iteration (s): 0.08 | learning rate: 2.664E-05 | global batch size: 256 | lm loss: 4.502190E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.265 | TFLOPs: 11.82 | 7: iteration 152370/ 173500 | consumed samples: 39006720 | consumed tokens: 79885762560 | elapsed time per iteration (s): 0.08 | learning rate: 2.664E-05 | global batch size: 256 | lm loss: 4.505059E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.767 | TFLOPs: 11.83 | 7: iteration 152380/ 173500 | consumed samples: 39009280 | consumed tokens: 79891005440 | elapsed time per iteration (s): 0.08 | learning rate: 2.663E-05 | global batch size: 256 | lm loss: 4.510673E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.499 | TFLOPs: 11.82 | 7: iteration 152390/ 173500 | consumed samples: 39011840 | consumed tokens: 79896248320 | elapsed time per iteration (s): 0.08 | learning rate: 2.663E-05 | global batch size: 256 | lm loss: 4.505973E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.101 | TFLOPs: 11.76 | 7: iteration 152400/ 173500 | consumed samples: 39014400 | consumed tokens: 79901491200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.662E-05 | global batch size: 256 | lm loss: 4.500118E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.021 | TFLOPs: 11.89 | 7: iteration 152410/ 173500 | consumed samples: 39016960 | consumed tokens: 79906734080 | elapsed time per iteration (s): 0.08 | learning rate: 2.661E-05 | global batch size: 256 | lm loss: 4.492770E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.341 | TFLOPs: 11.87 | 7: iteration 152420/ 173500 | consumed samples: 39019520 | consumed tokens: 79911976960 | elapsed time per iteration (s): 0.08 | learning rate: 2.661E-05 | global batch size: 256 | lm loss: 4.497549E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.175 | TFLOPs: 11.88 | 7: iteration 152430/ 173500 | consumed samples: 39022080 | consumed tokens: 79917219840 | elapsed time per iteration (s): 0.08 | learning rate: 2.660E-05 | global batch size: 256 | lm loss: 4.496064E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.858 | TFLOPs: 11.88 | 7: iteration 152440/ 173500 | consumed samples: 39024640 | consumed tokens: 79922462720 | elapsed time per iteration (s): 0.08 | learning rate: 2.659E-05 | global batch size: 256 | lm loss: 4.495866E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.443 | TFLOPs: 11.76 | 7: iteration 152450/ 173500 | consumed samples: 39027200 | consumed tokens: 79927705600 | elapsed time per iteration (s): 0.08 | learning rate: 2.659E-05 | global batch size: 256 | lm loss: 4.510468E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.849 | TFLOPs: 11.89 | 7: iteration 152460/ 
173500 | consumed samples: 39029760 | consumed tokens: 79932948480 | elapsed time per iteration (s): 0.08 | learning rate: 2.658E-05 | global batch size: 256 | lm loss: 4.514979E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.402 | TFLOPs: 11.82 | 7: iteration 152470/ 173500 | consumed samples: 39032320 | consumed tokens: 79938191360 | elapsed time per iteration (s): 0.08 | learning rate: 2.658E-05 | global batch size: 256 | lm loss: 4.505523E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.484 | TFLOPs: 11.73 | 7: iteration 152480/ 173500 | consumed samples: 39034880 | consumed tokens: 79943434240 | elapsed time per iteration (s): 0.08 | learning rate: 2.657E-05 | global batch size: 256 | lm loss: 4.508056E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.823 | TFLOPs: 11.88 | 7: iteration 152490/ 173500 | consumed samples: 39037440 | consumed tokens: 79948677120 | elapsed time per iteration (s): 0.08 | learning rate: 2.656E-05 | global batch size: 256 | lm loss: 4.501081E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.382 | TFLOPs: 11.86 | 7: iteration 152500/ 173500 | consumed samples: 39040000 | consumed tokens: 79953920000 | elapsed time per iteration (s): 0.08 | learning rate: 2.656E-05 | global batch size: 256 | lm loss: 4.506609E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.818 | TFLOPs: 11.94 | 7: iteration 152510/ 173500 | consumed samples: 39042560 | consumed tokens: 79959162880 | elapsed time per iteration (s): 0.08 | learning rate: 2.655E-05 | global batch size: 256 | lm loss: 4.508696E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3210.871 | TFLOPs: 11.94 | 7: iteration 152520/ 173500 | consumed samples: 39045120 | consumed tokens: 79964405760 | elapsed time per iteration (s): 0.08 | learning rate: 2.655E-05 | global batch size: 256 | lm loss: 4.497071E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.627 | TFLOPs: 11.94 | 7: iteration 152530/ 173500 | consumed samples: 39047680 | consumed tokens: 79969648640 | elapsed time per iteration (s): 0.08 | learning rate: 2.654E-05 | global batch size: 256 | lm loss: 4.486537E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.471 | TFLOPs: 11.93 | 7: iteration 152540/ 173500 | consumed samples: 39050240 | consumed tokens: 79974891520 | elapsed time per iteration (s): 0.08 | learning rate: 2.653E-05 | global batch size: 256 | lm loss: 4.511161E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.501 | TFLOPs: 11.94 | 7: iteration 152550/ 173500 | consumed samples: 39052800 | consumed tokens: 79980134400 | elapsed time per iteration (s): 0.08 | learning rate: 2.653E-05 | global batch size: 256 | lm loss: 4.508237E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.794 | TFLOPs: 11.92 | 7: iteration 152560/ 173500 | consumed samples: 39055360 | consumed tokens: 79985377280 | elapsed time per iteration (s): 0.08 | learning rate: 2.652E-05 | global batch size: 256 | lm loss: 4.524050E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.949 | TFLOPs: 11.93 | 7: iteration 152570/ 173500 | consumed samples: 39057920 | consumed tokens: 79990620160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.651E-05 | global batch size: 256 | lm loss: 4.491484E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.108 | TFLOPs: 11.80 | 7: iteration 152580/ 173500 | consumed samples: 39060480 | consumed tokens: 79995863040 | elapsed time per iteration (s): 0.08 | learning rate: 2.651E-05 | global batch size: 256 | lm loss: 4.501101E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.831 | TFLOPs: 11.94 | 7: iteration 152590/ 173500 | consumed samples: 39063040 | consumed tokens: 80001105920 | elapsed time per iteration (s): 0.08 | learning rate: 2.650E-05 | global batch size: 256 | lm loss: 4.494738E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.346 | TFLOPs: 11.88 | 7: iteration 152600/ 173500 | consumed samples: 39065600 | consumed tokens: 80006348800 | elapsed time per iteration (s): 0.08 | learning rate: 2.650E-05 | global batch size: 256 | lm loss: 4.516704E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.802 | TFLOPs: 11.85 | 7: iteration 152610/ 173500 | consumed samples: 39068160 | consumed tokens: 80011591680 | elapsed time per iteration (s): 0.08 | learning rate: 2.649E-05 | global batch size: 256 | lm loss: 4.495886E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.142 | TFLOPs: 11.93 | 7: iteration 152620/ 173500 | consumed samples: 39070720 | consumed tokens: 80016834560 | elapsed time per iteration (s): 0.08 | learning rate: 2.648E-05 | global batch size: 256 | lm loss: 4.511680E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.393 | TFLOPs: 11.90 | 7: iteration 152630/ 173500 | 
consumed samples: 39073280 | consumed tokens: 80022077440 | elapsed time per iteration (s): 0.08 | learning rate: 2.648E-05 | global batch size: 256 | lm loss: 4.508566E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.103 | TFLOPs: 11.89 | 7: iteration 152640/ 173500 | consumed samples: 39075840 | consumed tokens: 80027320320 | elapsed time per iteration (s): 0.08 | learning rate: 2.647E-05 | global batch size: 256 | lm loss: 4.500296E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.518 | TFLOPs: 11.80 | 7: iteration 152650/ 173500 | consumed samples: 39078400 | consumed tokens: 80032563200 | elapsed time per iteration (s): 0.08 | learning rate: 2.647E-05 | global batch size: 256 | lm loss: 4.509451E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.445 | TFLOPs: 11.87 | 7: iteration 152660/ 173500 | consumed samples: 39080960 | consumed tokens: 80037806080 | elapsed time per iteration (s): 0.08 | learning rate: 2.646E-05 | global batch size: 256 | lm loss: 4.509603E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.852 | TFLOPs: 11.89 | 7: iteration 152670/ 173500 | consumed samples: 39083520 | consumed tokens: 80043048960 | elapsed time per iteration (s): 0.08 | learning rate: 2.645E-05 | global batch size: 256 | lm loss: 4.511360E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.537 | TFLOPs: 11.88 | 7: iteration 152680/ 173500 | consumed samples: 39086080 | consumed tokens: 80048291840 | elapsed time per iteration (s): 0.08 | learning rate: 2.645E-05 | global batch size: 256 | lm loss: 4.505387E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3173.568 | TFLOPs: 11.80 | 7: iteration 152690/ 173500 | consumed samples: 39088640 | consumed tokens: 80053534720 | elapsed time per iteration (s): 0.08 | learning rate: 2.644E-05 | global batch size: 256 | lm loss: 4.512357E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3118.849 | TFLOPs: 11.60 | 7: iteration 152700/ 173500 | consumed samples: 39091200 | consumed tokens: 80058777600 | elapsed time per iteration (s): 0.08 | learning rate: 2.643E-05 | global batch size: 256 | lm loss: 4.500048E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.426 | TFLOPs: 11.93 | 7: iteration 152710/ 173500 | consumed samples: 39093760 | consumed tokens: 80064020480 | elapsed time per iteration (s): 0.08 | learning rate: 2.643E-05 | global batch size: 256 | lm loss: 4.495896E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3080.116 | TFLOPs: 11.46 | 7: iteration 152720/ 173500 | consumed samples: 39096320 | consumed tokens: 80069263360 | elapsed time per iteration (s): 0.08 | learning rate: 2.642E-05 | global batch size: 256 | lm loss: 4.512106E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.707 | TFLOPs: 11.88 | 7: iteration 152730/ 173500 | consumed samples: 39098880 | consumed tokens: 80074506240 | elapsed time per iteration (s): 0.08 | learning rate: 2.642E-05 | global batch size: 256 | lm loss: 4.506158E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.456 | TFLOPs: 11.96 | 7: iteration 152740/ 173500 | consumed samples: 39101440 | consumed tokens: 80079749120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.641E-05 | global batch size: 256 | lm loss: 4.504929E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.845 | TFLOPs: 11.95 | 7: iteration 152750/ 173500 | consumed samples: 39104000 | consumed tokens: 80084992000 | elapsed time per iteration (s): 0.10 | learning rate: 2.640E-05 | global batch size: 256 | lm loss: 4.512654E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.337 | TFLOPs: 9.37 | 7: iteration 152760/ 173500 | consumed samples: 39106560 | consumed tokens: 80090234880 | elapsed time per iteration (s): 0.13 | learning rate: 2.640E-05 | global batch size: 256 | lm loss: 4.512318E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1922.990 | TFLOPs: 7.15 | 7: iteration 152770/ 173500 | consumed samples: 39109120 | consumed tokens: 80095477760 | elapsed time per iteration (s): 0.08 | learning rate: 2.639E-05 | global batch size: 256 | lm loss: 4.501184E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.037 | TFLOPs: 11.89 | 7: iteration 152780/ 173500 | consumed samples: 39111680 | consumed tokens: 80100720640 | elapsed time per iteration (s): 0.09 | learning rate: 2.639E-05 | global batch size: 256 | lm loss: 4.507217E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.297 | TFLOPs: 11.06 | 7: iteration 152790/ 173500 | consumed samples: 39114240 | consumed tokens: 80105963520 | elapsed time per iteration (s): 0.08 | learning rate: 2.638E-05 | global batch size: 256 | lm loss: 4.489415E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.225 | TFLOPs: 11.90 | 7: iteration 152800/ 173500 | 
consumed samples: 39116800 | consumed tokens: 80111206400 | elapsed time per iteration (s): 0.08 | learning rate: 2.637E-05 | global batch size: 256 | lm loss: 4.500161E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.061 | TFLOPs: 11.94 | 7: iteration 152810/ 173500 | consumed samples: 39119360 | consumed tokens: 80116449280 | elapsed time per iteration (s): 0.08 | learning rate: 2.637E-05 | global batch size: 256 | lm loss: 4.507293E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.055 | TFLOPs: 11.94 | 7: iteration 152820/ 173500 | consumed samples: 39121920 | consumed tokens: 80121692160 | elapsed time per iteration (s): 0.08 | learning rate: 2.636E-05 | global batch size: 256 | lm loss: 4.505828E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.341 | TFLOPs: 11.92 | 7: iteration 152830/ 173500 | consumed samples: 39124480 | consumed tokens: 80126935040 | elapsed time per iteration (s): 0.08 | learning rate: 2.636E-05 | global batch size: 256 | lm loss: 4.495861E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.789 | TFLOPs: 11.90 | 7: iteration 152840/ 173500 | consumed samples: 39127040 | consumed tokens: 80132177920 | elapsed time per iteration (s): 0.08 | learning rate: 2.635E-05 | global batch size: 256 | lm loss: 4.501150E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.177 | TFLOPs: 11.90 | 7: iteration 152850/ 173500 | consumed samples: 39129600 | consumed tokens: 80137420800 | elapsed time per iteration (s): 0.08 | learning rate: 2.634E-05 | global batch size: 256 | lm loss: 4.505103E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3207.789 | TFLOPs: 11.93 | 7: iteration 152860/ 173500 | consumed samples: 39132160 | consumed tokens: 80142663680 | elapsed time per iteration (s): 0.08 | learning rate: 2.634E-05 | global batch size: 256 | lm loss: 4.511560E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.166 | TFLOPs: 11.92 | 7: iteration 152870/ 173500 | consumed samples: 39134720 | consumed tokens: 80147906560 | elapsed time per iteration (s): 0.08 | learning rate: 2.633E-05 | global batch size: 256 | lm loss: 4.499454E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.703 | TFLOPs: 11.51 | 7: iteration 152880/ 173500 | consumed samples: 39137280 | consumed tokens: 80153149440 | elapsed time per iteration (s): 0.08 | learning rate: 2.633E-05 | global batch size: 256 | lm loss: 4.507323E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.919 | TFLOPs: 11.83 | 7: iteration 152890/ 173500 | consumed samples: 39139840 | consumed tokens: 80158392320 | elapsed time per iteration (s): 0.08 | learning rate: 2.632E-05 | global batch size: 256 | lm loss: 4.497552E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.156 | TFLOPs: 11.81 | 7: iteration 152900/ 173500 | consumed samples: 39142400 | consumed tokens: 80163635200 | elapsed time per iteration (s): 0.08 | learning rate: 2.631E-05 | global batch size: 256 | lm loss: 4.508799E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.122 | TFLOPs: 11.83 | 7: iteration 152910/ 173500 | consumed samples: 39144960 | consumed tokens: 80168878080 | elapsed time per iteration (s): 0.08 | learning rate: 
2.631E-05 | global batch size: 256 | lm loss: 4.497453E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.106 | TFLOPs: 11.84 | 7: iteration 152920/ 173500 | consumed samples: 39147520 | consumed tokens: 80174120960 | elapsed time per iteration (s): 0.08 | learning rate: 2.630E-05 | global batch size: 256 | lm loss: 4.501992E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.426 | TFLOPs: 11.84 | 7: iteration 152930/ 173500 | consumed samples: 39150080 | consumed tokens: 80179363840 | elapsed time per iteration (s): 0.08 | learning rate: 2.630E-05 | global batch size: 256 | lm loss: 4.501670E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.601 | TFLOPs: 11.86 | 7: iteration 152940/ 173500 | consumed samples: 39152640 | consumed tokens: 80184606720 | elapsed time per iteration (s): 0.08 | learning rate: 2.629E-05 | global batch size: 256 | lm loss: 4.511759E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.789 | TFLOPs: 11.87 | 7: iteration 152950/ 173500 | consumed samples: 39155200 | consumed tokens: 80189849600 | elapsed time per iteration (s): 0.08 | learning rate: 2.628E-05 | global batch size: 256 | lm loss: 4.489147E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.358 | TFLOPs: 11.89 | 7: iteration 152960/ 173500 | consumed samples: 39157760 | consumed tokens: 80195092480 | elapsed time per iteration (s): 0.08 | learning rate: 2.628E-05 | global batch size: 256 | lm loss: 4.498775E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.726 | TFLOPs: 11.88 | 7: iteration 152970/ 173500 | 
consumed samples: 39160320 | consumed tokens: 80200335360 | elapsed time per iteration (s): 0.08 | learning rate: 2.627E-05 | global batch size: 256 | lm loss: 4.501649E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.149 | TFLOPs: 11.84 | 7: iteration 152980/ 173500 | consumed samples: 39162880 | consumed tokens: 80205578240 | elapsed time per iteration (s): 0.08 | learning rate: 2.626E-05 | global batch size: 256 | lm loss: 4.508913E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.470 | TFLOPs: 11.85 | 7: iteration 152990/ 173500 | consumed samples: 39165440 | consumed tokens: 80210821120 | elapsed time per iteration (s): 0.08 | learning rate: 2.626E-05 | global batch size: 256 | lm loss: 4.516404E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.725 | TFLOPs: 11.76 | 7: iteration 153000/ 173500 | consumed samples: 39168000 | consumed tokens: 80216064000 | elapsed time per iteration (s): 0.08 | learning rate: 2.625E-05 | global batch size: 256 | lm loss: 4.517801E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.745 | TFLOPs: 11.88 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 153000 | lm loss value: 4.418114E+00 | lm loss PPL: 8.293973E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 153000 to checkpoints_14m91b100m 0: [2023-03-17 04:00:12,789] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step153000 is begin to save! 
0: [2023-03-17 04:00:12,793] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:00:12,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:00:12,819] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:00:12,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:00:12,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:00:12,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:00:12,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:00:12,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:00:12,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:00:12,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:00:12,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:00:12,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 04:00:12,832] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step153000/mp_rank_00_model_states.pt 0: [2023-03-17 04:00:12,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:00:12,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:00:12,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:00:12,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:00:12,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 5: [2023-03-17 04:00:12,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:00:12,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 2: [2023-03-17 04:00:12,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:00:12,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2023-03-17 04:00:12,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2023-03-17 04:00:12,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 2: [2023-03-17 04:00:12,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 0: [2023-03-17 04:00:12,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:00:12,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 4: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:00:12,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:00:12,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 1: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 4: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 
6: [2023-03-17 04:00:12,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 3: [2023-03-17 04:00:12,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 5: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:00:12,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 7: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:00:12,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:00:12,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 
0: [2023-03-17 04:00:12,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 7: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:00:12,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 0: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:00:12,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 2: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:00:12,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 1: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:00:12,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 4: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:00:12,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 6: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:00:12,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2023-03-17 04:00:12,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 5: [2023-03-17 04:00:12,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 0: [2023-03-17 04:00:12,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:00:12,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:00:12,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 5: [2023-03-17 04:00:12,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:00:12,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:00:12,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:00:12,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2023-03-17 04:00:12,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 04:00:12,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 04:00:12,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 2: [2023-03-17 04:00:12,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 3: [2023-03-17 04:00:12,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 4: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:00:12,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 04:00:12,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 1: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 3: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:00:12,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 6: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:00:12,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 
7: [2023-03-17 04:00:12,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 0: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:00:12,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:00:12,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 2: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:00:12,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 4: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:00:12,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now! 1: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:00:12,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
5: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
1: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
7: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt.
5: [2023-03-17 04:00:12,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
7: [2023-03-17 04:00:12,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt
5: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
7: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
6: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt.
6: [2023-03-17 04:00:12,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
6: [2023-03-17 04:00:12,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
3: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
3: [2023-03-17 04:00:12,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
3: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
2: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt.
0: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt.
2: [2023-03-17 04:00:12,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt
0: [2023-03-17 04:00:12,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
2: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
0: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
5: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt.
1: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
5: [2023-03-17 04:00:12,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt
1: [2023-03-17 04:00:12,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
5: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
1: [2023-03-17 04:00:12,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
4: [2023-03-17 04:00:12,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt.
4: [2023-03-17 04:00:12,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt
4: [2023-03-17 04:00:12,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
6: [2023-03-17 04:00:12,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt.
6: [2023-03-17 04:00:12,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
6: [2023-03-17 04:00:12,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
3: [2023-03-17 04:00:12,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt.
3: [2023-03-17 04:00:12,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt
3: [2023-03-17 04:00:12,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
7: [2023-03-17 04:00:12,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt.
0: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt.
0: [2023-03-17 04:00:12,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
7: [2023-03-17 04:00:12,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt
0: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
7: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
1: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
2: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt.
1: [2023-03-17 04:00:12,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
2: [2023-03-17 04:00:12,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt
5: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
1: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
2: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
5: [2023-03-17 04:00:12,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
5: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
4: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
4: [2023-03-17 04:00:12,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
4: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
6: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
6: [2023-03-17 04:00:12,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
6: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
7: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt.
7: [2023-03-17 04:00:12,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt
7: [2023-03-17 04:00:12,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
3: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
3: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
3: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
3: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt.
3: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt
3: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
2: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt.
5: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt.
2: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt
2: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
5: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt
5: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
7: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
6: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt.
7: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
6: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt
7: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
6: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
0: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt.
3: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt.
1: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt.
0: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
4: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt.
3: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt
1: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
0: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
0: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
7: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt.
4: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt
4: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
4: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
1: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
0: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
7: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt
4: [2023-03-17 04:00:12,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
0: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
7: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
4: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
3: [2023-03-17 04:00:12,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
1: [2023-03-17 04:00:12,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt.
1: [2023-03-17 04:00:12,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step153000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt
1: [2023-03-17 04:00:12,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step153000 is ready now!
0: successfully saved checkpoint at iteration 153000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 80.38
7: iteration 153010/ 173500 | consumed samples: 39170560 | consumed tokens: 80221306880 | elapsed time per iteration (s): 0.12 | learning rate: 2.625E-05 | global batch size: 256 | lm loss: 4.503627E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2058.814 | TFLOPs: 7.66 |
7: iteration 153020/ 173500 | consumed samples: 39173120 | consumed tokens: 80226549760 | elapsed time per iteration (s): 0.08 | learning rate: 2.624E-05 | global batch size: 256 | lm loss: 4.501875E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.456 | TFLOPs: 11.82 |
7: iteration 153030/ 173500 | consumed samples: 39175680 | consumed tokens: 80231792640 | elapsed time per iteration (s): 0.08 | learning rate: 2.623E-05 | global batch size: 256 | lm loss: 4.505007E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.458 | TFLOPs: 11.82 |
7: iteration 153040/ 173500 | consumed samples: 39178240 | consumed tokens: 80237035520 | elapsed time per iteration (s): 0.08 | learning rate: 2.623E-05 | global batch size: 256 | lm loss: 4.510461E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.658 | TFLOPs: 11.86 |
7: iteration 153050/ 173500 | consumed samples: 39180800 | consumed tokens: 80242278400 | elapsed time per iteration (s): 0.08 | learning rate: 2.622E-05 | global batch size: 256 | lm loss: 4.504469E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.634 | TFLOPs: 11.86 |
7: iteration 153060/ 173500 | consumed samples: 39183360 | consumed tokens: 80247521280 | elapsed time per iteration (s): 0.08 | learning rate: 2.622E-05 | global batch size: 256 | lm loss: 4.505955E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.144 | TFLOPs: 11.73 |
7: iteration 153070/ 173500 | consumed samples: 39185920 | consumed tokens: 80252764160 | elapsed time per iteration (s): 0.08 | learning rate: 2.621E-05 | global batch size: 256 | lm loss: 4.506093E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.661 | TFLOPs: 11.89 |
7: iteration 153080/ 173500 | consumed samples: 39188480 | consumed tokens: 80258007040 | elapsed time per iteration (s): 0.08 | learning rate: 2.620E-05 | global batch size: 256 | lm loss: 4.506579E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.230 | TFLOPs: 11.87 |
7: iteration 153090/ 173500 | consumed samples: 39191040 | consumed tokens: 80263249920 | elapsed time per iteration (s): 0.08 | learning rate: 2.620E-05 | global batch size: 256 | lm loss: 4.503142E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.996 | TFLOPs: 11.94 |
7: iteration 153100/ 173500 | consumed samples: 39193600 | consumed tokens: 80268492800 | elapsed time per iteration (s): 0.08 | learning rate: 2.619E-05 | global batch size: 256 | lm loss: 4.519050E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.888 | TFLOPs: 11.91 |
7: iteration 153110/ 173500 | consumed samples: 39196160 | consumed tokens: 80273735680 | elapsed time per iteration (s): 0.08 | learning rate: 2.619E-05 | global batch size: 256 | lm loss: 4.514039E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.202 | TFLOPs: 11.86 |
7: iteration 153120/ 173500 | consumed samples: 39198720 | consumed tokens: 80278978560 | elapsed time per iteration (s): 0.08 | learning rate: 2.618E-05 | global batch size: 256 | lm loss: 4.503706E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.108 | TFLOPs: 11.90 |
7: iteration 153130/ 173500 | consumed samples: 39201280 | consumed tokens: 80284221440 | elapsed time per iteration (s): 0.08 | learning rate: 2.617E-05 | global batch size: 256 | lm loss: 4.506872E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.351 | TFLOPs: 11.90 |
7: iteration 153140/ 173500 | consumed samples: 39203840 | consumed tokens: 80289464320 | elapsed time per iteration (s): 0.08 | learning rate: 2.617E-05 | global batch size: 256 | lm loss: 4.510065E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.775 | TFLOPs: 11.91 |
7: iteration 153150/ 173500 | consumed samples: 39206400 | consumed tokens: 80294707200 | elapsed time per iteration (s): 0.08 | learning rate: 2.616E-05 | global batch size: 256 | lm loss: 4.512139E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.082 | TFLOPs: 11.91 |
7: iteration 153160/ 173500 | consumed samples: 39208960 | consumed tokens: 80299950080 | elapsed time per iteration (s): 0.08 | learning rate: 2.616E-05 | global batch size: 256 | lm loss: 4.506979E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.955 | TFLOPs: 11.97 |
7: iteration 153170/ 173500 | consumed samples: 39211520 | consumed tokens: 80305192960 | elapsed time per iteration (s): 0.08 | learning rate: 2.615E-05 | global batch size: 256 | lm loss: 4.496113E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.539 | TFLOPs: 11.86 |
7: iteration 153180/ 173500 | consumed samples: 39214080 | consumed tokens: 80310435840 | elapsed time per iteration (s): 0.08 | learning rate: 2.614E-05 | global batch size: 256 | lm loss: 4.498931E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.757 | TFLOPs: 11.85 |
7: iteration 153190/ 173500 | consumed samples: 39216640 | consumed tokens: 80315678720 | elapsed time per iteration (s): 0.08 | learning rate: 2.614E-05 | global batch size: 256 | lm loss: 4.509007E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.462 | TFLOPs: 11.76 |
7: iteration 153200/ 173500 | consumed samples: 39219200 | consumed tokens: 80320921600 | elapsed time per iteration (s): 0.08 | learning rate: 2.613E-05 | global batch size: 256 | lm loss: 4.507717E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.088 | TFLOPs: 11.81 |
7: iteration 153210/ 173500 | consumed samples: 39221760 | consumed tokens: 80326164480 | elapsed time per iteration (s): 0.08 | learning rate: 2.613E-05 | global batch size: 256 | lm loss: 4.498626E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.253 | TFLOPs: 11.92 |
7: iteration 153220/ 173500 | consumed samples: 39224320 | consumed tokens: 80331407360 | elapsed time per iteration (s): 0.08 | learning rate: 2.612E-05 | global batch size: 256 | lm loss: 4.492843E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.682 | TFLOPs: 11.84 |
7: iteration 153230/ 173500 | consumed samples: 39226880 | consumed tokens: 80336650240 | elapsed time per iteration (s): 0.08 | learning rate: 2.611E-05 | global batch size: 256 | lm loss: 4.503983E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.638 | TFLOPs: 11.96 |
7: iteration 153240/ 173500 | consumed samples: 39229440 | consumed tokens: 80341893120 | elapsed time per iteration (s): 0.08 | learning rate: 2.611E-05 | global batch size: 256 | lm loss: 4.501281E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.329 | TFLOPs: 11.86 |
7: iteration 153250/ 173500 | consumed samples: 39232000 | consumed tokens: 80347136000 | elapsed time per iteration (s): 0.08 | learning rate: 2.610E-05 | global batch size: 256 | lm loss: 4.496396E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.602 | TFLOPs: 11.89 |
7: iteration 153260/ 173500 | consumed samples: 39234560 | consumed tokens: 80352378880 | elapsed time per iteration (s): 0.15 | learning rate: 2.610E-05 | global batch size: 256 | lm loss: 4.503151E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1753.761 | TFLOPs: 6.52 |
7: iteration 153270/ 173500 | consumed samples: 39237120 | consumed tokens: 80357621760 | elapsed time per iteration (s): 0.08 | learning rate: 2.609E-05 | global batch size: 256 | lm loss: 4.507682E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3069.040 | TFLOPs: 11.42 |
7: iteration 153280/ 173500 | consumed samples: 39239680 | consumed tokens: 80362864640 | elapsed time per iteration (s): 0.10 | learning rate: 2.609E-05 | global batch size: 256 | lm loss: 4.505256E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2524.653 | TFLOPs: 9.39 |
7: iteration 153290/ 173500 | consumed samples: 39242240 | consumed tokens: 80368107520 | elapsed time per iteration (s): 0.09 | learning rate: 2.608E-05 | global batch size: 256 | lm loss: 4.509066E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2970.179 | TFLOPs: 11.05 |
7: iteration 153300/ 173500 | consumed samples: 39244800 | consumed tokens: 80373350400 | elapsed time per iteration (s): 0.08 | learning rate: 2.607E-05 | global batch size: 256 | lm loss: 4.497144E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.863 | TFLOPs: 11.92 |
7: iteration 153310/ 173500 | consumed samples: 39247360 | consumed tokens: 80378593280 | elapsed time per iteration (s): 0.08 | learning rate: 2.607E-05 | global batch size: 256 | lm loss: 4.506201E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.179 | TFLOPs: 11.90 |
7: iteration 153320/ 173500 | consumed samples: 39249920 | consumed tokens: 80383836160 | elapsed time per iteration (s): 0.08 | learning rate: 2.606E-05 | global batch size: 256 | lm loss: 4.512809E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.559 | TFLOPs: 11.77 |
7: iteration 153330/ 173500 | consumed samples: 39252480 | consumed tokens: 80389079040 | elapsed time per iteration (s): 0.08 | learning rate: 2.606E-05 | global batch size: 256 | lm loss: 4.505468E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.951 | TFLOPs: 11.88 |
7: iteration 153340/ 173500 | consumed samples: 39255040 | consumed tokens: 80394321920 | elapsed time per iteration (s): 0.08 | learning rate: 2.605E-05 | global batch size: 256 | lm loss: 4.527998E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.850 | TFLOPs: 11.93 |
7: iteration 153350/ 173500 | consumed samples: 39257600 | consumed tokens: 80399564800 | elapsed time per iteration (s): 0.08 | learning rate: 2.604E-05 | global batch size: 256 | lm loss: 4.507285E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.559 | TFLOPs: 11.87 |
7: iteration 153360/ 173500 | consumed samples: 39260160 | consumed tokens: 80404807680 | elapsed time per iteration (s): 0.08 | learning rate: 2.604E-05 | global batch size: 256 | lm loss: 4.514668E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.663 | TFLOPs: 11.86 |
7: iteration 153370/ 173500 | consumed samples: 39262720 | consumed tokens: 80410050560 | elapsed time per iteration (s): 0.09 | learning rate: 2.603E-05 | global batch size: 256 | lm loss: 4.501690E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.863 | TFLOPs: 10.56 |
7: iteration 153380/ 173500 | consumed samples: 39265280 | consumed tokens: 80415293440 | elapsed time per iteration (s): 0.08 | learning rate: 2.603E-05 | global batch size: 256 | lm loss: 4.502546E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.650 | TFLOPs: 11.89 |
7: iteration 153390/ 173500 | consumed samples: 39267840 | consumed tokens: 80420536320 | elapsed time per iteration (s): 0.08 | learning rate: 2.602E-05 | global batch size: 256 | lm loss: 4.504523E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.743 | TFLOPs: 11.91 |
7: iteration 153400/ 173500 | consumed samples: 39270400 | consumed tokens: 80425779200 | elapsed time per iteration (s): 0.08 | learning rate: 2.601E-05 | global batch size: 256 | lm loss: 4.494461E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.449 | TFLOPs: 11.91 |
7: iteration 153410/ 173500 | consumed samples: 39272960 | consumed tokens: 80431022080 | elapsed time per iteration (s): 0.08 | learning rate: 2.601E-05 | global batch size: 256 | lm loss: 4.511779E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.296 | TFLOPs: 11.89 |
7: iteration 153420/ 173500 | consumed samples: 39275520 | consumed tokens: 80436264960 | elapsed time per iteration (s): 0.08 | learning rate: 2.600E-05 | global batch size: 256 | lm loss: 4.506021E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.712 | TFLOPs: 11.91 |
7: iteration 153430/ 173500 | consumed samples: 39278080 | consumed tokens: 80441507840 | elapsed time per iteration (s): 0.08 | learning rate: 2.600E-05 | global batch size: 256 | lm loss: 4.499823E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.498 | TFLOPs: 11.90 |
7: iteration 153440/ 173500 | consumed samples: 39280640 | consumed tokens: 80446750720 | elapsed time per iteration (s): 0.08 | learning rate: 2.599E-05 | global batch size: 256 | lm loss: 4.497158E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.739 | TFLOPs: 11.86 |
7: iteration 153450/ 173500 | consumed samples: 39283200 | consumed tokens: 80451993600 | elapsed time per iteration (s): 0.08 | learning rate: 2.598E-05 | global batch size: 256 | lm loss: 4.520945E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.361 | TFLOPs: 11.89 |
7: iteration 153460/ 173500 | consumed samples: 39285760 | consumed tokens: 80457236480 | elapsed time per iteration (s): 0.08 | learning rate: 2.598E-05 | global batch size: 256 | lm loss: 4.508946E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.800 | TFLOPs: 11.91 |
7: iteration 153470/ 173500 | consumed samples: 39288320 | consumed tokens: 80462479360 | elapsed time per iteration (s): 0.08 | learning rate: 2.597E-05 | global batch size: 256 | lm loss: 4.511008E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.568 | TFLOPs: 11.96 |
7: iteration 153480/ 173500 | consumed samples: 39290880 | consumed tokens: 80467722240 | elapsed time per iteration (s): 0.08 | learning rate: 2.597E-05 | global batch size: 256 | lm loss: 4.497911E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.967 | TFLOPs: 11.97 |
7: iteration 153490/ 173500 | consumed samples: 39293440 | consumed tokens: 80472965120 | elapsed time per iteration (s): 0.08 | learning rate: 2.596E-05 | global batch size: 256 | lm loss: 4.505124E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.188 | TFLOPs: 12.00 |
7: iteration 153500/ 173500 | consumed samples: 39296000 | consumed tokens: 80478208000 | elapsed time per iteration (s): 0.08 | learning rate: 2.595E-05 | global batch size: 256 | lm loss: 4.496083E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.711 | TFLOPs: 12.01 |
7: iteration 153510/ 173500 | consumed samples: 39298560 | consumed tokens: 80483450880 | elapsed time per iteration (s): 0.08 | learning rate: 2.595E-05 | global batch size: 256 | lm loss: 4.506443E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.336 | TFLOPs: 11.99 |
7: iteration 153520/ 173500 | consumed samples: 39301120 | consumed tokens: 80488693760 | elapsed time per iteration (s): 0.08 | learning rate: 2.594E-05 | global batch size: 256 | lm loss: 4.514187E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.112 | TFLOPs: 11.97 |
7: iteration 153530/ 173500 | consumed samples: 39303680 | consumed tokens: 80493936640 | elapsed time per iteration (s): 0.08 | learning rate: 2.594E-05 | global batch size: 256 | lm loss: 4.513546E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.939 | TFLOPs: 11.97 |
7: iteration 153540/ 173500 | consumed samples: 39306240 | consumed tokens: 80499179520 | elapsed time per iteration (s): 0.08 | learning rate: 2.593E-05 | global batch size: 256 | lm loss: 4.516370E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.561 | TFLOPs: 11.99 |
7: iteration 153550/ 173500 | consumed samples: 39308800 | consumed tokens: 80504422400 | elapsed time per iteration (s): 0.08 | learning rate: 2.593E-05 | global batch size: 256 | lm loss: 4.490479E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.002 | TFLOPs: 11.97 |
7: iteration 153560/ 173500 | consumed samples: 39311360 | consumed tokens: 80509665280 | elapsed time per iteration (s): 0.08 | learning rate: 2.592E-05 | global batch size: 256 | lm loss: 4.503347E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.176 | TFLOPs: 11.79 |
7: iteration 153570/ 173500 | consumed samples: 39313920 | consumed tokens: 80514908160 | elapsed time per iteration (s): 0.08 | learning rate: 2.591E-05 | global batch size: 256 | lm loss: 4.501652E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.837 | TFLOPs: 11.90 |
7: iteration 153580/ 173500 | consumed samples: 39316480 | consumed tokens: 80520151040 | elapsed time per iteration (s): 0.09 | learning rate: 2.591E-05 | global batch size: 256 | lm loss: 4.508660E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2974.509 | TFLOPs: 11.06 |
7: iteration 153590/ 173500 | consumed samples: 39319040 | consumed tokens: 80525393920 | elapsed time per iteration (s): 0.08 | learning rate: 2.590E-05 | global batch size: 256 | lm loss: 4.501931E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.440 | TFLOPs: 11.99 |
7: iteration 153600/ 173500 | consumed samples: 39321600 | consumed tokens: 80530636800 | elapsed time per iteration (s): 0.08 | learning rate: 2.590E-05 | global batch size: 256 | lm loss: 4.505234E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.621 | TFLOPs: 11.96 |
7: iteration 153610/ 173500 | consumed samples: 39324160 | consumed tokens: 80535879680 | elapsed time per iteration (s): 0.08 | learning rate: 2.589E-05 | global batch size: 256 | lm loss: 4.506224E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.549 | TFLOPs: 11.96 |
7: iteration 153620/ 173500 | consumed samples: 39326720 | consumed tokens: 80541122560 | elapsed time per iteration (s): 0.08 | learning rate: 2.588E-05 | global batch size: 256 | lm loss: 4.505473E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.560 | TFLOPs: 11.97 |
7: iteration 153630/ 173500 | consumed samples: 39329280 | consumed tokens: 80546365440 | elapsed time per iteration (s): 0.08 | learning rate: 2.588E-05 | global batch size: 256 | lm loss: 4.502450E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.887 | TFLOPs: 11.90 |
7: iteration 153640/ 173500 | consumed samples: 39331840 | consumed tokens: 80551608320 | elapsed time per iteration (s): 0.08 | learning rate: 2.587E-05 | global batch size: 256 | lm loss: 4.500243E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.778 | TFLOPs: 11.84 |
7: iteration 153650/ 173500 | consumed samples: 39334400 | consumed tokens: 80556851200 | elapsed time per iteration (s): 0.08 | learning rate: 2.587E-05 | global batch size: 256 | lm loss: 4.488453E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.493 | TFLOPs: 11.78 |
7: iteration 153660/ 173500 | consumed samples: 39336960 | consumed tokens: 80562094080 | elapsed time per iteration (s): 0.08 | learning rate: 2.586E-05 | global batch size: 256 | lm loss: 4.506282E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.036 | TFLOPs: 11.84 |
7: iteration 153670/ 173500 | consumed samples: 39339520 | consumed tokens: 80567336960 | elapsed time per iteration (s): 0.08 | learning rate: 2.586E-05 | global batch size: 256 | lm loss: 4.494228E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.141 | TFLOPs: 11.83 |
7: iteration 153680/ 173500 | consumed samples: 39342080 | consumed tokens: 80572579840 | elapsed time per iteration (s): 0.08 | learning rate: 2.585E-05 | global batch size: 256 | lm loss: 4.498127E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3201.662 | TFLOPs: 11.91 | 7: iteration 153690/ 173500 | consumed samples: 39344640 | consumed tokens: 80577822720 | elapsed time per iteration (s): 0.08 | learning rate: 2.584E-05 | global batch size: 256 | lm loss: 4.511644E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.631 | TFLOPs: 11.89 | 7: iteration 153700/ 173500 | consumed samples: 39347200 | consumed tokens: 80583065600 | elapsed time per iteration (s): 0.08 | learning rate: 2.584E-05 | global batch size: 256 | lm loss: 4.509291E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.073 | TFLOPs: 11.87 | 7: iteration 153710/ 173500 | consumed samples: 39349760 | consumed tokens: 80588308480 | elapsed time per iteration (s): 0.08 | learning rate: 2.583E-05 | global batch size: 256 | lm loss: 4.511954E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.690 | TFLOPs: 11.90 | 7: iteration 153720/ 173500 | consumed samples: 39352320 | consumed tokens: 80593551360 | elapsed time per iteration (s): 0.08 | learning rate: 2.583E-05 | global batch size: 256 | lm loss: 4.512858E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.557 | TFLOPs: 11.78 | 7: iteration 153730/ 173500 | consumed samples: 39354880 | consumed tokens: 80598794240 | elapsed time per iteration (s): 0.08 | learning rate: 2.582E-05 | global batch size: 256 | lm loss: 4.489532E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.585 | TFLOPs: 11.88 | 7: iteration 153740/ 173500 | consumed samples: 39357440 | consumed tokens: 80604037120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.581E-05 | global batch size: 256 | lm loss: 4.507291E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.134 | TFLOPs: 11.76 | 7: iteration 153750/ 173500 | consumed samples: 39360000 | consumed tokens: 80609280000 | elapsed time per iteration (s): 0.08 | learning rate: 2.581E-05 | global batch size: 256 | lm loss: 4.497758E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.293 | TFLOPs: 11.86 | 7: iteration 153760/ 173500 | consumed samples: 39362560 | consumed tokens: 80614522880 | elapsed time per iteration (s): 0.08 | learning rate: 2.580E-05 | global batch size: 256 | lm loss: 4.508066E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.997 | TFLOPs: 11.88 | 7: iteration 153770/ 173500 | consumed samples: 39365120 | consumed tokens: 80619765760 | elapsed time per iteration (s): 0.08 | learning rate: 2.580E-05 | global batch size: 256 | lm loss: 4.504296E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.359 | TFLOPs: 11.86 | 7: iteration 153780/ 173500 | consumed samples: 39367680 | consumed tokens: 80625008640 | elapsed time per iteration (s): 0.08 | learning rate: 2.579E-05 | global batch size: 256 | lm loss: 4.521602E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.504 | TFLOPs: 11.74 | 7: iteration 153790/ 173500 | consumed samples: 39370240 | consumed tokens: 80630251520 | elapsed time per iteration (s): 0.08 | learning rate: 2.579E-05 | global batch size: 256 | lm loss: 4.506707E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.161 | TFLOPs: 11.86 | 7: iteration 153800/ 173500 | 
consumed samples: 39372800 | consumed tokens: 80635494400 | elapsed time per iteration (s): 0.08 | learning rate: 2.578E-05 | global batch size: 256 | lm loss: 4.503273E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.959 | TFLOPs: 11.94 | 7: iteration 153810/ 173500 | consumed samples: 39375360 | consumed tokens: 80640737280 | elapsed time per iteration (s): 0.11 | learning rate: 2.577E-05 | global batch size: 256 | lm loss: 4.509450E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2382.487 | TFLOPs: 8.86 | 7: iteration 153820/ 173500 | consumed samples: 39377920 | consumed tokens: 80645980160 | elapsed time per iteration (s): 0.08 | learning rate: 2.577E-05 | global batch size: 256 | lm loss: 4.508548E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.419 | TFLOPs: 11.95 | 7: iteration 153830/ 173500 | consumed samples: 39380480 | consumed tokens: 80651223040 | elapsed time per iteration (s): 0.08 | learning rate: 2.576E-05 | global batch size: 256 | lm loss: 4.502195E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.029 | TFLOPs: 11.34 | 7: iteration 153840/ 173500 | consumed samples: 39383040 | consumed tokens: 80656465920 | elapsed time per iteration (s): 0.08 | learning rate: 2.576E-05 | global batch size: 256 | lm loss: 4.516745E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.799 | TFLOPs: 11.80 | 7: iteration 153850/ 173500 | consumed samples: 39385600 | consumed tokens: 80661708800 | elapsed time per iteration (s): 0.08 | learning rate: 2.575E-05 | global batch size: 256 | lm loss: 4.507547E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3221.261 | TFLOPs: 11.98 | 7: iteration 153860/ 173500 | consumed samples: 39388160 | consumed tokens: 80666951680 | elapsed time per iteration (s): 0.08 | learning rate: 2.574E-05 | global batch size: 256 | lm loss: 4.498084E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.072 | TFLOPs: 11.88 | 7: iteration 153870/ 173500 | consumed samples: 39390720 | consumed tokens: 80672194560 | elapsed time per iteration (s): 0.09 | learning rate: 2.574E-05 | global batch size: 256 | lm loss: 4.502977E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2775.573 | TFLOPs: 10.32 | 7: iteration 153880/ 173500 | consumed samples: 39393280 | consumed tokens: 80677437440 | elapsed time per iteration (s): 0.08 | learning rate: 2.573E-05 | global batch size: 256 | lm loss: 4.514265E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.696 | TFLOPs: 11.78 | 7: iteration 153890/ 173500 | consumed samples: 39395840 | consumed tokens: 80682680320 | elapsed time per iteration (s): 0.08 | learning rate: 2.573E-05 | global batch size: 256 | lm loss: 4.498066E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.904 | TFLOPs: 11.79 | 7: iteration 153900/ 173500 | consumed samples: 39398400 | consumed tokens: 80687923200 | elapsed time per iteration (s): 0.08 | learning rate: 2.572E-05 | global batch size: 256 | lm loss: 4.500043E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3061.766 | TFLOPs: 11.39 | 7: iteration 153910/ 173500 | consumed samples: 39400960 | consumed tokens: 80693166080 | elapsed time per iteration (s): 0.09 | learning rate: 2.572E-05 | 
global batch size: 256 | lm loss: 4.494737E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2931.685 | TFLOPs: 10.90 | 7: iteration 153920/ 173500 | consumed samples: 39403520 | consumed tokens: 80698408960 | elapsed time per iteration (s): 0.08 | learning rate: 2.571E-05 | global batch size: 256 | lm loss: 4.507775E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.595 | TFLOPs: 11.58 | 7: iteration 153930/ 173500 | consumed samples: 39406080 | consumed tokens: 80703651840 | elapsed time per iteration (s): 0.08 | learning rate: 2.570E-05 | global batch size: 256 | lm loss: 4.507459E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.979 | TFLOPs: 11.98 | 7: iteration 153940/ 173500 | consumed samples: 39408640 | consumed tokens: 80708894720 | elapsed time per iteration (s): 0.08 | learning rate: 2.570E-05 | global batch size: 256 | lm loss: 4.513380E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.996 | TFLOPs: 11.43 | 7: iteration 153950/ 173500 | consumed samples: 39411200 | consumed tokens: 80714137600 | elapsed time per iteration (s): 0.08 | learning rate: 2.569E-05 | global batch size: 256 | lm loss: 4.502123E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.456 | TFLOPs: 11.71 | 7: iteration 153960/ 173500 | consumed samples: 39413760 | consumed tokens: 80719380480 | elapsed time per iteration (s): 0.08 | learning rate: 2.569E-05 | global batch size: 256 | lm loss: 4.506362E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.359 | TFLOPs: 11.92 | 7: iteration 153970/ 173500 | consumed 
samples: 39416320 | consumed tokens: 80724623360 | elapsed time per iteration (s): 0.12 | learning rate: 2.568E-05 | global batch size: 256 | lm loss: 4.517065E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2139.651 | TFLOPs: 7.96 | 7: iteration 153980/ 173500 | consumed samples: 39418880 | consumed tokens: 80729866240 | elapsed time per iteration (s): 0.08 | learning rate: 2.568E-05 | global batch size: 256 | lm loss: 4.514332E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.404 | TFLOPs: 11.58 | 7: iteration 153990/ 173500 | consumed samples: 39421440 | consumed tokens: 80735109120 | elapsed time per iteration (s): 0.08 | learning rate: 2.567E-05 | global batch size: 256 | lm loss: 4.496161E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.809 | TFLOPs: 11.68 | 0: [2023-03-17 04:01:35,653] [INFO] [logging.py:68:log_dist] [Rank 0] step=154000, skipped=0, lr=[2.5664028527469924e-05, 2.5664028527469924e-05, 2.5664028527469924e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 154000/ 173500 | consumed samples: 39424000 | consumed tokens: 80740352000 | elapsed time per iteration (s): 0.10 | learning rate: 2.566E-05 | global batch size: 256 | lm loss: 4.512714E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2521.009 | TFLOPs: 9.38 | 0: steps: 154000 loss: 4.5143 iter time (s): 0.081 samples/sec: 3162.328 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 154000 | lm loss value: 4.368020E+00 | lm loss PPL: 7.888728E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 154000 
to checkpoints_14m91b100m 0: [2023-03-17 04:01:35,734] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step154000 is begin to save! 0: [2023-03-17 04:01:35,737] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:01:35,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:01:35,763] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:01:35,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:01:35,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:01:35,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:01:35,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:01:35,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:01:35,773] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:01:35,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:01:35,776] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 04:01:35,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:01:35,777] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step154000/mp_rank_00_model_states.pt 0: [2023-03-17 04:01:35,777] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:01:35,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:01:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:01:35,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:01:35,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:01:35,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 04:01:35,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:01:35,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 1: [2023-03-17 04:01:35,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 0: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:01:35,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 4: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:01:35,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:01:35,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 2: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:01:35,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 7: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 
6: [2023-03-17 04:01:35,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:01:35,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 6: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:01:35,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:01:35,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:01:35,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 
0: [2023-03-17 04:01:35,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 4: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:01:35,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 2: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 7: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,804] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:01:35,804] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:01:35,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:01:35,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 6: [2023-03-17 04:01:35,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:01:35,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:01:35,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 0: [2023-03-17 04:01:35,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:01:35,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:01:35,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:01:35,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:01:35,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 2: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:01:35,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 4: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:01:35,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 7: [2023-03-17 04:01:35,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 4: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:01:35,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2023-03-17 04:01:35,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 6: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:01:35,806] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:01:35,806] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 0: [2023-03-17 04:01:35,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:01:35,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:01:35,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:01:35,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:01:35,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 04:01:35,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 2: [2023-03-17 04:01:35,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 7: [2023-03-17 04:01:35,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,807] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:01:35,807] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:01:35,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 6: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:01:35,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 4: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:01:35,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:01:35,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:01:35,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 7: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 2: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:01:35,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:01:35,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:01:35,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 6: [2023-03-17 04:01:35,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:01:35,809] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:01:35,809] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 0: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 5: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 6: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 4: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 0: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 4: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 6: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 2: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 2: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 1: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 2: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2023-03-17 04:01:35,810] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 
2: [2023-03-17 04:01:35,810] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 3: [2023-03-17 04:01:35,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:01:35,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:01:35,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:01:35,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 3: [2023-03-17 04:01:35,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:01:35,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 3: [2023-03-17 04:01:35,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:01:35,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 3: [2023-03-17 04:01:35,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:01:35,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:01:35,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 
3: [2023-03-17 04:01:35,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step154000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:01:35,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step154000 is ready now! 0: successfully saved checkpoint at iteration 154000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 90.97 7: iteration 154010/ 173500 | consumed samples: 39426560 | consumed tokens: 80745594880 | elapsed time per iteration (s): 0.12 | learning rate: 2.566E-05 | global batch size: 256 | lm loss: 4.512619E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2111.855 | TFLOPs: 7.86 | 7: iteration 154020/ 173500 | consumed samples: 39429120 | consumed tokens: 80750837760 | elapsed time per iteration (s): 0.08 | learning rate: 2.565E-05 | global batch size: 256 | lm loss: 4.516713E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3042.449 | TFLOPs: 11.32 | 7: iteration 154030/ 173500 | consumed samples: 39431680 | consumed tokens: 80756080640 | elapsed time per iteration (s): 0.08 | learning rate: 2.565E-05 | global batch size: 256 | lm loss: 4.506650E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.165 | TFLOPs: 11.64 | 7: iteration 154040/ 173500 | consumed samples: 39434240 | consumed tokens: 80761323520 | elapsed time per iteration (s): 0.08 | learning rate: 2.564E-05 | global batch size: 256 | lm loss: 4.504021E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.318 | TFLOPs: 11.71 | 7: iteration 154050/ 173500 | consumed samples: 39436800 | consumed tokens: 80766566400 | elapsed time per iteration (s): 0.08 | learning rate: 2.564E-05 | 
global batch size: 256 | lm loss: 4.517165E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.556 | TFLOPs: 11.60 | 7: iteration 154060/ 173500 | consumed samples: 39439360 | consumed tokens: 80771809280 | elapsed time per iteration (s): 0.08 | learning rate: 2.563E-05 | global batch size: 256 | lm loss: 4.504440E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.005 | TFLOPs: 11.90 | 7: iteration 154070/ 173500 | consumed samples: 39441920 | consumed tokens: 80777052160 | elapsed time per iteration (s): 0.08 | learning rate: 2.562E-05 | global batch size: 256 | lm loss: 4.516249E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.474 | TFLOPs: 11.46 | 7: iteration 154080/ 173500 | consumed samples: 39444480 | consumed tokens: 80782295040 | elapsed time per iteration (s): 0.08 | learning rate: 2.562E-05 | global batch size: 256 | lm loss: 4.503741E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.282 | TFLOPs: 11.89 | 7: iteration 154090/ 173500 | consumed samples: 39447040 | consumed tokens: 80787537920 | elapsed time per iteration (s): 0.08 | learning rate: 2.561E-05 | global batch size: 256 | lm loss: 4.510744E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.517 | TFLOPs: 11.99 | 7: iteration 154100/ 173500 | consumed samples: 39449600 | consumed tokens: 80792780800 | elapsed time per iteration (s): 0.08 | learning rate: 2.561E-05 | global batch size: 256 | lm loss: 4.514526E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.638 | TFLOPs: 11.95 | 7: iteration 154110/ 173500 | consumed 
samples: 39452160 | consumed tokens: 80798023680 | elapsed time per iteration (s): 0.08 | learning rate: 2.560E-05 | global batch size: 256 | lm loss: 4.510830E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.742 | TFLOPs: 11.72 | 7: iteration 154120/ 173500 | consumed samples: 39454720 | consumed tokens: 80803266560 | elapsed time per iteration (s): 0.08 | learning rate: 2.560E-05 | global batch size: 256 | lm loss: 4.512681E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.086 | TFLOPs: 11.99 | 7: iteration 154130/ 173500 | consumed samples: 39457280 | consumed tokens: 80808509440 | elapsed time per iteration (s): 0.08 | learning rate: 2.559E-05 | global batch size: 256 | lm loss: 4.490374E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.613 | TFLOPs: 11.92 | 7: iteration 154140/ 173500 | consumed samples: 39459840 | consumed tokens: 80813752320 | elapsed time per iteration (s): 0.08 | learning rate: 2.558E-05 | global batch size: 256 | lm loss: 4.492156E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.701 | TFLOPs: 12.04 | 7: iteration 154150/ 173500 | consumed samples: 39462400 | consumed tokens: 80818995200 | elapsed time per iteration (s): 0.08 | learning rate: 2.558E-05 | global batch size: 256 | lm loss: 4.495982E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.699 | TFLOPs: 11.71 | 7: iteration 154160/ 173500 | consumed samples: 39464960 | consumed tokens: 80824238080 | elapsed time per iteration (s): 0.08 | learning rate: 2.557E-05 | global batch size: 256 | lm loss: 4.508158E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3230.733 | TFLOPs: 12.02 | 7: iteration 154170/ 173500 | consumed samples: 39467520 | consumed tokens: 80829480960 | elapsed time per iteration (s): 0.08 | learning rate: 2.557E-05 | global batch size: 256 | lm loss: 4.503617E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.784 | TFLOPs: 12.02 | 7: iteration 154180/ 173500 | consumed samples: 39470080 | consumed tokens: 80834723840 | elapsed time per iteration (s): 0.08 | learning rate: 2.556E-05 | global batch size: 256 | lm loss: 4.510022E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.842 | TFLOPs: 11.95 | 7: iteration 154190/ 173500 | consumed samples: 39472640 | consumed tokens: 80839966720 | elapsed time per iteration (s): 0.08 | learning rate: 2.556E-05 | global batch size: 256 | lm loss: 4.511020E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.842 | TFLOPs: 11.64 | 7: iteration 154200/ 173500 | consumed samples: 39475200 | consumed tokens: 80845209600 | elapsed time per iteration (s): 0.08 | learning rate: 2.555E-05 | global batch size: 256 | lm loss: 4.503082E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.349 | TFLOPs: 11.77 | 7: iteration 154210/ 173500 | consumed samples: 39477760 | consumed tokens: 80850452480 | elapsed time per iteration (s): 0.08 | learning rate: 2.554E-05 | global batch size: 256 | lm loss: 4.506045E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.237 | TFLOPs: 11.91 | 7: iteration 154220/ 173500 | consumed samples: 39480320 | consumed tokens: 80855695360 | elapsed time per iteration (s): 0.08 | learning rate: 2.554E-05 | global 
batch size: 256 | lm loss: 4.496744E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.666 | TFLOPs: 11.73 | 7: iteration 154230/ 173500 | consumed samples: 39482880 | consumed tokens: 80860938240 | elapsed time per iteration (s): 0.08 | learning rate: 2.553E-05 | global batch size: 256 | lm loss: 4.499715E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.418 | TFLOPs: 11.97 | 7: iteration 154240/ 173500 | consumed samples: 39485440 | consumed tokens: 80866181120 | elapsed time per iteration (s): 0.08 | learning rate: 2.553E-05 | global batch size: 256 | lm loss: 4.506707E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.394 | TFLOPs: 11.83 | 7: iteration 154250/ 173500 | consumed samples: 39488000 | consumed tokens: 80871424000 | elapsed time per iteration (s): 0.08 | learning rate: 2.552E-05 | global batch size: 256 | lm loss: 4.510183E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.126 | TFLOPs: 11.98 | 7: iteration 154260/ 173500 | consumed samples: 39490560 | consumed tokens: 80876666880 | elapsed time per iteration (s): 0.08 | learning rate: 2.552E-05 | global batch size: 256 | lm loss: 4.502031E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.899 | TFLOPs: 11.94 | 7: iteration 154270/ 173500 | consumed samples: 39493120 | consumed tokens: 80881909760 | elapsed time per iteration (s): 0.08 | learning rate: 2.551E-05 | global batch size: 256 | lm loss: 4.501302E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.525 | TFLOPs: 11.96 | 7: iteration 154280/ 173500 | consumed samples: 
39495680 | consumed tokens: 80887152640 | elapsed time per iteration (s): 0.08 | learning rate: 2.550E-05 | global batch size: 256 | lm loss: 4.497384E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.610 | TFLOPs: 11.97 | 7: iteration 154290/ 173500 | consumed samples: 39498240 | consumed tokens: 80892395520 | elapsed time per iteration (s): 0.08 | learning rate: 2.550E-05 | global batch size: 256 | lm loss: 4.505107E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.025 | TFLOPs: 11.95 | 7: iteration 154300/ 173500 | consumed samples: 39500800 | consumed tokens: 80897638400 | elapsed time per iteration (s): 0.08 | learning rate: 2.549E-05 | global batch size: 256 | lm loss: 4.492471E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.154 | TFLOPs: 11.83 | 7: iteration 154310/ 173500 | consumed samples: 39503360 | consumed tokens: 80902881280 | elapsed time per iteration (s): 0.08 | learning rate: 2.549E-05 | global batch size: 256 | lm loss: 4.495656E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.419 | TFLOPs: 12.00 | 7: iteration 154320/ 173500 | consumed samples: 39505920 | consumed tokens: 80908124160 | elapsed time per iteration (s): 0.08 | learning rate: 2.548E-05 | global batch size: 256 | lm loss: 4.510497E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.393 | TFLOPs: 12.01 | 7: iteration 154330/ 173500 | consumed samples: 39508480 | consumed tokens: 80913367040 | elapsed time per iteration (s): 0.08 | learning rate: 2.548E-05 | global batch size: 256 | lm loss: 4.505275E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3165.302 | TFLOPs: 11.77 | 7: iteration 154340/ 173500 | consumed samples: 39511040 | consumed tokens: 80918609920 | elapsed time per iteration (s): 0.08 | learning rate: 2.547E-05 | global batch size: 256 | lm loss: 4.506794E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.136 | TFLOPs: 11.97 | 7: iteration 154350/ 173500 | consumed samples: 39513600 | consumed tokens: 80923852800 | elapsed time per iteration (s): 0.08 | learning rate: 2.546E-05 | global batch size: 256 | lm loss: 4.502294E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.123 | TFLOPs: 11.97 | 7: iteration 154360/ 173500 | consumed samples: 39516160 | consumed tokens: 80929095680 | elapsed time per iteration (s): 0.08 | learning rate: 2.546E-05 | global batch size: 256 | lm loss: 4.489313E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.970 | TFLOPs: 12.00 | 7: iteration 154370/ 173500 | consumed samples: 39518720 | consumed tokens: 80934338560 | elapsed time per iteration (s): 0.08 | learning rate: 2.545E-05 | global batch size: 256 | lm loss: 4.518961E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.193 | TFLOPs: 11.92 | 7: iteration 154380/ 173500 | consumed samples: 39521280 | consumed tokens: 80939581440 | elapsed time per iteration (s): 0.08 | learning rate: 2.545E-05 | global batch size: 256 | lm loss: 4.503453E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.868 | TFLOPs: 11.96 | 7: iteration 154390/ 173500 | consumed samples: 39523840 | consumed tokens: 80944824320 | elapsed time per iteration (s): 0.08 | learning rate: 2.544E-05 | global batch 
size: 256 | lm loss: 4.495690E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.710 | TFLOPs: 12.00 | 7: iteration 154400/ 173500 | consumed samples: 39526400 | consumed tokens: 80950067200 | elapsed time per iteration (s): 0.08 | learning rate: 2.544E-05 | global batch size: 256 | lm loss: 4.500264E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.057 | TFLOPs: 11.94 | 7: iteration 154410/ 173500 | consumed samples: 39528960 | consumed tokens: 80955310080 | elapsed time per iteration (s): 0.08 | learning rate: 2.543E-05 | global batch size: 256 | lm loss: 4.500428E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.257 | TFLOPs: 12.02 | 7: iteration 154420/ 173500 | consumed samples: 39531520 | consumed tokens: 80960552960 | elapsed time per iteration (s): 0.08 | learning rate: 2.543E-05 | global batch size: 256 | lm loss: 4.508220E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.076 | TFLOPs: 12.03 | 7: iteration 154430/ 173500 | consumed samples: 39534080 | consumed tokens: 80965795840 | elapsed time per iteration (s): 0.08 | learning rate: 2.542E-05 | global batch size: 256 | lm loss: 4.497981E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.831 | TFLOPs: 11.81 | 7: iteration 154440/ 173500 | consumed samples: 39536640 | consumed tokens: 80971038720 | elapsed time per iteration (s): 0.08 | learning rate: 2.541E-05 | global batch size: 256 | lm loss: 4.491630E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.969 | TFLOPs: 12.04 | 7: iteration 154450/ 173500 | consumed samples: 39539200 
| consumed tokens: 80976281600 | elapsed time per iteration (s): 0.08 | learning rate: 2.541E-05 | global batch size: 256 | lm loss: 4.509016E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.564 | TFLOPs: 11.98 | 7: iteration 154460/ 173500 | consumed samples: 39541760 | consumed tokens: 80981524480 | elapsed time per iteration (s): 0.08 | learning rate: 2.540E-05 | global batch size: 256 | lm loss: 4.508854E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.497 | TFLOPs: 11.98 | 7: iteration 154470/ 173500 | consumed samples: 39544320 | consumed tokens: 80986767360 | elapsed time per iteration (s): 0.08 | learning rate: 2.540E-05 | global batch size: 256 | lm loss: 4.511502E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.184 | TFLOPs: 12.00 | 7: iteration 154480/ 173500 | consumed samples: 39546880 | consumed tokens: 80992010240 | elapsed time per iteration (s): 0.08 | learning rate: 2.539E-05 | global batch size: 256 | lm loss: 4.491386E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.442 | TFLOPs: 11.97 | 7: iteration 154490/ 173500 | consumed samples: 39549440 | consumed tokens: 80997253120 | elapsed time per iteration (s): 0.08 | learning rate: 2.539E-05 | global batch size: 256 | lm loss: 4.497032E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.591 | TFLOPs: 12.01 | 7: iteration 154500/ 173500 | consumed samples: 39552000 | consumed tokens: 81002496000 | elapsed time per iteration (s): 0.08 | learning rate: 2.538E-05 | global batch size: 256 | lm loss: 4.503392E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3230.920 | TFLOPs: 12.02 | 7: iteration 154510/ 173500 | consumed samples: 39554560 | consumed tokens: 81007738880 | elapsed time per iteration (s): 0.08 | learning rate: 2.537E-05 | global batch size: 256 | lm loss: 4.487852E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.677 | TFLOPs: 11.91 | 7: iteration 154520/ 173500 | consumed samples: 39557120 | consumed tokens: 81012981760 | elapsed time per iteration (s): 0.08 | learning rate: 2.537E-05 | global batch size: 256 | lm loss: 4.489445E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.964 | TFLOPs: 11.89 | 7: iteration 154530/ 173500 | consumed samples: 39559680 | consumed tokens: 81018224640 | elapsed time per iteration (s): 0.08 | learning rate: 2.536E-05 | global batch size: 256 | lm loss: 4.504563E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.535 | TFLOPs: 11.82 | 7: iteration 154540/ 173500 | consumed samples: 39562240 | consumed tokens: 81023467520 | elapsed time per iteration (s): 0.08 | learning rate: 2.536E-05 | global batch size: 256 | lm loss: 4.523312E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.436 | TFLOPs: 11.88 | 7: iteration 154550/ 173500 | consumed samples: 39564800 | consumed tokens: 81028710400 | elapsed time per iteration (s): 0.08 | learning rate: 2.535E-05 | global batch size: 256 | lm loss: 4.498243E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.075 | TFLOPs: 11.90 | 7: iteration 154560/ 173500 | consumed samples: 39567360 | consumed tokens: 81033953280 | elapsed time per iteration (s): 0.08 | learning rate: 2.535E-05 | global batch size: 
256 | lm loss: 4.513036E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.107 | TFLOPs: 11.87 | 7: iteration 154570/ 173500 | consumed samples: 39569920 | consumed tokens: 81039196160 | elapsed time per iteration (s): 0.08 | learning rate: 2.534E-05 | global batch size: 256 | lm loss: 4.503429E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.024 | TFLOPs: 11.89 | 7: iteration 154580/ 173500 | consumed samples: 39572480 | consumed tokens: 81044439040 | elapsed time per iteration (s): 0.08 | learning rate: 2.534E-05 | global batch size: 256 | lm loss: 4.506844E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.934 | TFLOPs: 11.81 | 7: iteration 154590/ 173500 | consumed samples: 39575040 | consumed tokens: 81049681920 | elapsed time per iteration (s): 0.08 | learning rate: 2.533E-05 | global batch size: 256 | lm loss: 4.512565E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.501 | TFLOPs: 11.88 | 7: iteration 154600/ 173500 | consumed samples: 39577600 | consumed tokens: 81054924800 | elapsed time per iteration (s): 0.08 | learning rate: 2.532E-05 | global batch size: 256 | lm loss: 4.484492E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.967 | TFLOPs: 11.94 | 7: iteration 154610/ 173500 | consumed samples: 39580160 | consumed tokens: 81060167680 | elapsed time per iteration (s): 0.08 | learning rate: 2.532E-05 | global batch size: 256 | lm loss: 4.507809E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.780 | TFLOPs: 12.01 | 7: iteration 154620/ 173500 | consumed samples: 39582720 | 
consumed tokens: 81065410560 | elapsed time per iteration (s): 0.08 | learning rate: 2.531E-05 | global batch size: 256 | lm loss: 4.515373E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.078 | TFLOPs: 12.01 | 7: iteration 154630/ 173500 | consumed samples: 39585280 | consumed tokens: 81070653440 | elapsed time per iteration (s): 0.08 | learning rate: 2.531E-05 | global batch size: 256 | lm loss: 4.515502E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.679 | TFLOPs: 11.94 | 7: iteration 154640/ 173500 | consumed samples: 39587840 | consumed tokens: 81075896320 | elapsed time per iteration (s): 0.08 | learning rate: 2.530E-05 | global batch size: 256 | lm loss: 4.497401E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.690 | TFLOPs: 12.04 | 7: iteration 154650/ 173500 | consumed samples: 39590400 | consumed tokens: 81081139200 | elapsed time per iteration (s): 0.08 | learning rate: 2.530E-05 | global batch size: 256 | lm loss: 4.507180E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.241 | TFLOPs: 12.04 | 7: iteration 154660/ 173500 | consumed samples: 39592960 | consumed tokens: 81086382080 | elapsed time per iteration (s): 0.08 | learning rate: 2.529E-05 | global batch size: 256 | lm loss: 4.505917E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.902 | TFLOPs: 11.98 | 7: iteration 154670/ 173500 | consumed samples: 39595520 | consumed tokens: 81091624960 | elapsed time per iteration (s): 0.08 | learning rate: 2.529E-05 | global batch size: 256 | lm loss: 4.509930E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3241.371 | TFLOPs: 12.06 | 7: iteration 154680/ 173500 | consumed samples: 39598080 | consumed tokens: 81096867840 | elapsed time per iteration (s): 0.08 | learning rate: 2.528E-05 | global batch size: 256 | lm loss: 4.520166E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.001 | TFLOPs: 12.04 | 7: iteration 154690/ 173500 | consumed samples: 39600640 | consumed tokens: 81102110720 | elapsed time per iteration (s): 0.08 | learning rate: 2.527E-05 | global batch size: 256 | lm loss: 4.513760E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.486 | TFLOPs: 12.05 | 7: iteration 154700/ 173500 | consumed samples: 39603200 | consumed tokens: 81107353600 | elapsed time per iteration (s): 0.08 | learning rate: 2.527E-05 | global batch size: 256 | lm loss: 4.505707E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.085 | TFLOPs: 11.99 | 7: iteration 154710/ 173500 | consumed samples: 39605760 | consumed tokens: 81112596480 | elapsed time per iteration (s): 0.08 | learning rate: 2.526E-05 | global batch size: 256 | lm loss: 4.511203E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.511 | TFLOPs: 11.90 | 7: iteration 154720/ 173500 | consumed samples: 39608320 | consumed tokens: 81117839360 | elapsed time per iteration (s): 0.08 | learning rate: 2.526E-05 | global batch size: 256 | lm loss: 4.517973E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.588 | TFLOPs: 11.89 | 7: iteration 154730/ 173500 | consumed samples: 39610880 | consumed tokens: 81123082240 | elapsed time per iteration (s): 0.08 | learning rate: 2.525E-05 | global batch size: 
256 | lm loss: 4.513319E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.545 | TFLOPs: 11.81 | 7: iteration 154740/ 173500 | consumed samples: 39613440 | consumed tokens: 81128325120 | elapsed time per iteration (s): 0.08 | learning rate: 2.525E-05 | global batch size: 256 | lm loss: 4.500520E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.768 | TFLOPs: 11.78 | 7: iteration 154750/ 173500 | consumed samples: 39616000 | consumed tokens: 81133568000 | elapsed time per iteration (s): 0.08 | learning rate: 2.524E-05 | global batch size: 256 | lm loss: 4.515250E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.019 | TFLOPs: 11.92 | 7: iteration 154760/ 173500 | consumed samples: 39618560 | consumed tokens: 81138810880 | elapsed time per iteration (s): 0.08 | learning rate: 2.524E-05 | global batch size: 256 | lm loss: 4.497987E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.436 | TFLOPs: 11.90 | 7: iteration 154770/ 173500 | consumed samples: 39621120 | consumed tokens: 81144053760 | elapsed time per iteration (s): 0.08 | learning rate: 2.523E-05 | global batch size: 256 | lm loss: 4.516686E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.925 | TFLOPs: 11.90 | 7: iteration 154780/ 173500 | consumed samples: 39623680 | consumed tokens: 81149296640 | elapsed time per iteration (s): 0.08 | learning rate: 2.522E-05 | global batch size: 256 | lm loss: 4.512032E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.577 | TFLOPs: 11.95 | 7: iteration 154790/ 173500 | consumed samples: 39626240 | 
consumed tokens: 81154539520 | elapsed time per iteration (s): 0.08 | learning rate: 2.522E-05 | global batch size: 256 | lm loss: 4.508475E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.204 | TFLOPs: 11.93 | 7: iteration 154800/ 173500 | consumed samples: 39628800 | consumed tokens: 81159782400 | elapsed time per iteration (s): 0.08 | learning rate: 2.521E-05 | global batch size: 256 | lm loss: 4.499473E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.769 | TFLOPs: 11.92 | 7: iteration 154810/ 173500 | consumed samples: 39631360 | consumed tokens: 81165025280 | elapsed time per iteration (s): 0.08 | learning rate: 2.521E-05 | global batch size: 256 | lm loss: 4.498461E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.770 | TFLOPs: 11.89 | 7: iteration 154820/ 173500 | consumed samples: 39633920 | consumed tokens: 81170268160 | elapsed time per iteration (s): 0.08 | learning rate: 2.520E-05 | global batch size: 256 | lm loss: 4.503550E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.889 | TFLOPs: 11.87 | 7: iteration 154830/ 173500 | consumed samples: 39636480 | consumed tokens: 81175511040 | elapsed time per iteration (s): 0.08 | learning rate: 2.520E-05 | global batch size: 256 | lm loss: 4.505236E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.585 | TFLOPs: 11.93 | 7: iteration 154840/ 173500 | consumed samples: 39639040 | consumed tokens: 81180753920 | elapsed time per iteration (s): 0.08 | learning rate: 2.519E-05 | global batch size: 256 | lm loss: 4.504137E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3204.455 | TFLOPs: 11.92 | 7: iteration 154850/ 173500 | consumed samples: 39641600 | consumed tokens: 81185996800 | elapsed time per iteration (s): 0.08 | learning rate: 2.519E-05 | global batch size: 256 | lm loss: 4.525301E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.277 | TFLOPs: 11.85 | 7: iteration 154860/ 173500 | consumed samples: 39644160 | consumed tokens: 81191239680 | elapsed time per iteration (s): 0.08 | learning rate: 2.518E-05 | global batch size: 256 | lm loss: 4.510218E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.411 | TFLOPs: 11.84 | 7: iteration 154870/ 173500 | consumed samples: 39646720 | consumed tokens: 81196482560 | elapsed time per iteration (s): 0.08 | learning rate: 2.517E-05 | global batch size: 256 | lm loss: 4.503827E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.608 | TFLOPs: 11.93 | 7: iteration 154880/ 173500 | consumed samples: 39649280 | consumed tokens: 81201725440 | elapsed time per iteration (s): 0.08 | learning rate: 2.517E-05 | global batch size: 256 | lm loss: 4.517004E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.792 | TFLOPs: 11.62 | 7: iteration 154890/ 173500 | consumed samples: 39651840 | consumed tokens: 81206968320 | elapsed time per iteration (s): 0.08 | learning rate: 2.516E-05 | global batch size: 256 | lm loss: 4.506493E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.935 | TFLOPs: 11.85 | 7: iteration 154900/ 173500 | consumed samples: 39654400 | consumed tokens: 81212211200 | elapsed time per iteration (s): 0.08 | learning rate: 2.516E-05 | global batch size: 
256 | lm loss: 4.505587E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.741 | TFLOPs: 11.87 | 7: iteration 154910/ 173500 | consumed samples: 39656960 | consumed tokens: 81217454080 | elapsed time per iteration (s): 0.08 | learning rate: 2.515E-05 | global batch size: 256 | lm loss: 4.507003E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.951 | TFLOPs: 11.89 | 7: iteration 154920/ 173500 | consumed samples: 39659520 | consumed tokens: 81222696960 | elapsed time per iteration (s): 0.08 | learning rate: 2.515E-05 | global batch size: 256 | lm loss: 4.506313E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.025 | TFLOPs: 11.90 | 7: iteration 154930/ 173500 | consumed samples: 39662080 | consumed tokens: 81227939840 | elapsed time per iteration (s): 0.08 | learning rate: 2.514E-05 | global batch size: 256 | lm loss: 4.508851E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.453 | TFLOPs: 11.83 | 7: iteration 154940/ 173500 | consumed samples: 39664640 | consumed tokens: 81233182720 | elapsed time per iteration (s): 0.08 | learning rate: 2.514E-05 | global batch size: 256 | lm loss: 4.498389E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.455 | TFLOPs: 11.92 | 7: iteration 154950/ 173500 | consumed samples: 39667200 | consumed tokens: 81238425600 | elapsed time per iteration (s): 0.08 | learning rate: 2.513E-05 | global batch size: 256 | lm loss: 4.498314E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.431 | TFLOPs: 11.92 | 7: iteration 154960/ 173500 | consumed samples: 39669760 | 
consumed tokens: 81243668480 | elapsed time per iteration (s): 0.08 | learning rate: 2.513E-05 | global batch size: 256 | lm loss: 4.512326E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.038 | TFLOPs: 11.94 | 7: iteration 154970/ 173500 | consumed samples: 39672320 | consumed tokens: 81248911360 | elapsed time per iteration (s): 0.08 | learning rate: 2.512E-05 | global batch size: 256 | lm loss: 4.513309E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.898 | TFLOPs: 11.84 | 7: iteration 154980/ 173500 | consumed samples: 39674880 | consumed tokens: 81254154240 | elapsed time per iteration (s): 0.08 | learning rate: 2.511E-05 | global batch size: 256 | lm loss: 4.517976E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.793 | TFLOPs: 11.89 | 7: iteration 154990/ 173500 | consumed samples: 39677440 | consumed tokens: 81259397120 | elapsed time per iteration (s): 0.08 | learning rate: 2.511E-05 | global batch size: 256 | lm loss: 4.511748E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.979 | TFLOPs: 11.93 | 7: iteration 155000/ 173500 | consumed samples: 39680000 | consumed tokens: 81264640000 | elapsed time per iteration (s): 0.08 | learning rate: 2.510E-05 | global batch size: 256 | lm loss: 4.501076E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.173 | TFLOPs: 11.91 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 155000 | lm loss value: 4.442116E+00 | lm loss PPL: 8.495450E+01 | 7: 
-------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 155000 to checkpoints_14m91b100m
0: [2023-03-17 04:02:56,154] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step155000 is begin to save!
0: [2023-03-17 04:02:56,157] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/layer_01-model_00-model_states.pt...
0: [2023-03-17 04:02:56,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/layer_01-model_00-model_states.pt.
0: [2023-03-17 04:02:56,184] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/layer_03-model_00-model_states.pt...
0: [2023-03-17 04:02:56,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/layer_03-model_00-model_states.pt.
0: [2023-03-17 04:02:56,187] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/layer_04-model_00-model_states.pt...
0: [2023-03-17 04:02:56,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/layer_04-model_00-model_states.pt.
0: [2023-03-17 04:02:56,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/layer_05-model_00-model_states.pt...
0: [2023-03-17 04:02:56,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/layer_05-model_00-model_states.pt.
0: [2023-03-17 04:02:56,194] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/layer_06-model_00-model_states.pt...
0: [2023-03-17 04:02:56,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/layer_06-model_00-model_states.pt.
0: [2023-03-17 04:02:56,196] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/layer_08-model_00-model_states.pt...
0: [2023-03-17 04:02:56,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/layer_08-model_00-model_states.pt.
0: [2023-03-17 04:02:56,197] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step155000/mp_rank_00_model_states.pt
0: [2023-03-17 04:02:56,197] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/mp_rank_00_model_states.pt...
0: [2023-03-17 04:02:56,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/mp_rank_00_model_states.pt.
0: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
2: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
2: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt...
2: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
2: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
2: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:02:56,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:02:56,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:02:56,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:02:56,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 1: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 
5: [2023-03-17 04:02:56,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 4: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: [2023-03-17 04:02:56,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 3: [2023-03-17 04:02:56,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 
7: [2023-03-17 04:02:56,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:02:56,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 6: [2023-03-17 04:02:56,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:02:56,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:02:56,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 1: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:02:56,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 5: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:02:56,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 4: [2023-03-17 04:02:56,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 3: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:02:56,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 6: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:02:56,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 7: [2023-03-17 04:02:56,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:02:56,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 1: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:02:56,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:02:56,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:02:56,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 4: [2023-03-17 04:02:56,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 3: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:02:56,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 7: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:02:56,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 6: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:02:56,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:02:56,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 1: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 5: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:02:56,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:02:56,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 4: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 3: [2023-03-17 04:02:56,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 6: [2023-03-17 04:02:56,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:02:56,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 7: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:02:56,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:02:56,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 1: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 
5: [2023-03-17 04:02:56,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:02:56,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 4: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 3: [2023-03-17 04:02:56,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 7: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:02:56,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 6: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:02:56,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 1: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: [2023-03-17 04:02:56,227] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:02:56,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 5: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:02:56,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 4: [2023-03-17 04:02:56,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 3: [2023-03-17 04:02:56,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 7: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:02:56,228] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:02:56,228] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 6: [2023-03-17 04:02:56,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:02:56,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 5: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:02:56,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:02:56,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 4: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 7: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:02:56,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 3: [2023-03-17 04:02:56,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:02:56,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 
3: [2023-03-17 04:02:56,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:02:56,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:02:56,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:02:56,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 
2: [2023-03-17 04:02:56,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 2: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 6: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:02:56,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 1: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:02:56,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 5: [2023-03-17 04:02:56,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:02:56,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:02:56,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 
1: [2023-03-17 04:02:56,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:02:56,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:02:56,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 7: [2023-03-17 04:02:56,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:02:56,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step155000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:02:56,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step155000 is ready now! 0: successfully saved checkpoint at iteration 155000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.72 7: iteration 155010/ 173500 | consumed samples: 39682560 | consumed tokens: 81269882880 | elapsed time per iteration (s): 0.09 | learning rate: 2.510E-05 | global batch size: 256 | lm loss: 4.488181E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.794 | TFLOPs: 10.42 | 7: iteration 155020/ 173500 | consumed samples: 39685120 | consumed tokens: 81275125760 | elapsed time per iteration (s): 0.08 | learning rate: 2.509E-05 | global batch size: 256 | lm loss: 4.490488E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.123 | TFLOPs: 11.88 | 7: iteration 155030/ 173500 | consumed samples: 39687680 | consumed tokens: 81280368640 | elapsed time per iteration (s): 0.08 | learning rate: 2.509E-05 | global batch size: 256 | lm loss: 
4.493180E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.117 | TFLOPs: 11.89 | 7: iteration 155040/ 173500 | consumed samples: 39690240 | consumed tokens: 81285611520 | elapsed time per iteration (s): 0.08 | learning rate: 2.508E-05 | global batch size: 256 | lm loss: 4.506302E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.504 | TFLOPs: 11.85 | 7: iteration 155050/ 173500 | consumed samples: 39692800 | consumed tokens: 81290854400 | elapsed time per iteration (s): 0.08 | learning rate: 2.508E-05 | global batch size: 256 | lm loss: 4.501940E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.832 | TFLOPs: 11.89 | 7: iteration 155060/ 173500 | consumed samples: 39695360 | consumed tokens: 81296097280 | elapsed time per iteration (s): 0.08 | learning rate: 2.507E-05 | global batch size: 256 | lm loss: 4.498887E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.033 | TFLOPs: 11.83 | 7: iteration 155070/ 173500 | consumed samples: 39697920 | consumed tokens: 81301340160 | elapsed time per iteration (s): 0.09 | learning rate: 2.507E-05 | global batch size: 256 | lm loss: 4.511679E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2776.250 | TFLOPs: 10.33 | 7: iteration 155080/ 173500 | consumed samples: 39700480 | consumed tokens: 81306583040 | elapsed time per iteration (s): 0.08 | learning rate: 2.506E-05 | global batch size: 256 | lm loss: 4.495042E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.272 | TFLOPs: 11.78 | 7: iteration 155090/ 173500 | consumed samples: 39703040 | consumed tokens: 
81311825920 | elapsed time per iteration (s): 0.08 | learning rate: 2.505E-05 | global batch size: 256 | lm loss: 4.496865E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.500 | TFLOPs: 11.80 | 7: iteration 155100/ 173500 | consumed samples: 39705600 | consumed tokens: 81317068800 | elapsed time per iteration (s): 0.08 | learning rate: 2.505E-05 | global batch size: 256 | lm loss: 4.504885E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.063 | TFLOPs: 11.82 | 7: iteration 155110/ 173500 | consumed samples: 39708160 | consumed tokens: 81322311680 | elapsed time per iteration (s): 0.08 | learning rate: 2.504E-05 | global batch size: 256 | lm loss: 4.507973E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.711 | TFLOPs: 11.78 | 7: iteration 155120/ 173500 | consumed samples: 39710720 | consumed tokens: 81327554560 | elapsed time per iteration (s): 0.08 | learning rate: 2.504E-05 | global batch size: 256 | lm loss: 4.498874E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.146 | TFLOPs: 11.82 | 7: iteration 155130/ 173500 | consumed samples: 39713280 | consumed tokens: 81332797440 | elapsed time per iteration (s): 0.11 | learning rate: 2.503E-05 | global batch size: 256 | lm loss: 4.503127E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2332.305 | TFLOPs: 8.68 | 7: iteration 155140/ 173500 | consumed samples: 39715840 | consumed tokens: 81338040320 | elapsed time per iteration (s): 0.08 | learning rate: 2.503E-05 | global batch size: 256 | lm loss: 4.507105E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3196.435 | TFLOPs: 11.89 | 7: iteration 155150/ 173500 | consumed samples: 39718400 | consumed tokens: 81343283200 | elapsed time per iteration (s): 0.08 | learning rate: 2.502E-05 | global batch size: 256 | lm loss: 4.502590E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.131 | TFLOPs: 11.88 | 7: iteration 155160/ 173500 | consumed samples: 39720960 | consumed tokens: 81348526080 | elapsed time per iteration (s): 0.08 | learning rate: 2.502E-05 | global batch size: 256 | lm loss: 4.503847E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.969 | TFLOPs: 12.02 | 7: iteration 155170/ 173500 | consumed samples: 39723520 | consumed tokens: 81353768960 | elapsed time per iteration (s): 0.08 | learning rate: 2.501E-05 | global batch size: 256 | lm loss: 4.521067E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.820 | TFLOPs: 12.02 | 7: iteration 155180/ 173500 | consumed samples: 39726080 | consumed tokens: 81359011840 | elapsed time per iteration (s): 0.08 | learning rate: 2.501E-05 | global batch size: 256 | lm loss: 4.496756E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.952 | TFLOPs: 12.04 | 7: iteration 155190/ 173500 | consumed samples: 39728640 | consumed tokens: 81364254720 | elapsed time per iteration (s): 0.08 | learning rate: 2.500E-05 | global batch size: 256 | lm loss: 4.504556E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.606 | TFLOPs: 11.97 | 7: iteration 155200/ 173500 | consumed samples: 39731200 | consumed tokens: 81369497600 | elapsed time per iteration (s): 0.08 | learning rate: 2.499E-05 | global batch size: 256 | lm loss: 4.511600E+00 | 
grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.799 | TFLOPs: 11.91 | 7: iteration 155210/ 173500 | consumed samples: 39733760 | consumed tokens: 81374740480 | elapsed time per iteration (s): 0.08 | learning rate: 2.499E-05 | global batch size: 256 | lm loss: 4.510270E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.629 | TFLOPs: 11.96 | 7: iteration 155220/ 173500 | consumed samples: 39736320 | consumed tokens: 81379983360 | elapsed time per iteration (s): 0.08 | learning rate: 2.498E-05 | global batch size: 256 | lm loss: 4.510656E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3012.635 | TFLOPs: 11.21 | 7: iteration 155230/ 173500 | consumed samples: 39738880 | consumed tokens: 81385226240 | elapsed time per iteration (s): 0.08 | learning rate: 2.498E-05 | global batch size: 256 | lm loss: 4.514812E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.372 | TFLOPs: 11.96 | 7: iteration 155240/ 173500 | consumed samples: 39741440 | consumed tokens: 81390469120 | elapsed time per iteration (s): 0.08 | learning rate: 2.497E-05 | global batch size: 256 | lm loss: 4.503389E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.259 | TFLOPs: 11.94 | 7: iteration 155250/ 173500 | consumed samples: 39744000 | consumed tokens: 81395712000 | elapsed time per iteration (s): 0.08 | learning rate: 2.497E-05 | global batch size: 256 | lm loss: 4.515283E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.980 | TFLOPs: 11.94 | 7: iteration 155260/ 173500 | consumed samples: 39746560 | consumed tokens: 81400954880 | 
elapsed time per iteration (s): 0.08 | learning rate: 2.496E-05 | global batch size: 256 | lm loss: 4.502393E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.619 | TFLOPs: 11.94 | 7: iteration 155270/ 173500 | consumed samples: 39749120 | consumed tokens: 81406197760 | elapsed time per iteration (s): 0.08 | learning rate: 2.496E-05 | global batch size: 256 | lm loss: 4.506811E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.100 | TFLOPs: 11.94 | 7: iteration 155280/ 173500 | consumed samples: 39751680 | consumed tokens: 81411440640 | elapsed time per iteration (s): 0.08 | learning rate: 2.495E-05 | global batch size: 256 | lm loss: 4.498236E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.364 | TFLOPs: 11.99 | 7: iteration 155290/ 173500 | consumed samples: 39754240 | consumed tokens: 81416683520 | elapsed time per iteration (s): 0.08 | learning rate: 2.495E-05 | global batch size: 256 | lm loss: 4.508778E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.322 | TFLOPs: 11.93 | 7: iteration 155300/ 173500 | consumed samples: 39756800 | consumed tokens: 81421926400 | elapsed time per iteration (s): 0.08 | learning rate: 2.494E-05 | global batch size: 256 | lm loss: 4.513444E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.562 | TFLOPs: 11.90 | 7: iteration 155310/ 173500 | consumed samples: 39759360 | consumed tokens: 81427169280 | elapsed time per iteration (s): 0.08 | learning rate: 2.494E-05 | global batch size: 256 | lm loss: 4.511167E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3216.739 | TFLOPs: 11.96 | 7: iteration 155320/ 173500 | consumed samples: 39761920 | consumed tokens: 81432412160 | elapsed time per iteration (s): 0.08 | learning rate: 2.493E-05 | global batch size: 256 | lm loss: 4.503580E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.420 | TFLOPs: 12.02 | 7: iteration 155330/ 173500 | consumed samples: 39764480 | consumed tokens: 81437655040 | elapsed time per iteration (s): 0.08 | learning rate: 2.492E-05 | global batch size: 256 | lm loss: 4.497155E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.059 | TFLOPs: 11.97 | 7: iteration 155340/ 173500 | consumed samples: 39767040 | consumed tokens: 81442897920 | elapsed time per iteration (s): 0.08 | learning rate: 2.492E-05 | global batch size: 256 | lm loss: 4.502829E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.446 | TFLOPs: 11.98 | 7: iteration 155350/ 173500 | consumed samples: 39769600 | consumed tokens: 81448140800 | elapsed time per iteration (s): 0.08 | learning rate: 2.491E-05 | global batch size: 256 | lm loss: 4.522995E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.638 | TFLOPs: 12.00 | 7: iteration 155360/ 173500 | consumed samples: 39772160 | consumed tokens: 81453383680 | elapsed time per iteration (s): 0.08 | learning rate: 2.491E-05 | global batch size: 256 | lm loss: 4.502856E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.289 | TFLOPs: 11.94 | 7: iteration 155370/ 173500 | consumed samples: 39774720 | consumed tokens: 81458626560 | elapsed time per iteration (s): 0.08 | learning rate: 2.490E-05 | global batch size: 256 | lm loss: 4.514563E+00 | grad 
norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.367 | TFLOPs: 11.88 | 7: iteration 155380/ 173500 | consumed samples: 39777280 | consumed tokens: 81463869440 | elapsed time per iteration (s): 0.08 | learning rate: 2.490E-05 | global batch size: 256 | lm loss: 4.502988E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.827 | TFLOPs: 11.81 | 7: iteration 155390/ 173500 | consumed samples: 39779840 | consumed tokens: 81469112320 | elapsed time per iteration (s): 0.08 | learning rate: 2.489E-05 | global batch size: 256 | lm loss: 4.520256E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.678 | TFLOPs: 11.88 | 7: iteration 155400/ 173500 | consumed samples: 39782400 | consumed tokens: 81474355200 | elapsed time per iteration (s): 0.08 | learning rate: 2.489E-05 | global batch size: 256 | lm loss: 4.513485E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.847 | TFLOPs: 11.91 | 7: iteration 155410/ 173500 | consumed samples: 39784960 | consumed tokens: 81479598080 | elapsed time per iteration (s): 0.08 | learning rate: 2.488E-05 | global batch size: 256 | lm loss: 4.507096E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.941 | TFLOPs: 11.94 | 7: iteration 155420/ 173500 | consumed samples: 39787520 | consumed tokens: 81484840960 | elapsed time per iteration (s): 0.08 | learning rate: 2.488E-05 | global batch size: 256 | lm loss: 4.503869E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.862 | TFLOPs: 12.00 | 7: iteration 155430/ 173500 | consumed samples: 39790080 | consumed tokens: 81490083840 | elapsed 
time per iteration (s): 0.08 | learning rate: 2.487E-05 | global batch size: 256 | lm loss: 4.507481E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.599 | TFLOPs: 12.00 | 7: iteration 155440/ 173500 | consumed samples: 39792640 | consumed tokens: 81495326720 | elapsed time per iteration (s): 0.08 | learning rate: 2.487E-05 | global batch size: 256 | lm loss: 4.496306E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.748 | TFLOPs: 12.06 | 7: iteration 155450/ 173500 | consumed samples: 39795200 | consumed tokens: 81500569600 | elapsed time per iteration (s): 0.08 | learning rate: 2.486E-05 | global batch size: 256 | lm loss: 4.490580E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.723 | TFLOPs: 12.04 | 7: iteration 155460/ 173500 | consumed samples: 39797760 | consumed tokens: 81505812480 | elapsed time per iteration (s): 0.08 | learning rate: 2.486E-05 | global batch size: 256 | lm loss: 4.520657E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.063 | TFLOPs: 12.01 | 7: iteration 155470/ 173500 | consumed samples: 39800320 | consumed tokens: 81511055360 | elapsed time per iteration (s): 0.08 | learning rate: 2.485E-05 | global batch size: 256 | lm loss: 4.508730E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.894 | TFLOPs: 12.04 | 7: iteration 155480/ 173500 | consumed samples: 39802880 | consumed tokens: 81516298240 | elapsed time per iteration (s): 0.08 | learning rate: 2.484E-05 | global batch size: 256 | lm loss: 4.495747E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.494 | 
TFLOPs: 12.05 | 7: iteration 155490/ 173500 | consumed samples: 39805440 | consumed tokens: 81521541120 | elapsed time per iteration (s): 0.08 | learning rate: 2.484E-05 | global batch size: 256 | lm loss: 4.504417E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.889 | TFLOPs: 12.04 | 7: iteration 155500/ 173500 | consumed samples: 39808000 | consumed tokens: 81526784000 | elapsed time per iteration (s): 0.08 | learning rate: 2.483E-05 | global batch size: 256 | lm loss: 4.503703E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.405 | TFLOPs: 12.04 | 7: iteration 155510/ 173500 | consumed samples: 39810560 | consumed tokens: 81532026880 | elapsed time per iteration (s): 0.08 | learning rate: 2.483E-05 | global batch size: 256 | lm loss: 4.501533E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.145 | TFLOPs: 11.92 | 7: iteration 155520/ 173500 | consumed samples: 39813120 | consumed tokens: 81537269760 | elapsed time per iteration (s): 0.08 | learning rate: 2.482E-05 | global batch size: 256 | lm loss: 4.495590E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.753 | TFLOPs: 11.84 | 7: iteration 155530/ 173500 | consumed samples: 39815680 | consumed tokens: 81542512640 | elapsed time per iteration (s): 0.08 | learning rate: 2.482E-05 | global batch size: 256 | lm loss: 4.508429E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.732 | TFLOPs: 11.84 | 7: iteration 155540/ 173500 | consumed samples: 39818240 | consumed tokens: 81547755520 | elapsed time per iteration (s): 0.08 | learning rate: 2.481E-05 | global batch size: 256 | lm loss: 4.515997E+00 | grad norm: 0.368 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.635 | TFLOPs: 11.89 | 7: iteration 155550/ 173500 | consumed samples: 39820800 | consumed tokens: 81552998400 | elapsed time per iteration (s): 0.08 | learning rate: 2.481E-05 | global batch size: 256 | lm loss: 4.503290E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.723 | TFLOPs: 11.83 | 7: iteration 155560/ 173500 | consumed samples: 39823360 | consumed tokens: 81558241280 | elapsed time per iteration (s): 0.08 | learning rate: 2.480E-05 | global batch size: 256 | lm loss: 4.510537E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.177 | TFLOPs: 11.85 | 7: iteration 155570/ 173500 | consumed samples: 39825920 | consumed tokens: 81563484160 | elapsed time per iteration (s): 0.08 | learning rate: 2.480E-05 | global batch size: 256 | lm loss: 4.504016E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.194 | TFLOPs: 11.81 | 7: iteration 155580/ 173500 | consumed samples: 39828480 | consumed tokens: 81568727040 | elapsed time per iteration (s): 0.08 | learning rate: 2.479E-05 | global batch size: 256 | lm loss: 4.499229E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.938 | TFLOPs: 11.88 | 7: iteration 155590/ 173500 | consumed samples: 39831040 | consumed tokens: 81573969920 | elapsed time per iteration (s): 0.08 | learning rate: 2.479E-05 | global batch size: 256 | lm loss: 4.510648E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.858 | TFLOPs: 11.85 | 7: iteration 155600/ 173500 | consumed samples: 39833600 | consumed tokens: 81579212800 | elapsed time per 
iteration (s): 0.08 | learning rate: 2.478E-05 | global batch size: 256 | lm loss: 4.507352E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.254 | TFLOPs: 11.78 | 7: iteration 155610/ 173500 | consumed samples: 39836160 | consumed tokens: 81584455680 | elapsed time per iteration (s): 0.08 | learning rate: 2.478E-05 | global batch size: 256 | lm loss: 4.502161E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.846 | TFLOPs: 11.77 | 7: iteration 155620/ 173500 | consumed samples: 39838720 | consumed tokens: 81589698560 | elapsed time per iteration (s): 0.08 | learning rate: 2.477E-05 | global batch size: 256 | lm loss: 4.502975E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.070 | TFLOPs: 11.85 | 7: iteration 155630/ 173500 | consumed samples: 39841280 | consumed tokens: 81594941440 | elapsed time per iteration (s): 0.08 | learning rate: 2.476E-05 | global batch size: 256 | lm loss: 4.517844E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.915 | TFLOPs: 11.83 | 7: iteration 155640/ 173500 | consumed samples: 39843840 | consumed tokens: 81600184320 | elapsed time per iteration (s): 0.08 | learning rate: 2.476E-05 | global batch size: 256 | lm loss: 4.510718E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.725 | TFLOPs: 11.80 | 7: iteration 155650/ 173500 | consumed samples: 39846400 | consumed tokens: 81605427200 | elapsed time per iteration (s): 0.08 | learning rate: 2.475E-05 | global batch size: 256 | lm loss: 4.504251E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.783 | TFLOPs: 
11.85 | 7: iteration 155660/ 173500 | consumed samples: 39848960 | consumed tokens: 81610670080 | elapsed time per iteration (s): 0.08 | learning rate: 2.475E-05 | global batch size: 256 | lm loss: 4.507926E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.319 | TFLOPs: 11.84 | 7: iteration 155670/ 173500 | consumed samples: 39851520 | consumed tokens: 81615912960 | elapsed time per iteration (s): 0.11 | learning rate: 2.474E-05 | global batch size: 256 | lm loss: 4.514778E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2429.524 | TFLOPs: 9.04 | 7: iteration 155680/ 173500 | consumed samples: 39854080 | consumed tokens: 81621155840 | elapsed time per iteration (s): 0.08 | learning rate: 2.474E-05 | global batch size: 256 | lm loss: 4.505230E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.951 | TFLOPs: 11.30 | 7: iteration 155690/ 173500 | consumed samples: 39856640 | consumed tokens: 81626398720 | elapsed time per iteration (s): 0.08 | learning rate: 2.473E-05 | global batch size: 256 | lm loss: 4.508778E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.647 | TFLOPs: 11.94 | 7: iteration 155700/ 173500 | consumed samples: 39859200 | consumed tokens: 81631641600 | elapsed time per iteration (s): 0.08 | learning rate: 2.473E-05 | global batch size: 256 | lm loss: 4.508167E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.448 | TFLOPs: 11.97 | 7: iteration 155710/ 173500 | consumed samples: 39861760 | consumed tokens: 81636884480 | elapsed time per iteration (s): 0.08 | learning rate: 2.472E-05 | global batch size: 256 | lm loss: 4.516502E+00 | grad norm: 0.348 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.966 | TFLOPs: 11.89 | 7: iteration 155720/ 173500 | consumed samples: 39864320 | consumed tokens: 81642127360 | elapsed time per iteration (s): 0.08 | learning rate: 2.472E-05 | global batch size: 256 | lm loss: 4.505972E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.389 | TFLOPs: 11.95 | 7: iteration 155730/ 173500 | consumed samples: 39866880 | consumed tokens: 81647370240 | elapsed time per iteration (s): 0.08 | learning rate: 2.471E-05 | global batch size: 256 | lm loss: 4.496149E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.470 | TFLOPs: 11.88 | 7: iteration 155740/ 173500 | consumed samples: 39869440 | consumed tokens: 81652613120 | elapsed time per iteration (s): 0.09 | learning rate: 2.471E-05 | global batch size: 256 | lm loss: 4.508051E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.590 | TFLOPs: 10.38 | 7: iteration 155750/ 173500 | consumed samples: 39872000 | consumed tokens: 81657856000 | elapsed time per iteration (s): 0.09 | learning rate: 2.470E-05 | global batch size: 256 | lm loss: 4.511898E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2820.748 | TFLOPs: 10.49 | 7: iteration 155760/ 173500 | consumed samples: 39874560 | consumed tokens: 81663098880 | elapsed time per iteration (s): 0.08 | learning rate: 2.470E-05 | global batch size: 256 | lm loss: 4.511576E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.193 | TFLOPs: 11.96 | 7: iteration 155770/ 173500 | consumed samples: 39877120 | consumed tokens: 81668341760 | elapsed time per 
iteration (s): 0.08 | learning rate: 2.469E-05 | global batch size: 256 | lm loss: 4.510059E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.681 | TFLOPs: 11.95 | 7: iteration 155780/ 173500 | consumed samples: 39879680 | consumed tokens: 81673584640 | elapsed time per iteration (s): 0.08 | learning rate: 2.469E-05 | global batch size: 256 | lm loss: 4.505314E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.927 | TFLOPs: 11.95 | 7: iteration 155790/ 173500 | consumed samples: 39882240 | consumed tokens: 81678827520 | elapsed time per iteration (s): 0.08 | learning rate: 2.468E-05 | global batch size: 256 | lm loss: 4.501954E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.393 | TFLOPs: 11.89 | 7: iteration 155800/ 173500 | consumed samples: 39884800 | consumed tokens: 81684070400 | elapsed time per iteration (s): 0.08 | learning rate: 2.468E-05 | global batch size: 256 | lm loss: 4.507796E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.561 | TFLOPs: 11.95 | 7: iteration 155810/ 173500 | consumed samples: 39887360 | consumed tokens: 81689313280 | elapsed time per iteration (s): 0.08 | learning rate: 2.467E-05 | global batch size: 256 | lm loss: 4.513824E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.465 | TFLOPs: 11.80 | 7: iteration 155820/ 173500 | consumed samples: 39889920 | consumed tokens: 81694556160 | elapsed time per iteration (s): 0.08 | learning rate: 2.466E-05 | global batch size: 256 | lm loss: 4.495787E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.722 | TFLOPs: 
11.94 | 7: iteration 155830/ 173500 | consumed samples: 39892480 | consumed tokens: 81699799040 | elapsed time per iteration (s): 0.08 | learning rate: 2.466E-05 | global batch size: 256 | lm loss: 4.500118E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.171 | TFLOPs: 11.95 | 7: iteration 155840/ 173500 | consumed samples: 39895040 | consumed tokens: 81705041920 | elapsed time per iteration (s): 0.08 | learning rate: 2.465E-05 | global batch size: 256 | lm loss: 4.505287E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.530 | TFLOPs: 11.95 | 7: iteration 155850/ 173500 | consumed samples: 39897600 | consumed tokens: 81710284800 | elapsed time per iteration (s): 0.08 | learning rate: 2.465E-05 | global batch size: 256 | lm loss: 4.503046E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.401 | TFLOPs: 11.93 | 7: iteration 155860/ 173500 | consumed samples: 39900160 | consumed tokens: 81715527680 | elapsed time per iteration (s): 0.08 | learning rate: 2.464E-05 | global batch size: 256 | lm loss: 4.500515E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.810 | TFLOPs: 11.93 | 7: iteration 155870/ 173500 | consumed samples: 39902720 | consumed tokens: 81720770560 | elapsed time per iteration (s): 0.08 | learning rate: 2.464E-05 | global batch size: 256 | lm loss: 4.518596E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.490 | TFLOPs: 11.96 | 7: iteration 155880/ 173500 | consumed samples: 39905280 | consumed tokens: 81726013440 | elapsed time per iteration (s): 0.08 | learning rate: 2.463E-05 | global batch size: 256 | lm loss: 4.494773E+00 | grad norm: 0.415 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.447 | TFLOPs: 11.92 | 7: iteration 155890/ 173500 | consumed samples: 39907840 | consumed tokens: 81731256320 | elapsed time per iteration (s): 0.08 | learning rate: 2.463E-05 | global batch size: 256 | lm loss: 4.502583E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.045 | TFLOPs: 11.70 | 7: iteration 155900/ 173500 | consumed samples: 39910400 | consumed tokens: 81736499200 | elapsed time per iteration (s): 0.08 | learning rate: 2.462E-05 | global batch size: 256 | lm loss: 4.499147E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.962 | TFLOPs: 11.96 | 7: iteration 155910/ 173500 | consumed samples: 39912960 | consumed tokens: 81741742080 | elapsed time per iteration (s): 0.08 | learning rate: 2.462E-05 | global batch size: 256 | lm loss: 4.503722E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.482 | TFLOPs: 11.91 | 7: iteration 155920/ 173500 | consumed samples: 39915520 | consumed tokens: 81746984960 | elapsed time per iteration (s): 0.08 | learning rate: 2.461E-05 | global batch size: 256 | lm loss: 4.504923E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.661 | TFLOPs: 11.90 | 7: iteration 155930/ 173500 | consumed samples: 39918080 | consumed tokens: 81752227840 | elapsed time per iteration (s): 0.08 | learning rate: 2.461E-05 | global batch size: 256 | lm loss: 4.507397E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.388 | TFLOPs: 11.94 | 7: iteration 155940/ 173500 | consumed samples: 39920640 | consumed tokens: 81757470720 | elapsed time per 
iteration (s): 0.08 | learning rate: 2.460E-05 | global batch size: 256 | lm loss: 4.494056E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.430 | TFLOPs: 11.93 | 7: iteration 155950/ 173500 | consumed samples: 39923200 | consumed tokens: 81762713600 | elapsed time per iteration (s): 0.08 | learning rate: 2.460E-05 | global batch size: 256 | lm loss: 4.494046E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.923 | TFLOPs: 11.71 | 7: iteration 155960/ 173500 | consumed samples: 39925760 | consumed tokens: 81767956480 | elapsed time per iteration (s): 0.08 | learning rate: 2.459E-05 | global batch size: 256 | lm loss: 4.494669E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.190 | TFLOPs: 11.90 | 7: iteration 155970/ 173500 | consumed samples: 39928320 | consumed tokens: 81773199360 | elapsed time per iteration (s): 0.08 | learning rate: 2.459E-05 | global batch size: 256 | lm loss: 4.515845E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3099.125 | TFLOPs: 11.53 | 7: iteration 155980/ 173500 | consumed samples: 39930880 | consumed tokens: 81778442240 | elapsed time per iteration (s): 0.08 | learning rate: 2.458E-05 | global batch size: 256 | lm loss: 4.509132E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.227 | TFLOPs: 11.88 | 7: iteration 155990/ 173500 | consumed samples: 39933440 | consumed tokens: 81783685120 | elapsed time per iteration (s): 0.08 | learning rate: 2.458E-05 | global batch size: 256 | lm loss: 4.502997E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.421 | TFLOPs: 
11.85 | 0: [2023-03-17 04:04:17,201] [INFO] [logging.py:68:log_dist] [Rank 0] step=156000, skipped=0, lr=[2.4571227150894576e-05, 2.4571227150894576e-05, 2.4571227150894576e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 156000/ 173500 | consumed samples: 39936000 | consumed tokens: 81788928000 | elapsed time per iteration (s): 0.08 | learning rate: 2.457E-05 | global batch size: 256 | lm loss: 4.518445E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.015 | TFLOPs: 11.93 | 0: steps: 156000 loss: 4.5030 iter time (s): 0.080 samples/sec: 3198.453 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 156000 | lm loss value: 4.408175E+00 | lm loss PPL: 8.211950E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 156000 to checkpoints_14m91b100m 0: [2023-03-17 04:04:17,257] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step156000 is begin to save! 0: [2023-03-17 04:04:17,261] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:04:17,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:04:17,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:04:17,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:04:17,290] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/layer_04-model_00-model_states.pt... 
0: [2023-03-17 04:04:17,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:04:17,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:04:17,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:04:17,296] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:04:17,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:04:17,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:04:17,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:04:17,300] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step156000/mp_rank_00_model_states.pt 0: [2023-03-17 04:04:17,300] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:04:17,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:04:17,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:04:17,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:04:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:04:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:04:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 
6: [2023-03-17 04:04:17,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:04:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 04:04:17,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 4: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:04:17,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 7: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:04:17,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 2: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:04:17,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:04:17,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 1: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:04:17,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 1: [2023-03-17 04:04:17,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 
6: [2023-03-17 04:04:17,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 2: [2023-03-17 04:04:17,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:04:17,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 4: [2023-03-17 04:04:17,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:04:17,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:04:17,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 7: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:04:17,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 6: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:04:17,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 4: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:04:17,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 04:04:17,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 4: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 1: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 2: [2023-03-17 04:04:17,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:04:17,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 7: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:04:17,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:04:17,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 6: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:04:17,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2023-03-17 04:04:17,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 4: [2023-03-17 04:04:17,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 2: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 5: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 5: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 5: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 5: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 3: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 3: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 3: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 7: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:04:17,328] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:04:17,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 
6: [2023-03-17 04:04:17,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 4: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:04:17,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 5: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:04:17,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 1: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 7: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:04:17,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 3: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:04:17,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 2: [2023-03-17 04:04:17,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:04:17,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:04:17,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 6: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:04:17,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 6: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 1: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 4: [2023-03-17 04:04:17,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:04:17,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 7: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:04:17,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 3: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:04:17,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 2: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:04:17,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 6: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:04:17,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 4: [2023-03-17 04:04:17,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 0: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 
2: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 1: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 4: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 7: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 2: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 5: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 1: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 5: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 
3: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 5: [2023-03-17 04:04:17,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:04:17,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 3: [2023-03-17 04:04:17,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:04:17,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:04:17,333] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 6: [2023-03-17 04:04:17,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:04:17,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step156000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:04:17,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step156000 is ready now! 
0: successfully saved checkpoint at iteration 156000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 79.71
7: iteration 156010/ 173500 | consumed samples: 39938560 | consumed tokens: 81794170880 | elapsed time per iteration (s): 0.09 | learning rate: 2.457E-05 | global batch size: 256 | lm loss: 4.501694E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2741.505 | TFLOPs: 10.20 |
7: iteration 156020/ 173500 | consumed samples: 39941120 | consumed tokens: 81799413760 | elapsed time per iteration (s): 0.08 | learning rate: 2.456E-05 | global batch size: 256 | lm loss: 4.500100E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.026 | TFLOPs: 11.91 |
7: iteration 156030/ 173500 | consumed samples: 39943680 | consumed tokens: 81804656640 | elapsed time per iteration (s): 0.08 | learning rate: 2.456E-05 | global batch size: 256 | lm loss: 4.494518E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.823 | TFLOPs: 11.41 |
7: iteration 156040/ 173500 | consumed samples: 39946240 | consumed tokens: 81809899520 | elapsed time per iteration (s): 0.08 | learning rate: 2.455E-05 | global batch size: 256 | lm loss: 4.503641E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.000 | TFLOPs: 11.81 |
7: iteration 156050/ 173500 | consumed samples: 39948800 | consumed tokens: 81815142400 | elapsed time per iteration (s): 0.08 | learning rate: 2.455E-05 | global batch size: 256 | lm loss: 4.489183E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3130.153 | TFLOPs: 11.64 |
7: iteration 156060/ 173500 | consumed samples: 39951360 | consumed tokens: 81820385280 | elapsed time per iteration (s): 0.08 | learning rate: 2.454E-05 | global batch size: 256 | lm loss: 4.491508E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.477 | TFLOPs: 11.99 |
7: iteration 156070/ 173500 | consumed samples: 39953920 | consumed tokens: 81825628160 | elapsed time per iteration (s): 0.08 | learning rate: 2.454E-05 | global batch size: 256 | lm loss: 4.518958E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.211 | TFLOPs: 11.22 |
7: iteration 156080/ 173500 | consumed samples: 39956480 | consumed tokens: 81830871040 | elapsed time per iteration (s): 0.08 | learning rate: 2.453E-05 | global batch size: 256 | lm loss: 4.498022E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.058 | TFLOPs: 11.97 |
7: iteration 156090/ 173500 | consumed samples: 39959040 | consumed tokens: 81836113920 | elapsed time per iteration (s): 0.08 | learning rate: 2.452E-05 | global batch size: 256 | lm loss: 4.525193E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.428 | TFLOPs: 12.00 |
7: iteration 156100/ 173500 | consumed samples: 39961600 | consumed tokens: 81841356800 | elapsed time per iteration (s): 0.08 | learning rate: 2.452E-05 | global batch size: 256 | lm loss: 4.509007E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.151 | TFLOPs: 11.94 |
7: iteration 156110/ 173500 | consumed samples: 39964160 | consumed tokens: 81846599680 | elapsed time per iteration (s): 0.08 | learning rate: 2.451E-05 | global batch size: 256 | lm loss: 4.504182E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.468 | TFLOPs: 11.95 |
7: iteration 156120/ 173500 | consumed samples: 39966720 | consumed tokens: 81851842560 | elapsed time per iteration (s): 0.08 | learning rate: 2.451E-05 | global batch size: 256 | lm loss: 4.505825E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.234 | TFLOPs: 11.97 |
7: iteration 156130/ 173500 | consumed samples: 39969280 | consumed tokens: 81857085440 | elapsed time per iteration (s): 0.08 | learning rate: 2.450E-05 | global batch size: 256 | lm loss: 4.524271E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.175 | TFLOPs: 11.87 |
7: iteration 156140/ 173500 | consumed samples: 39971840 | consumed tokens: 81862328320 | elapsed time per iteration (s): 0.08 | learning rate: 2.450E-05 | global batch size: 256 | lm loss: 4.502639E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.872 | TFLOPs: 11.85 |
7: iteration 156150/ 173500 | consumed samples: 39974400 | consumed tokens: 81867571200 | elapsed time per iteration (s): 0.08 | learning rate: 2.449E-05 | global batch size: 256 | lm loss: 4.500685E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.692 | TFLOPs: 11.90 |
7: iteration 156160/ 173500 | consumed samples: 39976960 | consumed tokens: 81872814080 | elapsed time per iteration (s): 0.08 | learning rate: 2.449E-05 | global batch size: 256 | lm loss: 4.506160E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.131 | TFLOPs: 11.82 |
7: iteration 156170/ 173500 | consumed samples: 39979520 | consumed tokens: 81878056960 | elapsed time per iteration (s): 0.08 | learning rate: 2.448E-05 | global batch size: 256 | lm loss: 4.505553E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.809 | TFLOPs: 11.84 |
7: iteration 156180/ 173500 | consumed samples: 39982080 | consumed tokens: 81883299840 | elapsed time per iteration (s): 0.08 | learning rate: 2.448E-05 | global batch size: 256 | lm loss: 4.502718E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.396 | TFLOPs: 11.96 |
7: iteration 156190/ 173500 | consumed samples: 39984640 | consumed tokens: 81888542720 | elapsed time per iteration (s): 0.08 | learning rate: 2.447E-05 | global batch size: 256 | lm loss: 4.502470E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.278 | TFLOPs: 11.92 |
7: iteration 156200/ 173500 | consumed samples: 39987200 | consumed tokens: 81893785600 | elapsed time per iteration (s): 0.08 | learning rate: 2.447E-05 | global batch size: 256 | lm loss: 4.500377E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.023 | TFLOPs: 11.93 |
7: iteration 156210/ 173500 | consumed samples: 39989760 | consumed tokens: 81899028480 | elapsed time per iteration (s): 0.08 | learning rate: 2.446E-05 | global batch size: 256 | lm loss: 4.501840E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3110.883 | TFLOPs: 11.57 |
7: iteration 156220/ 173500 | consumed samples: 39992320 | consumed tokens: 81904271360 | elapsed time per iteration (s): 0.12 | learning rate: 2.446E-05 | global batch size: 256 | lm loss: 4.498946E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.176 | TFLOPs: 7.70 |
7: iteration 156230/ 173500 | consumed samples: 39994880 | consumed tokens: 81909514240 | elapsed time per iteration (s): 0.16 | learning rate: 2.445E-05 | global batch size: 256 | lm loss: 4.489218E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1636.304 | TFLOPs: 6.09 |
7: iteration 156240/ 173500 | consumed samples: 39997440 | consumed tokens: 81914757120 | elapsed time per iteration (s): 0.13 | learning rate: 2.445E-05 | global batch size: 256 | lm loss: 4.510876E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1924.563 | TFLOPs: 7.16 |
7: iteration 156250/ 173500 | consumed samples: 40000000 | consumed tokens: 81920000000 | elapsed time per iteration (s): 0.11 | learning rate: 2.444E-05 | global batch size: 256 | lm loss: 4.495470E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2281.096 | TFLOPs: 8.48 |
7: iteration 156260/ 173500 | consumed samples: 40002560 | consumed tokens: 81925242880 | elapsed time per iteration (s): 0.10 | learning rate: 2.444E-05 | global batch size: 256 | lm loss: 4.506253E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2453.942 | TFLOPs: 9.13 |
7: iteration 156270/ 173500 | consumed samples: 40005120 | consumed tokens: 81930485760 | elapsed time per iteration (s): 0.09 | learning rate: 2.443E-05 | global batch size: 256 | lm loss: 4.506829E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2958.801 | TFLOPs: 11.01 |
7: iteration 156280/ 173500 | consumed samples: 40007680 | consumed tokens: 81935728640 | elapsed time per iteration (s): 0.08 | learning rate: 2.443E-05 | global batch size: 256 | lm loss: 4.515511E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.634 | TFLOPs: 11.85 |
7: iteration 156290/ 
173500 | consumed samples: 40010240 | consumed tokens: 81940971520 | elapsed time per iteration (s): 0.09 | learning rate: 2.442E-05 | global batch size: 256 | lm loss: 4.505166E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.124 | TFLOPs: 10.32 | 7: iteration 156300/ 173500 | consumed samples: 40012800 | consumed tokens: 81946214400 | elapsed time per iteration (s): 0.08 | learning rate: 2.442E-05 | global batch size: 256 | lm loss: 4.504657E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.715 | TFLOPs: 11.84 | 7: iteration 156310/ 173500 | consumed samples: 40015360 | consumed tokens: 81951457280 | elapsed time per iteration (s): 0.08 | learning rate: 2.441E-05 | global batch size: 256 | lm loss: 4.505664E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.564 | TFLOPs: 11.88 | 7: iteration 156320/ 173500 | consumed samples: 40017920 | consumed tokens: 81956700160 | elapsed time per iteration (s): 0.08 | learning rate: 2.441E-05 | global batch size: 256 | lm loss: 4.516218E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.523 | TFLOPs: 11.85 | 7: iteration 156330/ 173500 | consumed samples: 40020480 | consumed tokens: 81961943040 | elapsed time per iteration (s): 0.08 | learning rate: 2.440E-05 | global batch size: 256 | lm loss: 4.508588E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.596 | TFLOPs: 11.31 | 7: iteration 156340/ 173500 | consumed samples: 40023040 | consumed tokens: 81967185920 | elapsed time per iteration (s): 0.09 | learning rate: 2.440E-05 | global batch size: 256 | lm loss: 4.501686E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2748.950 | TFLOPs: 10.22 | 7: iteration 156350/ 173500 | consumed samples: 40025600 | consumed tokens: 81972428800 | elapsed time per iteration (s): 0.09 | learning rate: 2.439E-05 | global batch size: 256 | lm loss: 4.520432E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3003.806 | TFLOPs: 11.17 | 7: iteration 156360/ 173500 | consumed samples: 40028160 | consumed tokens: 81977671680 | elapsed time per iteration (s): 0.09 | learning rate: 2.439E-05 | global batch size: 256 | lm loss: 4.512004E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2946.103 | TFLOPs: 10.96 | 7: iteration 156370/ 173500 | consumed samples: 40030720 | consumed tokens: 81982914560 | elapsed time per iteration (s): 0.08 | learning rate: 2.438E-05 | global batch size: 256 | lm loss: 4.502285E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.882 | TFLOPs: 11.86 | 7: iteration 156380/ 173500 | consumed samples: 40033280 | consumed tokens: 81988157440 | elapsed time per iteration (s): 0.08 | learning rate: 2.438E-05 | global batch size: 256 | lm loss: 4.513526E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.119 | TFLOPs: 11.68 | 7: iteration 156390/ 173500 | consumed samples: 40035840 | consumed tokens: 81993400320 | elapsed time per iteration (s): 0.08 | learning rate: 2.437E-05 | global batch size: 256 | lm loss: 4.504612E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3084.889 | TFLOPs: 11.47 | 7: iteration 156400/ 173500 | consumed samples: 40038400 | consumed tokens: 81998643200 | elapsed time per iteration (s): 0.10 | learning rate: 
2.437E-05 | global batch size: 256 | lm loss: 4.495487E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2556.153 | TFLOPs: 9.51 | 7: iteration 156410/ 173500 | consumed samples: 40040960 | consumed tokens: 82003886080 | elapsed time per iteration (s): 0.09 | learning rate: 2.436E-05 | global batch size: 256 | lm loss: 4.495649E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2991.922 | TFLOPs: 11.13 | 7: iteration 156420/ 173500 | consumed samples: 40043520 | consumed tokens: 82009128960 | elapsed time per iteration (s): 0.08 | learning rate: 2.436E-05 | global batch size: 256 | lm loss: 4.516586E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3029.013 | TFLOPs: 11.27 | 7: iteration 156430/ 173500 | consumed samples: 40046080 | consumed tokens: 82014371840 | elapsed time per iteration (s): 0.10 | learning rate: 2.435E-05 | global batch size: 256 | lm loss: 4.496975E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2506.273 | TFLOPs: 9.32 | 7: iteration 156440/ 173500 | consumed samples: 40048640 | consumed tokens: 82019614720 | elapsed time per iteration (s): 0.09 | learning rate: 2.435E-05 | global batch size: 256 | lm loss: 4.493145E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2729.915 | TFLOPs: 10.15 | 7: iteration 156450/ 173500 | consumed samples: 40051200 | consumed tokens: 82024857600 | elapsed time per iteration (s): 0.08 | learning rate: 2.434E-05 | global batch size: 256 | lm loss: 4.501336E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.770 | TFLOPs: 11.85 | 7: iteration 156460/ 173500 | 
consumed samples: 40053760 | consumed tokens: 82030100480 | elapsed time per iteration (s): 0.08 | learning rate: 2.434E-05 | global batch size: 256 | lm loss: 4.516562E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.192 | TFLOPs: 11.84 | 7: iteration 156470/ 173500 | consumed samples: 40056320 | consumed tokens: 82035343360 | elapsed time per iteration (s): 0.10 | learning rate: 2.433E-05 | global batch size: 256 | lm loss: 4.511111E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2643.217 | TFLOPs: 9.83 | 7: iteration 156480/ 173500 | consumed samples: 40058880 | consumed tokens: 82040586240 | elapsed time per iteration (s): 0.09 | learning rate: 2.433E-05 | global batch size: 256 | lm loss: 4.496460E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2825.184 | TFLOPs: 10.51 | 7: iteration 156490/ 173500 | consumed samples: 40061440 | consumed tokens: 82045829120 | elapsed time per iteration (s): 0.09 | learning rate: 2.432E-05 | global batch size: 256 | lm loss: 4.506283E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2704.759 | TFLOPs: 10.06 | 7: iteration 156500/ 173500 | consumed samples: 40064000 | consumed tokens: 82051072000 | elapsed time per iteration (s): 0.08 | learning rate: 2.432E-05 | global batch size: 256 | lm loss: 4.496645E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.844 | TFLOPs: 11.86 | 7: iteration 156510/ 173500 | consumed samples: 40066560 | consumed tokens: 82056314880 | elapsed time per iteration (s): 0.08 | learning rate: 2.431E-05 | global batch size: 256 | lm loss: 4.508524E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3138.846 | TFLOPs: 11.68 | 7: iteration 156520/ 173500 | consumed samples: 40069120 | consumed tokens: 82061557760 | elapsed time per iteration (s): 0.08 | learning rate: 2.431E-05 | global batch size: 256 | lm loss: 4.506605E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.122 | TFLOPs: 11.81 | 7: iteration 156530/ 173500 | consumed samples: 40071680 | consumed tokens: 82066800640 | elapsed time per iteration (s): 0.08 | learning rate: 2.430E-05 | global batch size: 256 | lm loss: 4.510102E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.511 | TFLOPs: 11.44 | 7: iteration 156540/ 173500 | consumed samples: 40074240 | consumed tokens: 82072043520 | elapsed time per iteration (s): 0.10 | learning rate: 2.430E-05 | global batch size: 256 | lm loss: 4.513112E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2543.220 | TFLOPs: 9.46 | 7: iteration 156550/ 173500 | consumed samples: 40076800 | consumed tokens: 82077286400 | elapsed time per iteration (s): 0.08 | learning rate: 2.429E-05 | global batch size: 256 | lm loss: 4.514616E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.629 | TFLOPs: 11.59 | 7: iteration 156560/ 173500 | consumed samples: 40079360 | consumed tokens: 82082529280 | elapsed time per iteration (s): 0.08 | learning rate: 2.429E-05 | global batch size: 256 | lm loss: 4.500070E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.799 | TFLOPs: 11.56 | 7: iteration 156570/ 173500 | consumed samples: 40081920 | consumed tokens: 82087772160 | elapsed time per iteration (s): 0.08 | learning rate: 2.428E-05 | 
global batch size: 256 | lm loss: 4.512498E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.857 | TFLOPs: 11.37 | 7: iteration 156580/ 173500 | consumed samples: 40084480 | consumed tokens: 82093015040 | elapsed time per iteration (s): 0.08 | learning rate: 2.428E-05 | global batch size: 256 | lm loss: 4.511910E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.984 | TFLOPs: 11.34 | 7: iteration 156590/ 173500 | consumed samples: 40087040 | consumed tokens: 82098257920 | elapsed time per iteration (s): 0.08 | learning rate: 2.427E-05 | global batch size: 256 | lm loss: 4.517498E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.294 | TFLOPs: 11.86 | 7: iteration 156600/ 173500 | consumed samples: 40089600 | consumed tokens: 82103500800 | elapsed time per iteration (s): 0.08 | learning rate: 2.427E-05 | global batch size: 256 | lm loss: 4.489040E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3038.399 | TFLOPs: 11.30 | 7: iteration 156610/ 173500 | consumed samples: 40092160 | consumed tokens: 82108743680 | elapsed time per iteration (s): 0.08 | learning rate: 2.426E-05 | global batch size: 256 | lm loss: 4.518141E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.545 | TFLOPs: 11.23 | 7: iteration 156620/ 173500 | consumed samples: 40094720 | consumed tokens: 82113986560 | elapsed time per iteration (s): 0.09 | learning rate: 2.426E-05 | global batch size: 256 | lm loss: 4.501785E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.112 | TFLOPs: 11.16 | 7: iteration 156630/ 173500 | consumed 
samples: 40097280 | consumed tokens: 82119229440 | elapsed time per iteration (s): 0.08 | learning rate: 2.425E-05 | global batch size: 256 | lm loss: 4.505725E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.968 | TFLOPs: 11.79 | 7: iteration 156640/ 173500 | consumed samples: 40099840 | consumed tokens: 82124472320 | elapsed time per iteration (s): 0.08 | learning rate: 2.425E-05 | global batch size: 256 | lm loss: 4.505211E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.483 | TFLOPs: 11.80 | 7: iteration 156650/ 173500 | consumed samples: 40102400 | consumed tokens: 82129715200 | elapsed time per iteration (s): 0.10 | learning rate: 2.424E-05 | global batch size: 256 | lm loss: 4.494209E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2464.938 | TFLOPs: 9.17 | 7: iteration 156660/ 173500 | consumed samples: 40104960 | consumed tokens: 82134958080 | elapsed time per iteration (s): 0.08 | learning rate: 2.424E-05 | global batch size: 256 | lm loss: 4.514890E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.235 | TFLOPs: 11.74 | 7: iteration 156670/ 173500 | consumed samples: 40107520 | consumed tokens: 82140200960 | elapsed time per iteration (s): 0.08 | learning rate: 2.423E-05 | global batch size: 256 | lm loss: 4.504736E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.870 | TFLOPs: 11.75 | 7: iteration 156680/ 173500 | consumed samples: 40110080 | consumed tokens: 82145443840 | elapsed time per iteration (s): 0.08 | learning rate: 2.423E-05 | global batch size: 256 | lm loss: 4.505472E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3167.321 | TFLOPs: 11.78 | 7: iteration 156690/ 173500 | consumed samples: 40112640 | consumed tokens: 82150686720 | elapsed time per iteration (s): 0.08 | learning rate: 2.422E-05 | global batch size: 256 | lm loss: 4.498643E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.438 | TFLOPs: 11.67 | 7: iteration 156700/ 173500 | consumed samples: 40115200 | consumed tokens: 82155929600 | elapsed time per iteration (s): 0.08 | learning rate: 2.422E-05 | global batch size: 256 | lm loss: 4.502042E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.709 | TFLOPs: 11.55 | 7: iteration 156710/ 173500 | consumed samples: 40117760 | consumed tokens: 82161172480 | elapsed time per iteration (s): 0.08 | learning rate: 2.421E-05 | global batch size: 256 | lm loss: 4.506986E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.612 | TFLOPs: 11.73 | 7: iteration 156720/ 173500 | consumed samples: 40120320 | consumed tokens: 82166415360 | elapsed time per iteration (s): 0.10 | learning rate: 2.421E-05 | global batch size: 256 | lm loss: 4.511806E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.427 | TFLOPs: 9.72 | 7: iteration 156730/ 173500 | consumed samples: 40122880 | consumed tokens: 82171658240 | elapsed time per iteration (s): 0.09 | learning rate: 2.420E-05 | global batch size: 256 | lm loss: 4.516013E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2706.257 | TFLOPs: 10.07 | 7: iteration 156740/ 173500 | consumed samples: 40125440 | consumed tokens: 82176901120 | elapsed time per iteration (s): 0.08 | learning rate: 2.420E-05 | global 
batch size: 256 | lm loss: 4.511000E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.373 | TFLOPs: 11.79 | 7: iteration 156750/ 173500 | consumed samples: 40128000 | consumed tokens: 82182144000 | elapsed time per iteration (s): 0.08 | learning rate: 2.419E-05 | global batch size: 256 | lm loss: 4.496833E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.834 | TFLOPs: 11.82 | 7: iteration 156760/ 173500 | consumed samples: 40130560 | consumed tokens: 82187386880 | elapsed time per iteration (s): 0.08 | learning rate: 2.419E-05 | global batch size: 256 | lm loss: 4.523834E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.530 | TFLOPs: 11.82 | 7: iteration 156770/ 173500 | consumed samples: 40133120 | consumed tokens: 82192629760 | elapsed time per iteration (s): 0.08 | learning rate: 2.418E-05 | global batch size: 256 | lm loss: 4.497335E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3035.976 | TFLOPs: 11.29 | 7: iteration 156780/ 173500 | consumed samples: 40135680 | consumed tokens: 82197872640 | elapsed time per iteration (s): 0.08 | learning rate: 2.418E-05 | global batch size: 256 | lm loss: 4.504176E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.294 | TFLOPs: 11.44 | 7: iteration 156790/ 173500 | consumed samples: 40138240 | consumed tokens: 82203115520 | elapsed time per iteration (s): 0.08 | learning rate: 2.417E-05 | global batch size: 256 | lm loss: 4.519682E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.040 | TFLOPs: 11.64 | 7: iteration 156800/ 173500 | consumed samples: 
40140800 | consumed tokens: 82208358400 | elapsed time per iteration (s): 0.08 | learning rate: 2.417E-05 | global batch size: 256 | lm loss: 4.496290E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.790 | TFLOPs: 11.81 | 7: iteration 156810/ 173500 | consumed samples: 40143360 | consumed tokens: 82213601280 | elapsed time per iteration (s): 0.08 | learning rate: 2.416E-05 | global batch size: 256 | lm loss: 4.505570E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3040.910 | TFLOPs: 11.31 | 7: iteration 156820/ 173500 | consumed samples: 40145920 | consumed tokens: 82218844160 | elapsed time per iteration (s): 0.09 | learning rate: 2.416E-05 | global batch size: 256 | lm loss: 4.494334E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2820.719 | TFLOPs: 10.49 | 7: iteration 156830/ 173500 | consumed samples: 40148480 | consumed tokens: 82224087040 | elapsed time per iteration (s): 0.08 | learning rate: 2.415E-05 | global batch size: 256 | lm loss: 4.506174E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.565 | TFLOPs: 11.65 | 7: iteration 156840/ 173500 | consumed samples: 40151040 | consumed tokens: 82229329920 | elapsed time per iteration (s): 0.08 | learning rate: 2.415E-05 | global batch size: 256 | lm loss: 4.509268E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.545 | TFLOPs: 11.65 | 7: iteration 156850/ 173500 | consumed samples: 40153600 | consumed tokens: 82234572800 | elapsed time per iteration (s): 0.08 | learning rate: 2.414E-05 | global batch size: 256 | lm loss: 4.518110E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3041.698 | TFLOPs: 11.31 | 7: iteration 156860/ 173500 | consumed samples: 40156160 | consumed tokens: 82239815680 | elapsed time per iteration (s): 0.08 | learning rate: 2.414E-05 | global batch size: 256 | lm loss: 4.509644E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3043.740 | TFLOPs: 11.32 | 7: iteration 156870/ 173500 | consumed samples: 40158720 | consumed tokens: 82245058560 | elapsed time per iteration (s): 0.08 | learning rate: 2.413E-05 | global batch size: 256 | lm loss: 4.502294E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.041 | TFLOPs: 11.55 | 7: iteration 156880/ 173500 | consumed samples: 40161280 | consumed tokens: 82250301440 | elapsed time per iteration (s): 0.09 | learning rate: 2.413E-05 | global batch size: 256 | lm loss: 4.501734E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2862.638 | TFLOPs: 10.65 | 7: iteration 156890/ 173500 | consumed samples: 40163840 | consumed tokens: 82255544320 | elapsed time per iteration (s): 0.08 | learning rate: 2.412E-05 | global batch size: 256 | lm loss: 4.511688E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.278 | TFLOPs: 11.89 | 7: iteration 156900/ 173500 | consumed samples: 40166400 | consumed tokens: 82260787200 | elapsed time per iteration (s): 0.09 | learning rate: 2.412E-05 | global batch size: 256 | lm loss: 4.507835E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2768.367 | TFLOPs: 10.30 | 7: iteration 156910/ 173500 | consumed samples: 40168960 | consumed tokens: 82266030080 | elapsed time per iteration (s): 0.08 | learning rate: 2.411E-05 | global batch 
size: 256 | lm loss: 4.507097E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.488 | TFLOPs: 11.93 | 7: iteration 156920/ 173500 | consumed samples: 40171520 | consumed tokens: 82271272960 | elapsed time per iteration (s): 0.08 | learning rate: 2.411E-05 | global batch size: 256 | lm loss: 4.490956E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.674 | TFLOPs: 11.89 | 7: iteration 156930/ 173500 | consumed samples: 40174080 | consumed tokens: 82276515840 | elapsed time per iteration (s): 0.10 | learning rate: 2.410E-05 | global batch size: 256 | lm loss: 4.493412E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2649.589 | TFLOPs: 9.86 | 7: iteration 156940/ 173500 | consumed samples: 40176640 | consumed tokens: 82281758720 | elapsed time per iteration (s): 0.08 | learning rate: 2.410E-05 | global batch size: 256 | lm loss: 4.500979E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.892 | TFLOPs: 12.00 | 7: iteration 156950/ 173500 | consumed samples: 40179200 | consumed tokens: 82287001600 | elapsed time per iteration (s): 0.08 | learning rate: 2.409E-05 | global batch size: 256 | lm loss: 4.495262E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.095 | TFLOPs: 11.91 | 7: iteration 156960/ 173500 | consumed samples: 40181760 | consumed tokens: 82292244480 | elapsed time per iteration (s): 0.09 | learning rate: 2.409E-05 | global batch size: 256 | lm loss: 4.501664E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2898.261 | TFLOPs: 10.78 | 7: iteration 156970/ 173500 | consumed samples: 40184320 
| consumed tokens: 82297487360 | elapsed time per iteration (s): 0.08 | learning rate: 2.408E-05 | global batch size: 256 | lm loss: 4.514413E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.230 | TFLOPs: 11.61 |
7: iteration 156980/ 173500 | consumed samples: 40186880 | consumed tokens: 82302730240 | elapsed time per iteration (s): 0.08 | learning rate: 2.408E-05 | global batch size: 256 | lm loss: 4.514871E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.836 | TFLOPs: 11.89 |
7: iteration 156990/ 173500 | consumed samples: 40189440 | consumed tokens: 82307973120 | elapsed time per iteration (s): 0.08 | learning rate: 2.407E-05 | global batch size: 256 | lm loss: 4.506669E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.365 | TFLOPs: 11.92 |
7: iteration 157000/ 173500 | consumed samples: 40192000 | consumed tokens: 82313216000 | elapsed time per iteration (s): 0.09 | learning rate: 2.407E-05 | global batch size: 256 | lm loss: 4.494236E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2838.390 | TFLOPs: 10.56 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 157000 | lm loss value: 4.378662E+00 | lm loss PPL: 7.973129E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 157000 to checkpoints_14m91b100m
0: [2023-03-17 04:05:43,606] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step157000 is begin to save!
0: [2023-03-17 04:05:43,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:05:43,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:05:43,635] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:05:43,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:05:43,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:05:43,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:05:43,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:05:43,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:05:43,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:05:43,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:05:43,648] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:05:43,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 04:05:43,649] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step157000/mp_rank_00_model_states.pt 0: [2023-03-17 04:05:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:05:43,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:05:43,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:05:43,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:05:43,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:05:43,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:05:43,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:05:43,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2023-03-17 04:05:43,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 
0: [2023-03-17 04:05:43,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 3: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:05:43,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 04:05:43,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 7: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 7: [2023-03-17 04:05:43,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 4: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:05:43,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:05:43,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:05:43,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2023-03-17 04:05:43,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 0: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 6: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:05:43,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 4: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:05:43,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 3: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:05:43,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 2: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:05:43,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 0: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:05:43,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:05:43,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 5: [2023-03-17 04:05:43,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 6: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:05:43,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 5: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:05:43,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 04:05:43,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 7: [2023-03-17 04:05:43,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 5: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 7: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 5: [2023-03-17 04:05:43,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:05:43,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 
3: [2023-03-17 04:05:43,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 4: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:05:43,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 0: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:05:43,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:05:43,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 3: [2023-03-17 04:05:43,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:05:43,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 2: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:05:43,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 5: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 6: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:05:43,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 4: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:05:43,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:05:43,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 0: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:05:43,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:05:43,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 2: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:05:43,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 5: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:05:43,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 04:05:43,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 7: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 6: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:05:43,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:05:43,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 3: [2023-03-17 04:05:43,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:05:43,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 4: [2023-03-17 04:05:43,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:05:43,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:05:43,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:05:43,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:05:43,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 0: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:05:43,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 2: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 6: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:05:43,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 5: [2023-03-17 04:05:43,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 3: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:05:43,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 6: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:05:43,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 04:05:43,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 6: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 7: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 4: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:05:43,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:05:43,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 0: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 2: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 3: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 6: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 2: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 3: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 2: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 
1: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 1: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 7: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:05:43,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 7: [2023-03-17 04:05:43,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:05:43,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 5: [2023-03-17 04:05:43,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:05:43,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:05:43,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 4: [2023-03-17 04:05:43,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:05:43,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:05:43,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 5: [2023-03-17 04:05:43,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:05:43,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step157000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 04:05:43,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step157000 is ready now! 
0: successfully saved checkpoint at iteration 157000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.68 7: iteration 157010/ 173500 | consumed samples: 40194560 | consumed tokens: 82318458880 | elapsed time per iteration (s): 0.09 | learning rate: 2.406E-05 | global batch size: 256 | lm loss: 4.500376E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2765.062 | TFLOPs: 10.28 | 7: iteration 157020/ 173500 | consumed samples: 40197120 | consumed tokens: 82323701760 | elapsed time per iteration (s): 0.08 | learning rate: 2.406E-05 | global batch size: 256 | lm loss: 4.490228E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.299 | TFLOPs: 11.89 | 7: iteration 157030/ 173500 | consumed samples: 40199680 | consumed tokens: 82328944640 | elapsed time per iteration (s): 0.08 | learning rate: 2.405E-05 | global batch size: 256 | lm loss: 4.507069E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.790 | TFLOPs: 11.79 | 7: iteration 157040/ 173500 | consumed samples: 40202240 | consumed tokens: 82334187520 | elapsed time per iteration (s): 0.08 | learning rate: 2.405E-05 | global batch size: 256 | lm loss: 4.503017E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.283 | TFLOPs: 11.98 | 7: iteration 157050/ 173500 | consumed samples: 40204800 | consumed tokens: 82339430400 | elapsed time per iteration (s): 0.08 | learning rate: 2.404E-05 | global batch size: 256 | lm loss: 4.509248E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.753 | TFLOPs: 11.88 | 7: iteration 157060/ 173500 | consumed samples: 40207360 | consumed tokens: 82344673280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.404E-05 | global batch size: 256 | lm loss: 4.515212E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.243 | TFLOPs: 11.84 | 7: iteration 157070/ 173500 | consumed samples: 40209920 | consumed tokens: 82349916160 | elapsed time per iteration (s): 0.09 | learning rate: 2.403E-05 | global batch size: 256 | lm loss: 4.499920E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2882.835 | TFLOPs: 10.72 | 7: iteration 157080/ 173500 | consumed samples: 40212480 | consumed tokens: 82355159040 | elapsed time per iteration (s): 0.08 | learning rate: 2.403E-05 | global batch size: 256 | lm loss: 4.510480E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.733 | TFLOPs: 11.96 | 7: iteration 157090/ 173500 | consumed samples: 40215040 | consumed tokens: 82360401920 | elapsed time per iteration (s): 0.09 | learning rate: 2.402E-05 | global batch size: 256 | lm loss: 4.501011E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2862.363 | TFLOPs: 10.65 | 7: iteration 157100/ 173500 | consumed samples: 40217600 | consumed tokens: 82365644800 | elapsed time per iteration (s): 0.13 | learning rate: 2.402E-05 | global batch size: 256 | lm loss: 4.506922E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2038.417 | TFLOPs: 7.58 | 7: iteration 157110/ 173500 | consumed samples: 40220160 | consumed tokens: 82370887680 | elapsed time per iteration (s): 0.08 | learning rate: 2.401E-05 | global batch size: 256 | lm loss: 4.500365E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.041 | TFLOPs: 11.90 | 7: 
iteration 157120/ 173500 | consumed samples: 40222720 | consumed tokens: 82376130560 | elapsed time per iteration (s): 0.08 | learning rate: 2.401E-05 | global batch size: 256 | lm loss: 4.522090E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.874 | TFLOPs: 11.98 | 7: iteration 157130/ 173500 | consumed samples: 40225280 | consumed tokens: 82381373440 | elapsed time per iteration (s): 0.08 | learning rate: 2.400E-05 | global batch size: 256 | lm loss: 4.503756E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.844 | TFLOPs: 11.87 | 7: iteration 157140/ 173500 | consumed samples: 40227840 | consumed tokens: 82386616320 | elapsed time per iteration (s): 0.08 | learning rate: 2.400E-05 | global batch size: 256 | lm loss: 4.512123E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.280 | TFLOPs: 11.97 | 7: iteration 157150/ 173500 | consumed samples: 40230400 | consumed tokens: 82391859200 | elapsed time per iteration (s): 0.08 | learning rate: 2.399E-05 | global batch size: 256 | lm loss: 4.490035E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.736 | TFLOPs: 11.95 | 7: iteration 157160/ 173500 | consumed samples: 40232960 | consumed tokens: 82397102080 | elapsed time per iteration (s): 0.08 | learning rate: 2.399E-05 | global batch size: 256 | lm loss: 4.491260E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.321 | TFLOPs: 11.96 | 7: iteration 157170/ 173500 | consumed samples: 40235520 | consumed tokens: 82402344960 | elapsed time per iteration (s): 0.08 | learning rate: 2.398E-05 | global batch size: 256 | lm loss: 4.497558E+00 | grad norm: 0.351 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.753 | TFLOPs: 11.69 | 7: iteration 157180/ 173500 | consumed samples: 40238080 | consumed tokens: 82407587840 | elapsed time per iteration (s): 0.08 | learning rate: 2.398E-05 | global batch size: 256 | lm loss: 4.497932E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.949 | TFLOPs: 11.87 | 7: iteration 157190/ 173500 | consumed samples: 40240640 | consumed tokens: 82412830720 | elapsed time per iteration (s): 0.08 | learning rate: 2.398E-05 | global batch size: 256 | lm loss: 4.512007E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.907 | TFLOPs: 11.44 | 7: iteration 157200/ 173500 | consumed samples: 40243200 | consumed tokens: 82418073600 | elapsed time per iteration (s): 0.08 | learning rate: 2.397E-05 | global batch size: 256 | lm loss: 4.492825E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.756 | TFLOPs: 11.99 | 7: iteration 157210/ 173500 | consumed samples: 40245760 | consumed tokens: 82423316480 | elapsed time per iteration (s): 0.08 | learning rate: 2.397E-05 | global batch size: 256 | lm loss: 4.500196E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.412 | TFLOPs: 11.99 | 7: iteration 157220/ 173500 | consumed samples: 40248320 | consumed tokens: 82428559360 | elapsed time per iteration (s): 0.08 | learning rate: 2.396E-05 | global batch size: 256 | lm loss: 4.498633E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.896 | TFLOPs: 12.00 | 7: iteration 157230/ 173500 | consumed samples: 40250880 | consumed tokens: 82433802240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.396E-05 | global batch size: 256 | lm loss: 4.502959E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.090 | TFLOPs: 11.72 | 7: iteration 157240/ 173500 | consumed samples: 40253440 | consumed tokens: 82439045120 | elapsed time per iteration (s): 0.08 | learning rate: 2.395E-05 | global batch size: 256 | lm loss: 4.506074E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.693 | TFLOPs: 11.71 | 7: iteration 157250/ 173500 | consumed samples: 40256000 | consumed tokens: 82444288000 | elapsed time per iteration (s): 0.08 | learning rate: 2.395E-05 | global batch size: 256 | lm loss: 4.515635E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3073.119 | TFLOPs: 11.43 | 7: iteration 157260/ 173500 | consumed samples: 40258560 | consumed tokens: 82449530880 | elapsed time per iteration (s): 0.08 | learning rate: 2.394E-05 | global batch size: 256 | lm loss: 4.495542E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.242 | TFLOPs: 12.02 | 7: iteration 157270/ 173500 | consumed samples: 40261120 | consumed tokens: 82454773760 | elapsed time per iteration (s): 0.08 | learning rate: 2.394E-05 | global batch size: 256 | lm loss: 4.496552E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.159 | TFLOPs: 11.98 | 7: iteration 157280/ 173500 | consumed samples: 40263680 | consumed tokens: 82460016640 | elapsed time per iteration (s): 0.08 | learning rate: 2.393E-05 | global batch size: 256 | lm loss: 4.488404E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.919 | TFLOPs: 12.00 | 7: iteration 
157290/ 173500 | consumed samples: 40266240 | consumed tokens: 82465259520 | elapsed time per iteration (s): 0.08 | learning rate: 2.393E-05 | global batch size: 256 | lm loss: 4.502595E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.803 | TFLOPs: 11.92 | 7: iteration 157300/ 173500 | consumed samples: 40268800 | consumed tokens: 82470502400 | elapsed time per iteration (s): 0.08 | learning rate: 2.392E-05 | global batch size: 256 | lm loss: 4.504499E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.130 | TFLOPs: 12.01 | 7: iteration 157310/ 173500 | consumed samples: 40271360 | consumed tokens: 82475745280 | elapsed time per iteration (s): 0.08 | learning rate: 2.392E-05 | global batch size: 256 | lm loss: 4.504845E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.137 | TFLOPs: 12.00 | 7: iteration 157320/ 173500 | consumed samples: 40273920 | consumed tokens: 82480988160 | elapsed time per iteration (s): 0.08 | learning rate: 2.391E-05 | global batch size: 256 | lm loss: 4.494969E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.487 | TFLOPs: 12.01 | 7: iteration 157330/ 173500 | consumed samples: 40276480 | consumed tokens: 82486231040 | elapsed time per iteration (s): 0.08 | learning rate: 2.391E-05 | global batch size: 256 | lm loss: 4.494417E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.706 | TFLOPs: 11.99 | 7: iteration 157340/ 173500 | consumed samples: 40279040 | consumed tokens: 82491473920 | elapsed time per iteration (s): 0.08 | learning rate: 2.390E-05 | global batch size: 256 | lm loss: 4.512384E+00 | grad norm: 0.358 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.502 | TFLOPs: 11.95 | 7: iteration 157350/ 173500 | consumed samples: 40281600 | consumed tokens: 82496716800 | elapsed time per iteration (s): 0.08 | learning rate: 2.390E-05 | global batch size: 256 | lm loss: 4.511969E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.021 | TFLOPs: 11.79 | 7: iteration 157360/ 173500 | consumed samples: 40284160 | consumed tokens: 82501959680 | elapsed time per iteration (s): 0.08 | learning rate: 2.389E-05 | global batch size: 256 | lm loss: 4.508397E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.470 | TFLOPs: 11.83 | 7: iteration 157370/ 173500 | consumed samples: 40286720 | consumed tokens: 82507202560 | elapsed time per iteration (s): 0.08 | learning rate: 2.389E-05 | global batch size: 256 | lm loss: 4.512000E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.652 | TFLOPs: 11.70 | 7: iteration 157380/ 173500 | consumed samples: 40289280 | consumed tokens: 82512445440 | elapsed time per iteration (s): 0.08 | learning rate: 2.388E-05 | global batch size: 256 | lm loss: 4.504818E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.740 | TFLOPs: 11.81 | 7: iteration 157390/ 173500 | consumed samples: 40291840 | consumed tokens: 82517688320 | elapsed time per iteration (s): 0.08 | learning rate: 2.388E-05 | global batch size: 256 | lm loss: 4.501552E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.083 | TFLOPs: 11.93 | 7: iteration 157400/ 173500 | consumed samples: 40294400 | consumed tokens: 82522931200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.387E-05 | global batch size: 256 | lm loss: 4.497831E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.996 | TFLOPs: 11.62 | 7: iteration 157410/ 173500 | consumed samples: 40296960 | consumed tokens: 82528174080 | elapsed time per iteration (s): 0.08 | learning rate: 2.387E-05 | global batch size: 256 | lm loss: 4.513096E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.116 | TFLOPs: 11.89 | 7: iteration 157420/ 173500 | consumed samples: 40299520 | consumed tokens: 82533416960 | elapsed time per iteration (s): 0.08 | learning rate: 2.386E-05 | global batch size: 256 | lm loss: 4.492492E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.414 | TFLOPs: 11.95 | 7: iteration 157430/ 173500 | consumed samples: 40302080 | consumed tokens: 82538659840 | elapsed time per iteration (s): 0.08 | learning rate: 2.386E-05 | global batch size: 256 | lm loss: 4.504430E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.533 | TFLOPs: 11.86 | 7: iteration 157440/ 173500 | consumed samples: 40304640 | consumed tokens: 82543902720 | elapsed time per iteration (s): 0.08 | learning rate: 2.386E-05 | global batch size: 256 | lm loss: 4.511819E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.376 | TFLOPs: 11.71 | 7: iteration 157450/ 173500 | consumed samples: 40307200 | consumed tokens: 82549145600 | elapsed time per iteration (s): 0.08 | learning rate: 2.385E-05 | global batch size: 256 | lm loss: 4.504022E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.121 | TFLOPs: 11.87 | 7: iteration 157460/ 
173500 | consumed samples: 40309760 | consumed tokens: 82554388480 | elapsed time per iteration (s): 0.08 | learning rate: 2.385E-05 | global batch size: 256 | lm loss: 4.497322E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.920 | TFLOPs: 11.83 | 7: iteration 157470/ 173500 | consumed samples: 40312320 | consumed tokens: 82559631360 | elapsed time per iteration (s): 0.09 | learning rate: 2.384E-05 | global batch size: 256 | lm loss: 4.510923E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.400 | TFLOPs: 10.99 | 7: iteration 157480/ 173500 | consumed samples: 40314880 | consumed tokens: 82564874240 | elapsed time per iteration (s): 0.09 | learning rate: 2.384E-05 | global batch size: 256 | lm loss: 4.508543E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2835.945 | TFLOPs: 10.55 | 7: iteration 157490/ 173500 | consumed samples: 40317440 | consumed tokens: 82570117120 | elapsed time per iteration (s): 0.08 | learning rate: 2.383E-05 | global batch size: 256 | lm loss: 4.493336E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.185 | TFLOPs: 11.67 | 7: iteration 157500/ 173500 | consumed samples: 40320000 | consumed tokens: 82575360000 | elapsed time per iteration (s): 0.10 | learning rate: 2.383E-05 | global batch size: 256 | lm loss: 4.502494E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2619.247 | TFLOPs: 9.74 | 7: iteration 157510/ 173500 | consumed samples: 40322560 | consumed tokens: 82580602880 | elapsed time per iteration (s): 0.13 | learning rate: 2.382E-05 | global batch size: 256 | lm loss: 4.514066E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 1987.475 | TFLOPs: 7.39 | 7: iteration 157520/ 173500 | consumed samples: 40325120 | consumed tokens: 82585845760 | elapsed time per iteration (s): 0.13 | learning rate: 2.382E-05 | global batch size: 256 | lm loss: 4.515057E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.477 | TFLOPs: 7.37 | 7: iteration 157530/ 173500 | consumed samples: 40327680 | consumed tokens: 82591088640 | elapsed time per iteration (s): 0.13 | learning rate: 2.381E-05 | global batch size: 256 | lm loss: 4.499030E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2005.754 | TFLOPs: 7.46 | 7: iteration 157540/ 173500 | consumed samples: 40330240 | consumed tokens: 82596331520 | elapsed time per iteration (s): 0.13 | learning rate: 2.381E-05 | global batch size: 256 | lm loss: 4.501672E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1981.857 | TFLOPs: 7.37 | 7: iteration 157550/ 173500 | consumed samples: 40332800 | consumed tokens: 82601574400 | elapsed time per iteration (s): 0.13 | learning rate: 2.380E-05 | global batch size: 256 | lm loss: 4.510877E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.527 | TFLOPs: 7.39 | 7: iteration 157560/ 173500 | consumed samples: 40335360 | consumed tokens: 82606817280 | elapsed time per iteration (s): 0.13 | learning rate: 2.380E-05 | global batch size: 256 | lm loss: 4.505921E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1957.107 | TFLOPs: 7.28 | 7: iteration 157570/ 173500 | consumed samples: 40337920 | consumed tokens: 82612060160 | elapsed time per iteration (s): 0.10 | learning rate: 
2.379E-05 | global batch size: 256 | lm loss: 4.512809E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.177 | TFLOPs: 9.19 | 7: iteration 157580/ 173500 | consumed samples: 40340480 | consumed tokens: 82617303040 | elapsed time per iteration (s): 0.08 | learning rate: 2.379E-05 | global batch size: 256 | lm loss: 4.505035E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.528 | TFLOPs: 11.95 | 7: iteration 157590/ 173500 | consumed samples: 40343040 | consumed tokens: 82622545920 | elapsed time per iteration (s): 0.08 | learning rate: 2.378E-05 | global batch size: 256 | lm loss: 4.492266E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.624 | TFLOPs: 11.94 | 7: iteration 157600/ 173500 | consumed samples: 40345600 | consumed tokens: 82627788800 | elapsed time per iteration (s): 0.09 | learning rate: 2.378E-05 | global batch size: 256 | lm loss: 4.510557E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.024 | TFLOPs: 10.70 | 7: iteration 157610/ 173500 | consumed samples: 40348160 | consumed tokens: 82633031680 | elapsed time per iteration (s): 0.10 | learning rate: 2.377E-05 | global batch size: 256 | lm loss: 4.497303E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2678.080 | TFLOPs: 9.96 | 7: iteration 157620/ 173500 | consumed samples: 40350720 | consumed tokens: 82638274560 | elapsed time per iteration (s): 0.10 | learning rate: 2.377E-05 | global batch size: 256 | lm loss: 4.506552E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.710 | TFLOPs: 9.92 | 7: iteration 157630/ 173500 | 
consumed samples: 40353280 | consumed tokens: 82643517440 | elapsed time per iteration (s): 0.09 | learning rate: 2.377E-05 | global batch size: 256 | lm loss: 4.512842E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2701.230 | TFLOPs: 10.05 | 7: iteration 157640/ 173500 | consumed samples: 40355840 | consumed tokens: 82648760320 | elapsed time per iteration (s): 0.10 | learning rate: 2.376E-05 | global batch size: 256 | lm loss: 4.505358E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2665.951 | TFLOPs: 9.92 | 7: iteration 157650/ 173500 | consumed samples: 40358400 | consumed tokens: 82654003200 | elapsed time per iteration (s): 0.10 | learning rate: 2.376E-05 | global batch size: 256 | lm loss: 4.510898E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.982 | TFLOPs: 9.96 | 7: iteration 157660/ 173500 | consumed samples: 40360960 | consumed tokens: 82659246080 | elapsed time per iteration (s): 0.10 | learning rate: 2.375E-05 | global batch size: 256 | lm loss: 4.501102E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2688.970 | TFLOPs: 10.00 | 7: iteration 157670/ 173500 | consumed samples: 40363520 | consumed tokens: 82664488960 | elapsed time per iteration (s): 0.10 | learning rate: 2.375E-05 | global batch size: 256 | lm loss: 4.496160E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.840 | TFLOPs: 9.84 | 7: iteration 157680/ 173500 | consumed samples: 40366080 | consumed tokens: 82669731840 | elapsed time per iteration (s): 0.31 | learning rate: 2.374E-05 | global batch size: 256 | lm loss: 4.492676E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 831.089 | TFLOPs: 3.09 | 7: iteration 157690/ 173500 | consumed samples: 40368640 | consumed tokens: 82674974720 | elapsed time per iteration (s): 0.10 | learning rate: 2.374E-05 | global batch size: 256 | lm loss: 4.516042E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.085 | TFLOPs: 10.00 | 7: iteration 157700/ 173500 | consumed samples: 40371200 | consumed tokens: 82680217600 | elapsed time per iteration (s): 0.10 | learning rate: 2.373E-05 | global batch size: 256 | lm loss: 4.510717E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2634.636 | TFLOPs: 9.80 | 7: iteration 157710/ 173500 | consumed samples: 40373760 | consumed tokens: 82685460480 | elapsed time per iteration (s): 0.10 | learning rate: 2.373E-05 | global batch size: 256 | lm loss: 4.508813E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2623.128 | TFLOPs: 9.76 | 7: iteration 157720/ 173500 | consumed samples: 40376320 | consumed tokens: 82690703360 | elapsed time per iteration (s): 0.10 | learning rate: 2.372E-05 | global batch size: 256 | lm loss: 4.498412E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.130 | TFLOPs: 10.00 | 7: iteration 157730/ 173500 | consumed samples: 40378880 | consumed tokens: 82695946240 | elapsed time per iteration (s): 0.09 | learning rate: 2.372E-05 | global batch size: 256 | lm loss: 4.516777E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2758.590 | TFLOPs: 10.26 | 7: iteration 157740/ 173500 | consumed samples: 40381440 | consumed tokens: 82701189120 | elapsed time per iteration (s): 0.10 | learning rate: 2.371E-05 | global 
batch size: 256 | lm loss: 4.503925E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2655.459 | TFLOPs: 9.88 | 7: iteration 157750/ 173500 | consumed samples: 40384000 | consumed tokens: 82706432000 | elapsed time per iteration (s): 0.10 | learning rate: 2.371E-05 | global batch size: 256 | lm loss: 4.507505E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.996 | TFLOPs: 9.96 | 7: iteration 157760/ 173500 | consumed samples: 40386560 | consumed tokens: 82711674880 | elapsed time per iteration (s): 0.10 | learning rate: 2.370E-05 | global batch size: 256 | lm loss: 4.500032E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2670.213 | TFLOPs: 9.93 | 7: iteration 157770/ 173500 | consumed samples: 40389120 | consumed tokens: 82716917760 | elapsed time per iteration (s): 0.10 | learning rate: 2.370E-05 | global batch size: 256 | lm loss: 4.488056E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2674.472 | TFLOPs: 9.95 | 7: iteration 157780/ 173500 | consumed samples: 40391680 | consumed tokens: 82722160640 | elapsed time per iteration (s): 0.10 | learning rate: 2.369E-05 | global batch size: 256 | lm loss: 4.503606E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.759 | TFLOPs: 9.80 | 7: iteration 157790/ 173500 | consumed samples: 40394240 | consumed tokens: 82727403520 | elapsed time per iteration (s): 0.10 | learning rate: 2.369E-05 | global batch size: 256 | lm loss: 4.510365E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.066 | TFLOPs: 10.00 | 7: iteration 157800/ 173500 | consumed samples: 
40396800 | consumed tokens: 82732646400 | elapsed time per iteration (s): 0.09 | learning rate: 2.369E-05 | global batch size: 256 | lm loss: 4.494675E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.416 | TFLOPs: 10.04 | 7: iteration 157810/ 173500 | consumed samples: 40399360 | consumed tokens: 82737889280 | elapsed time per iteration (s): 0.10 | learning rate: 2.368E-05 | global batch size: 256 | lm loss: 4.501785E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2678.949 | TFLOPs: 9.96 | 7: iteration 157820/ 173500 | consumed samples: 40401920 | consumed tokens: 82743132160 | elapsed time per iteration (s): 0.09 | learning rate: 2.368E-05 | global batch size: 256 | lm loss: 4.518971E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2910.923 | TFLOPs: 10.83 | 7: iteration 157830/ 173500 | consumed samples: 40404480 | consumed tokens: 82748375040 | elapsed time per iteration (s): 0.10 | learning rate: 2.367E-05 | global batch size: 256 | lm loss: 4.515605E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2508.965 | TFLOPs: 9.33 | 7: iteration 157840/ 173500 | consumed samples: 40407040 | consumed tokens: 82753617920 | elapsed time per iteration (s): 0.11 | learning rate: 2.367E-05 | global batch size: 256 | lm loss: 4.501622E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2319.790 | TFLOPs: 8.63 | 7: iteration 157850/ 173500 | consumed samples: 40409600 | consumed tokens: 82758860800 | elapsed time per iteration (s): 0.08 | learning rate: 2.366E-05 | global batch size: 256 | lm loss: 4.501027E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3179.158 | TFLOPs: 11.83 | 7: iteration 157860/ 173500 | consumed samples: 40412160 | consumed tokens: 82764103680 | elapsed time per iteration (s): 0.08 | learning rate: 2.366E-05 | global batch size: 256 | lm loss: 4.506396E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.056 | TFLOPs: 11.82 | 7: iteration 157870/ 173500 | consumed samples: 40414720 | consumed tokens: 82769346560 | elapsed time per iteration (s): 0.08 | learning rate: 2.365E-05 | global batch size: 256 | lm loss: 4.522344E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.152 | TFLOPs: 11.78 | 7: iteration 157880/ 173500 | consumed samples: 40417280 | consumed tokens: 82774589440 | elapsed time per iteration (s): 0.08 | learning rate: 2.365E-05 | global batch size: 256 | lm loss: 4.507217E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.022 | TFLOPs: 11.79 | 7: iteration 157890/ 173500 | consumed samples: 40419840 | consumed tokens: 82779832320 | elapsed time per iteration (s): 0.08 | learning rate: 2.364E-05 | global batch size: 256 | lm loss: 4.515033E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.891 | TFLOPs: 11.82 | 7: iteration 157900/ 173500 | consumed samples: 40422400 | consumed tokens: 82785075200 | elapsed time per iteration (s): 0.08 | learning rate: 2.364E-05 | global batch size: 256 | lm loss: 4.516181E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.623 | TFLOPs: 11.90 | 7: iteration 157910/ 173500 | consumed samples: 40424960 | consumed tokens: 82790318080 | elapsed time per iteration (s): 0.09 | learning rate: 2.363E-05 | global batch size: 
256 | lm loss: 4.500279E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2820.891 | TFLOPs: 10.49 | 7: iteration 157920/ 173500 | consumed samples: 40427520 | consumed tokens: 82795560960 | elapsed time per iteration (s): 0.10 | learning rate: 2.363E-05 | global batch size: 256 | lm loss: 4.507918E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2644.798 | TFLOPs: 9.84 | 7: iteration 157930/ 173500 | consumed samples: 40430080 | consumed tokens: 82800803840 | elapsed time per iteration (s): 0.10 | learning rate: 2.363E-05 | global batch size: 256 | lm loss: 4.508549E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.273 | TFLOPs: 9.72 | 7: iteration 157940/ 173500 | consumed samples: 40432640 | consumed tokens: 82806046720 | elapsed time per iteration (s): 0.10 | learning rate: 2.362E-05 | global batch size: 256 | lm loss: 4.505698E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2560.129 | TFLOPs: 9.52 | 7: iteration 157950/ 173500 | consumed samples: 40435200 | consumed tokens: 82811289600 | elapsed time per iteration (s): 0.10 | learning rate: 2.362E-05 | global batch size: 256 | lm loss: 4.520259E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.124 | TFLOPs: 9.26 | 7: iteration 157960/ 173500 | consumed samples: 40437760 | consumed tokens: 82816532480 | elapsed time per iteration (s): 0.10 | learning rate: 2.361E-05 | global batch size: 256 | lm loss: 4.496079E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2529.851 | TFLOPs: 9.41 | 7: iteration 157970/ 173500 | consumed samples: 40440320 | consumed 
tokens: 82821775360 | elapsed time per iteration (s): 0.10 | learning rate: 2.361E-05 | global batch size: 256 | lm loss: 4.500866E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2612.306 | TFLOPs: 9.72 | 7: iteration 157980/ 173500 | consumed samples: 40442880 | consumed tokens: 82827018240 | elapsed time per iteration (s): 0.10 | learning rate: 2.360E-05 | global batch size: 256 | lm loss: 4.504576E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2691.343 | TFLOPs: 10.01 | 7: iteration 157990/ 173500 | consumed samples: 40445440 | consumed tokens: 82832261120 | elapsed time per iteration (s): 0.15 | learning rate: 2.360E-05 | global batch size: 256 | lm loss: 4.489252E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1732.280 | TFLOPs: 6.44 | 0: [2023-03-17 04:07:16,483] [INFO] [logging.py:68:log_dist] [Rank 0] step=158000, skipped=0, lr=[2.3592725009494674e-05, 2.3592725009494674e-05, 2.3592725009494674e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 158000/ 173500 | consumed samples: 40448000 | consumed tokens: 82837504000 | elapsed time per iteration (s): 0.11 | learning rate: 2.359E-05 | global batch size: 256 | lm loss: 4.508984E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2390.174 | TFLOPs: 8.89 | 0: steps: 158000 loss: 4.4908 iter time (s): 0.089 samples/sec: 2879.136 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 158000 | lm loss value: 4.408641E+00 | lm loss PPL: 8.215776E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 158000 to checkpoints_14m91b100m 0: 
[2023-03-17 04:07:16,552] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step158000 is begin to save! 0: [2023-03-17 04:07:16,555] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:07:16,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:07:16,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:07:16,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:07:16,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:07:16,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:07:16,589] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:07:16,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:07:16,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:07:16,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:07:16,594] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 04:07:16,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:07:16,595] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step158000/mp_rank_00_model_states.pt 0: [2023-03-17 04:07:16,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:07:16,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:07:16,614] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:07:16,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:07:16,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 2: [2023-03-17 04:07:16,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:07:16,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 2: [2023-03-17 04:07:16,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 3: [2023-03-17 04:07:16,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 04:07:16,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:07:16,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 6: [2023-03-17 04:07:16,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 04:07:16,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 3: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 
3: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2023-03-17 04:07:16,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 5: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 4: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 7: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 0: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 2: [2023-03-17 04:07:16,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:07:16,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:07:16,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 6: [2023-03-17 04:07:16,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2023-03-17 04:07:16,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 
1: [2023-03-17 04:07:16,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 04:07:16,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 04:07:16,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 5: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 4: [2023-03-17 04:07:16,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:07:16,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 7: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 0: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 
2: [2023-03-17 04:07:16,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:07:16,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2023-03-17 04:07:16,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 1: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:07:16,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 
6: [2023-03-17 04:07:16,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 5: [2023-03-17 04:07:16,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 3: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2023-03-17 04:07:16,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 04:07:16,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:07:16,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 5: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 4: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:07:16,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 1: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 7: [2023-03-17 04:07:16,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 0: [2023-03-17 04:07:16,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 2: [2023-03-17 04:07:16,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:07:16,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 1: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:07:16,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 6: [2023-03-17 04:07:16,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 04:07:16,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 3: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-03-17 04:07:16,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 04:07:16,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 5: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 4: [2023-03-17 04:07:16,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:07:16,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 7: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 0: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 2: [2023-03-17 04:07:16,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 04:07:16,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:07:16,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:07:16,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 1: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:07:16,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:07:16,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 6: [2023-03-17 04:07:16,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2023-03-17 04:07:16,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 3: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 04:07:16,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2023-03-17 04:07:16,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 04:07:16,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 5: [2023-03-17 04:07:16,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 4: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:07:16,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 7: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 0: [2023-03-17 04:07:16,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 6: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 3: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 1: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:07:16,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 3: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 04:07:16,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2023-03-17 04:07:16,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 5: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 4: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 7: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 0: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 2: [2023-03-17 04:07:16,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 6: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 1: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 3: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 04:07:16,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 
5: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 7: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 0: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 2: [2023-03-17 04:07:16,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 6: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: successfully saved checkpoint at iteration 158000 to checkpoints_14m91b100m 2: [2023-03-17 04:07:16,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 4: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 3: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 04:07:16,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2023-03-17 04:07:16,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step158000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 5: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now! 
3: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now!
1: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now!
7: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now!
2: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now!
3: [2023-03-17 04:07:16,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step158000 is ready now!
7: time (ms) | save-checkpoint: 80.08
7: iteration 158010/ 173500 | consumed samples: 40450560 | consumed tokens: 82842746880 | elapsed time per iteration (s): 0.11 | learning rate: 2.359E-05 | global batch size: 256 | lm loss: 4.514688E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2370.511 | TFLOPs: 8.82 |
7: iteration 158020/ 173500 | consumed samples: 40453120 | consumed tokens: 82847989760 | elapsed time per iteration (s): 0.10 | learning rate: 2.358E-05 | global batch size: 256 | lm loss: 4.499990E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.645 | TFLOPs: 9.92 |
7: iteration 158030/ 173500 | consumed samples: 40455680 | consumed tokens: 82853232640 | elapsed time per iteration (s): 0.09 | learning rate: 2.358E-05 | global batch size: 256 | lm loss: 4.495338E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.925 | TFLOPs: 10.09 |
7: iteration 158040/ 173500 | consumed samples: 40458240 | consumed tokens: 82858475520 | elapsed time per iteration (s): 0.10 | learning rate: 2.357E-05 | global batch size: 256 | lm loss: 4.510191E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.370 | TFLOPs: 10.00 |
7: iteration 158050/ 173500 | consumed samples: 40460800 | consumed tokens: 82863718400 | elapsed time per iteration (s): 0.09 | learning rate: 2.357E-05 | global batch size: 256 | lm loss: 4.503389E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.529 | TFLOPs: 10.04 |
7: iteration 158060/ 173500 | consumed samples: 40463360 | consumed tokens: 82868961280 | elapsed time per iteration (s): 0.10 | learning rate: 2.357E-05 | global batch size: 256 | lm loss: 4.510217E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.991 | TFLOPs: 9.96 |
7: iteration 158070/ 173500 | consumed samples: 40465920 | consumed tokens: 82874204160 | elapsed time per iteration (s): 0.09 | learning rate: 2.356E-05 | global batch size: 256 | lm loss: 4.507533E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2723.280 | TFLOPs: 10.13 |
7: iteration 158080/ 173500 | consumed samples: 40468480 | consumed tokens: 82879447040 | elapsed time per iteration (s): 0.09 | learning rate: 2.356E-05 | global batch size: 256 | lm loss: 4.502150E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2764.662 | TFLOPs: 10.28 |
7: iteration 158090/ 173500 | consumed samples: 40471040 | consumed tokens: 82884689920 | elapsed time per iteration (s): 0.09 | learning rate: 2.355E-05 | global batch size: 256 | lm loss: 4.509969E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2694.850 | TFLOPs: 10.02 |
7: iteration 158100/ 173500 | consumed samples: 40473600 | consumed tokens: 82889932800 | elapsed time per iteration (s): 0.10 | learning rate: 2.355E-05 | global batch size: 256 | lm loss: 4.499462E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2655.445 | TFLOPs: 9.88 |
7: iteration 158110/ 173500 | consumed samples: 40476160 | consumed tokens: 82895175680 | elapsed time per iteration (s): 0.10 | learning rate: 2.354E-05 | global batch size: 256 | lm loss: 4.510175E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2678.080 | TFLOPs: 9.96 |
7: iteration 158120/ 173500 | consumed samples: 40478720 | consumed tokens: 82900418560 | elapsed time per iteration (s): 0.10 | learning rate: 2.354E-05 | global batch size: 256 | lm loss: 4.508173E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2677.933 | TFLOPs: 9.96 |
7: iteration 158130/ 173500 | consumed samples: 40481280 | consumed tokens: 82905661440 | elapsed time per iteration (s): 0.10 | learning rate: 2.353E-05 | global batch size: 256 | lm loss: 4.502766E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2622.887 | TFLOPs: 9.76 |
7: iteration 158140/ 173500 | consumed samples: 40483840 | consumed tokens: 82910904320 | elapsed time per iteration (s): 0.09 | learning rate: 2.353E-05 | global batch size: 256 | lm loss: 4.501064E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.685 | TFLOPs: 10.05 |
7: iteration 158150/ 173500 | consumed samples: 40486400 | consumed tokens: 82916147200 | elapsed time per iteration (s): 0.10 | learning rate: 2.352E-05 | global batch size: 256 | lm loss: 4.519027E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2688.476 | TFLOPs: 10.00 |
7: iteration 158160/ 173500 | consumed samples: 40488960 | consumed tokens: 82921390080 | elapsed time per iteration (s): 0.09 | learning rate: 2.352E-05 | global batch size: 256 | lm loss: 4.501211E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2759.251 | TFLOPs: 10.26 |
7: iteration 158170/ 173500 | consumed samples: 40491520 | consumed tokens: 82926632960 | elapsed time per iteration (s): 0.09 | learning rate: 2.351E-05 | global batch size: 256 | lm loss: 4.494466E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2711.564 | TFLOPs: 10.09 |
7: iteration 158180/ 173500 | consumed samples: 40494080 | consumed tokens: 82931875840 | elapsed time per iteration (s): 0.10 | learning rate: 2.351E-05 | global batch size: 256 | lm loss: 4.502312E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2667.089 | TFLOPs: 9.92 |
7: iteration 158190/ 173500 | consumed samples: 40496640 | consumed tokens: 82937118720 | elapsed time per iteration (s): 0.09 | learning rate: 2.351E-05 | global batch size: 256 | lm loss: 4.510898E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.162 | TFLOPs: 10.17 |
7: iteration 158200/ 173500 | consumed samples: 40499200 | consumed tokens: 82942361600 | elapsed time per iteration (s): 0.09 | learning rate: 2.350E-05 | global batch size: 256 | lm loss: 4.506742E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2741.801 | TFLOPs: 10.20 |
7: iteration 158210/ 173500 | consumed samples: 40501760 | consumed tokens: 82947604480 | elapsed time per iteration (s): 0.09 | learning rate: 2.350E-05 | global batch size: 256 | lm loss: 4.495680E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2728.145 | TFLOPs: 10.15 |
7: iteration 158220/ 173500 | consumed samples: 40504320 | consumed tokens: 82952847360 | elapsed time per iteration (s): 0.09 | learning rate: 2.349E-05 | global batch size: 256 | lm loss: 4.498165E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2699.628 | TFLOPs: 10.04 |
7: iteration 158230/ 173500 | consumed samples: 40506880 | consumed tokens: 82958090240 | elapsed time per iteration (s): 0.10 | learning rate: 2.349E-05 | global batch size: 256 | lm loss: 4.517886E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2689.531 | TFLOPs: 10.00 |
7: iteration 158240/ 173500 | consumed samples: 40509440 | consumed tokens: 82963333120 | elapsed time per iteration (s): 0.09 | learning rate: 2.348E-05 | global batch size: 256 | lm loss: 4.507265E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2788.711 | TFLOPs: 10.37 |
7: iteration 158250/ 173500 | consumed samples: 40512000 | consumed tokens: 82968576000 | elapsed time per iteration (s): 0.10 | learning rate: 2.348E-05 | global batch size: 256 | lm loss: 4.506699E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2672.193 | TFLOPs: 9.94 |
7: iteration 158260/ 173500 | consumed samples: 40514560 | consumed tokens: 82973818880 | elapsed time per iteration (s): 0.10 | learning rate: 2.347E-05 | global batch size: 256 | lm loss: 4.509364E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2633.447 | TFLOPs: 9.80 |
7: iteration 158270/ 173500 | consumed samples: 40517120 | consumed tokens: 82979061760 | elapsed time per iteration (s): 0.10 | learning rate: 2.347E-05 | global batch size: 256 | lm loss: 4.501734E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2666.966 | TFLOPs: 9.92 |
7: iteration 158280/ 173500 | consumed samples: 40519680 | consumed tokens: 82984304640 | elapsed time per iteration (s): 0.09 | learning rate: 2.346E-05 | global batch size: 256 | lm loss: 4.513601E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2758.677 | TFLOPs: 10.26 |
7: iteration 158290/ 173500 | consumed samples: 40522240 | consumed tokens: 82989547520 | elapsed time per iteration (s): 0.09 | learning rate: 2.346E-05 | global batch size: 256 | lm loss: 4.500156E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.557 | TFLOPs: 10.04 |
7: iteration 158300/ 173500 | consumed samples: 40524800 | consumed tokens: 82994790400 | elapsed time per iteration (s): 0.21 | learning rate: 2.346E-05 | global batch size: 256 | lm loss: 4.502000E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1203.003 | TFLOPs: 4.47 |
7: iteration 158310/ 173500 | consumed samples: 40527360 | consumed tokens: 83000033280 | elapsed time per iteration (s): 0.08 | learning rate: 2.345E-05 | global batch size: 256 | lm loss: 4.496836E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3014.880 | TFLOPs: 11.21 |
7: iteration 158320/ 173500 | consumed samples: 40529920 | consumed tokens: 83005276160 | elapsed time per iteration (s): 0.08 | learning rate: 2.345E-05 | global batch size: 256 | lm loss: 4.497066E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.677 | TFLOPs: 11.75 |
7: iteration 158330/ 173500 | consumed samples: 40532480 | consumed 
tokens: 83010519040 | elapsed time per iteration (s): 0.08 | learning rate: 2.344E-05 | global batch size: 256 | lm loss: 4.511631E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.016 | TFLOPs: 11.62 | 7: iteration 158340/ 173500 | consumed samples: 40535040 | consumed tokens: 83015761920 | elapsed time per iteration (s): 0.08 | learning rate: 2.344E-05 | global batch size: 256 | lm loss: 4.492961E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.915 | TFLOPs: 11.78 | 7: iteration 158350/ 173500 | consumed samples: 40537600 | consumed tokens: 83021004800 | elapsed time per iteration (s): 0.08 | learning rate: 2.343E-05 | global batch size: 256 | lm loss: 4.516314E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.315 | TFLOPs: 11.81 | 7: iteration 158360/ 173500 | consumed samples: 40540160 | consumed tokens: 83026247680 | elapsed time per iteration (s): 0.08 | learning rate: 2.343E-05 | global batch size: 256 | lm loss: 4.504501E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.458 | TFLOPs: 11.82 | 7: iteration 158370/ 173500 | consumed samples: 40542720 | consumed tokens: 83031490560 | elapsed time per iteration (s): 0.08 | learning rate: 2.342E-05 | global batch size: 256 | lm loss: 4.498207E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.657 | TFLOPs: 11.79 | 7: iteration 158380/ 173500 | consumed samples: 40545280 | consumed tokens: 83036733440 | elapsed time per iteration (s): 0.08 | learning rate: 2.342E-05 | global batch size: 256 | lm loss: 4.505983E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3166.091 | TFLOPs: 11.78 | 7: iteration 158390/ 173500 | consumed samples: 40547840 | consumed tokens: 83041976320 | elapsed time per iteration (s): 0.08 | learning rate: 2.342E-05 | global batch size: 256 | lm loss: 4.493496E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.419 | TFLOPs: 11.82 | 7: iteration 158400/ 173500 | consumed samples: 40550400 | consumed tokens: 83047219200 | elapsed time per iteration (s): 0.08 | learning rate: 2.341E-05 | global batch size: 256 | lm loss: 4.515846E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.336 | TFLOPs: 11.84 | 7: iteration 158410/ 173500 | consumed samples: 40552960 | consumed tokens: 83052462080 | elapsed time per iteration (s): 0.08 | learning rate: 2.341E-05 | global batch size: 256 | lm loss: 4.505063E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.784 | TFLOPs: 11.82 | 7: iteration 158420/ 173500 | consumed samples: 40555520 | consumed tokens: 83057704960 | elapsed time per iteration (s): 0.08 | learning rate: 2.340E-05 | global batch size: 256 | lm loss: 4.515022E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.804 | TFLOPs: 11.82 | 7: iteration 158430/ 173500 | consumed samples: 40558080 | consumed tokens: 83062947840 | elapsed time per iteration (s): 0.08 | learning rate: 2.340E-05 | global batch size: 256 | lm loss: 4.501529E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.413 | TFLOPs: 11.71 | 7: iteration 158440/ 173500 | consumed samples: 40560640 | consumed tokens: 83068190720 | elapsed time per iteration (s): 0.08 | learning rate: 2.339E-05 | global batch size: 256 | lm loss: 
4.503193E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.433 | TFLOPs: 11.82 | 7: iteration 158450/ 173500 | consumed samples: 40563200 | consumed tokens: 83073433600 | elapsed time per iteration (s): 0.08 | learning rate: 2.339E-05 | global batch size: 256 | lm loss: 4.503292E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.512 | TFLOPs: 11.82 | 7: iteration 158460/ 173500 | consumed samples: 40565760 | consumed tokens: 83078676480 | elapsed time per iteration (s): 0.08 | learning rate: 2.338E-05 | global batch size: 256 | lm loss: 4.493939E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.173 | TFLOPs: 11.85 | 7: iteration 158470/ 173500 | consumed samples: 40568320 | consumed tokens: 83083919360 | elapsed time per iteration (s): 0.08 | learning rate: 2.338E-05 | global batch size: 256 | lm loss: 4.507601E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.396 | TFLOPs: 11.85 | 7: iteration 158480/ 173500 | consumed samples: 40570880 | consumed tokens: 83089162240 | elapsed time per iteration (s): 0.08 | learning rate: 2.338E-05 | global batch size: 256 | lm loss: 4.508295E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.132 | TFLOPs: 11.90 | 7: iteration 158490/ 173500 | consumed samples: 40573440 | consumed tokens: 83094405120 | elapsed time per iteration (s): 0.08 | learning rate: 2.337E-05 | global batch size: 256 | lm loss: 4.501914E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.828 | TFLOPs: 11.93 | 7: iteration 158500/ 173500 | consumed samples: 40576000 | consumed tokens: 
83099648000 | elapsed time per iteration (s): 0.08 | learning rate: 2.337E-05 | global batch size: 256 | lm loss: 4.498070E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.130 | TFLOPs: 11.90 | 7: iteration 158510/ 173500 | consumed samples: 40578560 | consumed tokens: 83104890880 | elapsed time per iteration (s): 0.08 | learning rate: 2.336E-05 | global batch size: 256 | lm loss: 4.504949E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.050 | TFLOPs: 11.92 | 7: iteration 158520/ 173500 | consumed samples: 40581120 | consumed tokens: 83110133760 | elapsed time per iteration (s): 0.09 | learning rate: 2.336E-05 | global batch size: 256 | lm loss: 4.496300E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2912.990 | TFLOPs: 10.84 | 7: iteration 158530/ 173500 | consumed samples: 40583680 | consumed tokens: 83115376640 | elapsed time per iteration (s): 0.08 | learning rate: 2.335E-05 | global batch size: 256 | lm loss: 4.498457E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.891 | TFLOPs: 11.84 | 7: iteration 158540/ 173500 | consumed samples: 40586240 | consumed tokens: 83120619520 | elapsed time per iteration (s): 0.08 | learning rate: 2.335E-05 | global batch size: 256 | lm loss: 4.503418E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.356 | TFLOPs: 11.80 | 7: iteration 158550/ 173500 | consumed samples: 40588800 | consumed tokens: 83125862400 | elapsed time per iteration (s): 0.08 | learning rate: 2.334E-05 | global batch size: 256 | lm loss: 4.514001E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3190.781 | TFLOPs: 11.87 | 7: iteration 158560/ 173500 | consumed samples: 40591360 | consumed tokens: 83131105280 | elapsed time per iteration (s): 0.09 | learning rate: 2.334E-05 | global batch size: 256 | lm loss: 4.514110E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3007.967 | TFLOPs: 11.19 | 7: iteration 158570/ 173500 | consumed samples: 40593920 | consumed tokens: 83136348160 | elapsed time per iteration (s): 0.09 | learning rate: 2.333E-05 | global batch size: 256 | lm loss: 4.505837E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.222 | TFLOPs: 10.39 | 7: iteration 158580/ 173500 | consumed samples: 40596480 | consumed tokens: 83141591040 | elapsed time per iteration (s): 0.08 | learning rate: 2.333E-05 | global batch size: 256 | lm loss: 4.505962E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.269 | TFLOPs: 11.82 | 7: iteration 158590/ 173500 | consumed samples: 40599040 | consumed tokens: 83146833920 | elapsed time per iteration (s): 0.08 | learning rate: 2.333E-05 | global batch size: 256 | lm loss: 4.509925E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.355 | TFLOPs: 11.79 | 7: iteration 158600/ 173500 | consumed samples: 40601600 | consumed tokens: 83152076800 | elapsed time per iteration (s): 0.08 | learning rate: 2.332E-05 | global batch size: 256 | lm loss: 4.513088E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.714 | TFLOPs: 11.83 | 7: iteration 158610/ 173500 | consumed samples: 40604160 | consumed tokens: 83157319680 | elapsed time per iteration (s): 0.08 | learning rate: 2.332E-05 | global batch size: 256 | lm loss: 
4.504724E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.123 | TFLOPs: 11.84 | 7: iteration 158620/ 173500 | consumed samples: 40606720 | consumed tokens: 83162562560 | elapsed time per iteration (s): 0.08 | learning rate: 2.331E-05 | global batch size: 256 | lm loss: 4.509882E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.772 | TFLOPs: 11.84 | 7: iteration 158630/ 173500 | consumed samples: 40609280 | consumed tokens: 83167805440 | elapsed time per iteration (s): 0.08 | learning rate: 2.331E-05 | global batch size: 256 | lm loss: 4.506772E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.961 | TFLOPs: 11.83 | 7: iteration 158640/ 173500 | consumed samples: 40611840 | consumed tokens: 83173048320 | elapsed time per iteration (s): 0.08 | learning rate: 2.330E-05 | global batch size: 256 | lm loss: 4.500478E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.938 | TFLOPs: 11.85 | 7: iteration 158650/ 173500 | consumed samples: 40614400 | consumed tokens: 83178291200 | elapsed time per iteration (s): 0.08 | learning rate: 2.330E-05 | global batch size: 256 | lm loss: 4.503957E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.367 | TFLOPs: 11.82 | 7: iteration 158660/ 173500 | consumed samples: 40616960 | consumed tokens: 83183534080 | elapsed time per iteration (s): 0.08 | learning rate: 2.330E-05 | global batch size: 256 | lm loss: 4.489909E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.466 | TFLOPs: 11.86 | 7: iteration 158670/ 173500 | consumed samples: 40619520 | consumed tokens: 
83188776960 | elapsed time per iteration (s): 0.08 | learning rate: 2.329E-05 | global batch size: 256 | lm loss: 4.499340E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.693 | TFLOPs: 11.89 | 7: iteration 158680/ 173500 | consumed samples: 40622080 | consumed tokens: 83194019840 | elapsed time per iteration (s): 0.08 | learning rate: 2.329E-05 | global batch size: 256 | lm loss: 4.519853E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.004 | TFLOPs: 11.86 | 7: iteration 158690/ 173500 | consumed samples: 40624640 | consumed tokens: 83199262720 | elapsed time per iteration (s): 0.08 | learning rate: 2.328E-05 | global batch size: 256 | lm loss: 4.509066E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.106 | TFLOPs: 11.71 | 7: iteration 158700/ 173500 | consumed samples: 40627200 | consumed tokens: 83204505600 | elapsed time per iteration (s): 0.08 | learning rate: 2.328E-05 | global batch size: 256 | lm loss: 4.509305E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.809 | TFLOPs: 11.85 | 7: iteration 158710/ 173500 | consumed samples: 40629760 | consumed tokens: 83209748480 | elapsed time per iteration (s): 0.08 | learning rate: 2.327E-05 | global batch size: 256 | lm loss: 4.512160E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.180 | TFLOPs: 11.84 | 7: iteration 158720/ 173500 | consumed samples: 40632320 | consumed tokens: 83214991360 | elapsed time per iteration (s): 0.08 | learning rate: 2.327E-05 | global batch size: 256 | lm loss: 4.500701E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3209.594 | TFLOPs: 11.94 | 7: iteration 158730/ 173500 | consumed samples: 40634880 | consumed tokens: 83220234240 | elapsed time per iteration (s): 0.08 | learning rate: 2.326E-05 | global batch size: 256 | lm loss: 4.501247E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.708 | TFLOPs: 11.88 | 7: iteration 158740/ 173500 | consumed samples: 40637440 | consumed tokens: 83225477120 | elapsed time per iteration (s): 0.08 | learning rate: 2.326E-05 | global batch size: 256 | lm loss: 4.505995E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.174 | TFLOPs: 11.95 | 7: iteration 158750/ 173500 | consumed samples: 40640000 | consumed tokens: 83230720000 | elapsed time per iteration (s): 0.08 | learning rate: 2.326E-05 | global batch size: 256 | lm loss: 4.505279E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.067 | TFLOPs: 12.00 | 7: iteration 158760/ 173500 | consumed samples: 40642560 | consumed tokens: 83235962880 | elapsed time per iteration (s): 0.08 | learning rate: 2.325E-05 | global batch size: 256 | lm loss: 4.505375E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.203 | TFLOPs: 12.04 | 7: iteration 158770/ 173500 | consumed samples: 40645120 | consumed tokens: 83241205760 | elapsed time per iteration (s): 0.08 | learning rate: 2.325E-05 | global batch size: 256 | lm loss: 4.503970E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.249 | TFLOPs: 11.96 | 7: iteration 158780/ 173500 | consumed samples: 40647680 | consumed tokens: 83246448640 | elapsed time per iteration (s): 0.08 | learning rate: 2.324E-05 | global batch size: 256 | lm loss: 
4.505381E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.912 | TFLOPs: 11.94 | 7: iteration 158790/ 173500 | consumed samples: 40650240 | consumed tokens: 83251691520 | elapsed time per iteration (s): 0.08 | learning rate: 2.324E-05 | global batch size: 256 | lm loss: 4.489439E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.613 | TFLOPs: 11.99 | 7: iteration 158800/ 173500 | consumed samples: 40652800 | consumed tokens: 83256934400 | elapsed time per iteration (s): 0.08 | learning rate: 2.323E-05 | global batch size: 256 | lm loss: 4.504430E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.990 | TFLOPs: 11.72 | 7: iteration 158810/ 173500 | consumed samples: 40655360 | consumed tokens: 83262177280 | elapsed time per iteration (s): 0.08 | learning rate: 2.323E-05 | global batch size: 256 | lm loss: 4.510700E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.166 | TFLOPs: 11.87 | 7: iteration 158820/ 173500 | consumed samples: 40657920 | consumed tokens: 83267420160 | elapsed time per iteration (s): 0.08 | learning rate: 2.322E-05 | global batch size: 256 | lm loss: 4.501380E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.133 | TFLOPs: 11.86 | 7: iteration 158830/ 173500 | consumed samples: 40660480 | consumed tokens: 83272663040 | elapsed time per iteration (s): 0.08 | learning rate: 2.322E-05 | global batch size: 256 | lm loss: 4.493435E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.022 | TFLOPs: 11.88 | 7: iteration 158840/ 173500 | consumed samples: 40663040 | consumed tokens: 
83277905920 | elapsed time per iteration (s): 0.08 | learning rate: 2.322E-05 | global batch size: 256 | lm loss: 4.490564E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.196 | TFLOPs: 11.74 | 7: iteration 158850/ 173500 | consumed samples: 40665600 | consumed tokens: 83283148800 | elapsed time per iteration (s): 0.08 | learning rate: 2.321E-05 | global batch size: 256 | lm loss: 4.491701E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.025 | TFLOPs: 11.85 | 7: iteration 158860/ 173500 | consumed samples: 40668160 | consumed tokens: 83288391680 | elapsed time per iteration (s): 0.08 | learning rate: 2.321E-05 | global batch size: 256 | lm loss: 4.514585E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.662 | TFLOPs: 11.88 | 7: iteration 158870/ 173500 | consumed samples: 40670720 | consumed tokens: 83293634560 | elapsed time per iteration (s): 0.08 | learning rate: 2.320E-05 | global batch size: 256 | lm loss: 4.514928E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.952 | TFLOPs: 11.27 | 7: iteration 158880/ 173500 | consumed samples: 40673280 | consumed tokens: 83298877440 | elapsed time per iteration (s): 0.08 | learning rate: 2.320E-05 | global batch size: 256 | lm loss: 4.499997E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.832 | TFLOPs: 11.90 | 7: iteration 158890/ 173500 | consumed samples: 40675840 | consumed tokens: 83304120320 | elapsed time per iteration (s): 0.08 | learning rate: 2.319E-05 | global batch size: 256 | lm loss: 4.500716E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3195.495 | TFLOPs: 11.89 | 7: iteration 158900/ 173500 | consumed samples: 40678400 | consumed tokens: 83309363200 | elapsed time per iteration (s): 0.08 | learning rate: 2.319E-05 | global batch size: 256 | lm loss: 4.500299E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.190 | TFLOPs: 11.88 | 7: iteration 158910/ 173500 | consumed samples: 40680960 | consumed tokens: 83314606080 | elapsed time per iteration (s): 0.08 | learning rate: 2.319E-05 | global batch size: 256 | lm loss: 4.506276E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.725 | TFLOPs: 11.89 | 7: iteration 158920/ 173500 | consumed samples: 40683520 | consumed tokens: 83319848960 | elapsed time per iteration (s): 0.08 | learning rate: 2.318E-05 | global batch size: 256 | lm loss: 4.504029E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.276 | TFLOPs: 11.87 | 7: iteration 158930/ 173500 | consumed samples: 40686080 | consumed tokens: 83325091840 | elapsed time per iteration (s): 0.08 | learning rate: 2.318E-05 | global batch size: 256 | lm loss: 4.512422E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.098 | TFLOPs: 11.78 | 7: iteration 158940/ 173500 | consumed samples: 40688640 | consumed tokens: 83330334720 | elapsed time per iteration (s): 0.08 | learning rate: 2.317E-05 | global batch size: 256 | lm loss: 4.505383E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.889 | TFLOPs: 11.86 | 7: iteration 158950/ 173500 | consumed samples: 40691200 | consumed tokens: 83335577600 | elapsed time per iteration (s): 0.08 | learning rate: 2.317E-05 | global batch size: 256 | lm loss: 
4.520694E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.940 | TFLOPs: 11.85 | 7: iteration 158960/ 173500 | consumed samples: 40693760 | consumed tokens: 83340820480 | elapsed time per iteration (s): 0.08 | learning rate: 2.316E-05 | global batch size: 256 | lm loss: 4.511098E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.333 | TFLOPs: 11.85 | 7: iteration 158970/ 173500 | consumed samples: 40696320 | consumed tokens: 83346063360 | elapsed time per iteration (s): 0.08 | learning rate: 2.316E-05 | global batch size: 256 | lm loss: 4.502233E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.118 | TFLOPs: 11.89 | 7: iteration 158980/ 173500 | consumed samples: 40698880 | consumed tokens: 83351306240 | elapsed time per iteration (s): 0.08 | learning rate: 2.316E-05 | global batch size: 256 | lm loss: 4.490125E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.885 | TFLOPs: 11.86 | 7: iteration 158990/ 173500 | consumed samples: 40701440 | consumed tokens: 83356549120 | elapsed time per iteration (s): 0.08 | learning rate: 2.315E-05 | global batch size: 256 | lm loss: 4.501213E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.701 | TFLOPs: 11.89 | 7: iteration 159000/ 173500 | consumed samples: 40704000 | consumed tokens: 83361792000 | elapsed time per iteration (s): 0.08 | learning rate: 2.315E-05 | global batch size: 256 | lm loss: 4.498138E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.241 | TFLOPs: 11.87 | 7: 
------------------------------------------------------------------------------------------------- 7: validation loss at iteration 159000 | lm loss value: 4.411024E+00 | lm loss PPL: 8.235372E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 159000 to checkpoints_14m91b100m 0: [2023-03-17 04:08:42,868] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step159000 is begin to save! 0: [2023-03-17 04:08:42,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:08:42,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:08:42,898] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:08:42,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:08:42,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:08:42,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:08:42,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:08:42,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:08:42,907] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 04:08:42,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:08:42,910] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:08:42,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:08:42,911] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step159000/mp_rank_00_model_states.pt 0: [2023-03-17 04:08:42,911] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:08:42,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:08:42,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:08:42,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:08:42,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:08:42,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 3: [2023-03-17 04:08:42,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:08:42,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:08:42,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 
5: [2023-03-17 04:08:42,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:08:42,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:08:42,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 1: [2023-03-17 04:08:42,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 7: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:08:42,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: [2023-03-17 04:08:42,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 2: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:08:42,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: [2023-03-17 04:08:42,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 6: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:08:42,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 5: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:08:42,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:08:42,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 7: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:08:42,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 1: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:08:42,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 3: [2023-03-17 04:08:42,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 3: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 6: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:08:42,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 2: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:08:42,936] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,936] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 5: [2023-03-17 04:08:42,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:08:42,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:08:42,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 1: [2023-03-17 04:08:42,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:08:42,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 7: [2023-03-17 04:08:42,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:08:42,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2023-03-17 04:08:42,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:08:42,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 3: [2023-03-17 04:08:42,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 6: [2023-03-17 04:08:42,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:08:42,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 2: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:08:42,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 5: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:08:42,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 04:08:42,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:08:42,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:08:42,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 5: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 
4: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 4: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 4: [2023-03-17 04:08:42,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 1: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,939] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 6: [2023-03-17 04:08:42,939] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 4: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:08:42,939] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 
7: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:08:42,939] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: [2023-03-17 04:08:42,939] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 3: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:08:42,939] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:08:42,939] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 5: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:08:42,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 2: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 6: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:08:42,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 2: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 7: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:08:42,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:08:42,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 3: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:08:42,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 4: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:08:42,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 2: [2023-03-17 04:08:42,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:08:42,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 5: [2023-03-17 04:08:42,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:08:42,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:08:42,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 1: [2023-03-17 04:08:42,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:08:42,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 7: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:08:42,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 4: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:08:42,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 6: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:08:42,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:08:42,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 3: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:08:42,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 2: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:08:42,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 5: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:08:42,942] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:08:42,942] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 1: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 7: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 3: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 3: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 4: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 6: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 5: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 7: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 3: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 7: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 4: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 4: [2023-03-17 04:08:42,943] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:08:42,943] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 1: [2023-03-17 04:08:42,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:08:42,944] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step159000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:08:42,944] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step159000 is ready now! 0: successfully saved checkpoint at iteration 159000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 79.15 7: iteration 159010/ 173500 | consumed samples: 40706560 | consumed tokens: 83367034880 | elapsed time per iteration (s): 0.11 | learning rate: 2.314E-05 | global batch size: 256 | lm loss: 4.497037E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2380.254 | TFLOPs: 8.85 | 7: iteration 159020/ 173500 | consumed samples: 40709120 | consumed tokens: 83372277760 | elapsed time per iteration (s): 0.08 | learning rate: 2.314E-05 | global batch size: 256 | lm loss: 4.499765E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.550 | TFLOPs: 11.95 | 7: iteration 159030/ 173500 | consumed samples: 40711680 | consumed tokens: 83377520640 | elapsed time per iteration (s): 0.08 | learning rate: 2.313E-05 | global batch size: 256 | lm loss: 4.503490E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.049 | TFLOPs: 11.99 | 7: iteration 159040/ 173500 | consumed samples: 40714240 | consumed tokens: 83382763520 | elapsed time per iteration (s): 0.08 | learning rate: 2.313E-05 | global batch size: 256 | lm loss: 4.498391E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.552 | TFLOPs: 12.03 | 7: iteration 159050/ 173500 | consumed samples: 40716800 | consumed tokens: 83388006400 | elapsed time per iteration (s): 0.08 | learning rate: 2.313E-05 | 
global batch size: 256 | lm loss: 4.508232E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.453 | TFLOPs: 12.03 | 7: iteration 159060/ 173500 | consumed samples: 40719360 | consumed tokens: 83393249280 | elapsed time per iteration (s): 0.08 | learning rate: 2.312E-05 | global batch size: 256 | lm loss: 4.502572E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.930 | TFLOPs: 11.96 | 7: iteration 159070/ 173500 | consumed samples: 40721920 | consumed tokens: 83398492160 | elapsed time per iteration (s): 0.08 | learning rate: 2.312E-05 | global batch size: 256 | lm loss: 4.513299E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.190 | TFLOPs: 11.99 | 7: iteration 159080/ 173500 | consumed samples: 40724480 | consumed tokens: 83403735040 | elapsed time per iteration (s): 0.08 | learning rate: 2.311E-05 | global batch size: 256 | lm loss: 4.504070E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.853 | TFLOPs: 11.99 | 7: iteration 159090/ 173500 | consumed samples: 40727040 | consumed tokens: 83408977920 | elapsed time per iteration (s): 0.08 | learning rate: 2.311E-05 | global batch size: 256 | lm loss: 4.506647E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.003 | TFLOPs: 12.01 | 7: iteration 159100/ 173500 | consumed samples: 40729600 | consumed tokens: 83414220800 | elapsed time per iteration (s): 0.08 | learning rate: 2.310E-05 | global batch size: 256 | lm loss: 4.504703E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.714 | TFLOPs: 11.89 | 7: iteration 159110/ 173500 | consumed 
samples: 40732160 | consumed tokens: 83419463680 | elapsed time per iteration (s): 0.08 | learning rate: 2.310E-05 | global batch size: 256 | lm loss: 4.514157E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.583 | TFLOPs: 12.05 | 7: iteration 159120/ 173500 | consumed samples: 40734720 | consumed tokens: 83424706560 | elapsed time per iteration (s): 0.08 | learning rate: 2.310E-05 | global batch size: 256 | lm loss: 4.511033E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.684 | TFLOPs: 11.96 | 7: iteration 159130/ 173500 | consumed samples: 40737280 | consumed tokens: 83429949440 | elapsed time per iteration (s): 0.08 | learning rate: 2.309E-05 | global batch size: 256 | lm loss: 4.493653E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.961 | TFLOPs: 12.04 | 7: iteration 159140/ 173500 | consumed samples: 40739840 | consumed tokens: 83435192320 | elapsed time per iteration (s): 0.08 | learning rate: 2.309E-05 | global batch size: 256 | lm loss: 4.493797E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.844 | TFLOPs: 12.03 | 7: iteration 159150/ 173500 | consumed samples: 40742400 | consumed tokens: 83440435200 | elapsed time per iteration (s): 0.08 | learning rate: 2.308E-05 | global batch size: 256 | lm loss: 4.504934E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.172 | TFLOPs: 12.01 | 7: iteration 159160/ 173500 | consumed samples: 40744960 | consumed tokens: 83445678080 | elapsed time per iteration (s): 0.08 | learning rate: 2.308E-05 | global batch size: 256 | lm loss: 4.496808E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3233.443 | TFLOPs: 12.03 | 7: iteration 159170/ 173500 | consumed samples: 40747520 | consumed tokens: 83450920960 | elapsed time per iteration (s): 0.08 | learning rate: 2.307E-05 | global batch size: 256 | lm loss: 4.509150E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.325 | TFLOPs: 11.97 | 7: iteration 159180/ 173500 | consumed samples: 40750080 | consumed tokens: 83456163840 | elapsed time per iteration (s): 0.08 | learning rate: 2.307E-05 | global batch size: 256 | lm loss: 4.504312E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.274 | TFLOPs: 11.99 | 7: iteration 159190/ 173500 | consumed samples: 40752640 | consumed tokens: 83461406720 | elapsed time per iteration (s): 0.08 | learning rate: 2.307E-05 | global batch size: 256 | lm loss: 4.507270E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.683 | TFLOPs: 12.01 | 7: iteration 159200/ 173500 | consumed samples: 40755200 | consumed tokens: 83466649600 | elapsed time per iteration (s): 0.08 | learning rate: 2.306E-05 | global batch size: 256 | lm loss: 4.506985E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.651 | TFLOPs: 12.01 | 7: iteration 159210/ 173500 | consumed samples: 40757760 | consumed tokens: 83471892480 | elapsed time per iteration (s): 0.08 | learning rate: 2.306E-05 | global batch size: 256 | lm loss: 4.497925E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.564 | TFLOPs: 12.03 | 7: iteration 159220/ 173500 | consumed samples: 40760320 | consumed tokens: 83477135360 | elapsed time per iteration (s): 0.08 | learning rate: 2.305E-05 | global 
batch size: 256 | lm loss: 4.511141E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.518 | TFLOPs: 12.03 | 7: iteration 159230/ 173500 | consumed samples: 40762880 | consumed tokens: 83482378240 | elapsed time per iteration (s): 0.08 | learning rate: 2.305E-05 | global batch size: 256 | lm loss: 4.511321E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.565 | TFLOPs: 12.04 | 7: iteration 159240/ 173500 | consumed samples: 40765440 | consumed tokens: 83487621120 | elapsed time per iteration (s): 0.08 | learning rate: 2.304E-05 | global batch size: 256 | lm loss: 4.503250E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.460 | TFLOPs: 12.02 | 7: iteration 159250/ 173500 | consumed samples: 40768000 | consumed tokens: 83492864000 | elapsed time per iteration (s): 0.08 | learning rate: 2.304E-05 | global batch size: 256 | lm loss: 4.520898E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.900 | TFLOPs: 12.01 | 7: iteration 159260/ 173500 | consumed samples: 40770560 | consumed tokens: 83498106880 | elapsed time per iteration (s): 0.08 | learning rate: 2.304E-05 | global batch size: 256 | lm loss: 4.506230E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.358 | TFLOPs: 11.99 | 7: iteration 159270/ 173500 | consumed samples: 40773120 | consumed tokens: 83503349760 | elapsed time per iteration (s): 0.08 | learning rate: 2.303E-05 | global batch size: 256 | lm loss: 4.497429E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.248 | TFLOPs: 11.92 | 7: iteration 159280/ 173500 | consumed samples: 
40775680 | consumed tokens: 83508592640 | elapsed time per iteration (s): 0.08 | learning rate: 2.303E-05 | global batch size: 256 | lm loss: 4.503770E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.863 | TFLOPs: 12.00 | 7: iteration 159290/ 173500 | consumed samples: 40778240 | consumed tokens: 83513835520 | elapsed time per iteration (s): 0.08 | learning rate: 2.302E-05 | global batch size: 256 | lm loss: 4.508768E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.690 | TFLOPs: 12.02 | 7: iteration 159300/ 173500 | consumed samples: 40780800 | consumed tokens: 83519078400 | elapsed time per iteration (s): 0.08 | learning rate: 2.302E-05 | global batch size: 256 | lm loss: 4.505376E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.756 | TFLOPs: 11.48 | 7: iteration 159310/ 173500 | consumed samples: 40783360 | consumed tokens: 83524321280 | elapsed time per iteration (s): 0.08 | learning rate: 2.301E-05 | global batch size: 256 | lm loss: 4.518343E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3019.534 | TFLOPs: 11.23 | 7: iteration 159320/ 173500 | consumed samples: 40785920 | consumed tokens: 83529564160 | elapsed time per iteration (s): 0.08 | learning rate: 2.301E-05 | global batch size: 256 | lm loss: 4.510636E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.281 | TFLOPs: 12.06 | 7: iteration 159330/ 173500 | consumed samples: 40788480 | consumed tokens: 83534807040 | elapsed time per iteration (s): 0.08 | learning rate: 2.301E-05 | global batch size: 256 | lm loss: 4.492234E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 3132.060 | TFLOPs: 11.65 | 7: iteration 159340/ 173500 | consumed samples: 40791040 | consumed tokens: 83540049920 | elapsed time per iteration (s): 0.09 | learning rate: 2.300E-05 | global batch size: 256 | lm loss: 4.498099E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.065 | TFLOPs: 11.11 | 7: iteration 159350/ 173500 | consumed samples: 40793600 | consumed tokens: 83545292800 | elapsed time per iteration (s): 0.08 | learning rate: 2.300E-05 | global batch size: 256 | lm loss: 4.496751E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.316 | TFLOPs: 11.58 | 7: iteration 159360/ 173500 | consumed samples: 40796160 | consumed tokens: 83550535680 | elapsed time per iteration (s): 0.08 | learning rate: 2.299E-05 | global batch size: 256 | lm loss: 4.505195E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.215 | TFLOPs: 11.64 | 7: iteration 159370/ 173500 | consumed samples: 40798720 | consumed tokens: 83555778560 | elapsed time per iteration (s): 0.08 | learning rate: 2.299E-05 | global batch size: 256 | lm loss: 4.499863E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.402 | TFLOPs: 11.70 | 7: iteration 159380/ 173500 | consumed samples: 40801280 | consumed tokens: 83561021440 | elapsed time per iteration (s): 0.08 | learning rate: 2.298E-05 | global batch size: 256 | lm loss: 4.498352E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.784 | TFLOPs: 11.94 | 7: iteration 159390/ 173500 | consumed samples: 40803840 | consumed tokens: 83566264320 | elapsed time per iteration (s): 0.09 | learning rate: 2.298E-05 | global batch 
size: 256 | lm loss: 4.499277E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2930.047 | TFLOPs: 10.90 | 7: iteration 159400/ 173500 | consumed samples: 40806400 | consumed tokens: 83571507200 | elapsed time per iteration (s): 0.08 | learning rate: 2.298E-05 | global batch size: 256 | lm loss: 4.501466E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.661 | TFLOPs: 11.70 | 7: iteration 159410/ 173500 | consumed samples: 40808960 | consumed tokens: 83576750080 | elapsed time per iteration (s): 0.08 | learning rate: 2.297E-05 | global batch size: 256 | lm loss: 4.509700E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.849 | TFLOPs: 11.65 | 7: iteration 159420/ 173500 | consumed samples: 40811520 | consumed tokens: 83581992960 | elapsed time per iteration (s): 0.08 | learning rate: 2.297E-05 | global batch size: 256 | lm loss: 4.512550E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.869 | TFLOPs: 11.39 | 7: iteration 159430/ 173500 | consumed samples: 40814080 | consumed tokens: 83587235840 | elapsed time per iteration (s): 0.08 | learning rate: 2.296E-05 | global batch size: 256 | lm loss: 4.517001E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.527 | TFLOPs: 11.99 | 7: iteration 159440/ 173500 | consumed samples: 40816640 | consumed tokens: 83592478720 | elapsed time per iteration (s): 0.08 | learning rate: 2.296E-05 | global batch size: 256 | lm loss: 4.500802E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3065.091 | TFLOPs: 11.40 | 7: iteration 159450/ 173500 | consumed samples: 40819200 
| consumed tokens: 83597721600 | elapsed time per iteration (s): 0.08 | learning rate: 2.296E-05 | global batch size: 256 | lm loss: 4.495855E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.254 | TFLOPs: 11.68 | 7: iteration 159460/ 173500 | consumed samples: 40821760 | consumed tokens: 83602964480 | elapsed time per iteration (s): 0.08 | learning rate: 2.295E-05 | global batch size: 256 | lm loss: 4.511895E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.702 | TFLOPs: 11.67 | 7: iteration 159470/ 173500 | consumed samples: 40824320 | consumed tokens: 83608207360 | elapsed time per iteration (s): 0.08 | learning rate: 2.295E-05 | global batch size: 256 | lm loss: 4.493769E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.585 | TFLOPs: 11.70 | 7: iteration 159480/ 173500 | consumed samples: 40826880 | consumed tokens: 83613450240 | elapsed time per iteration (s): 0.08 | learning rate: 2.294E-05 | global batch size: 256 | lm loss: 4.507956E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3055.081 | TFLOPs: 11.36 | 7: iteration 159490/ 173500 | consumed samples: 40829440 | consumed tokens: 83618693120 | elapsed time per iteration (s): 0.08 | learning rate: 2.294E-05 | global batch size: 256 | lm loss: 4.504745E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.891 | TFLOPs: 11.96 | 7: iteration 159500/ 173500 | consumed samples: 40832000 | consumed tokens: 83623936000 | elapsed time per iteration (s): 0.09 | learning rate: 2.293E-05 | global batch size: 256 | lm loss: 4.512825E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 2996.804 | TFLOPs: 11.15 | 7: iteration 159510/ 173500 | consumed samples: 40834560 | consumed tokens: 83629178880 | elapsed time per iteration (s): 0.08 | learning rate: 2.293E-05 | global batch size: 256 | lm loss: 4.520961E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3065.574 | TFLOPs: 11.40 | 7: iteration 159520/ 173500 | consumed samples: 40837120 | consumed tokens: 83634421760 | elapsed time per iteration (s): 0.09 | learning rate: 2.293E-05 | global batch size: 256 | lm loss: 4.503963E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2937.981 | TFLOPs: 10.93 | 7: iteration 159530/ 173500 | consumed samples: 40839680 | consumed tokens: 83639664640 | elapsed time per iteration (s): 0.09 | learning rate: 2.292E-05 | global batch size: 256 | lm loss: 4.497710E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3001.550 | TFLOPs: 11.16 | 7: iteration 159540/ 173500 | consumed samples: 40842240 | consumed tokens: 83644907520 | elapsed time per iteration (s): 0.09 | learning rate: 2.292E-05 | global batch size: 256 | lm loss: 4.501127E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2967.229 | TFLOPs: 11.04 | 7: iteration 159550/ 173500 | consumed samples: 40844800 | consumed tokens: 83650150400 | elapsed time per iteration (s): 0.08 | learning rate: 2.291E-05 | global batch size: 256 | lm loss: 4.506694E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3065.369 | TFLOPs: 11.40 | 7: iteration 159560/ 173500 | consumed samples: 40847360 | consumed tokens: 83655393280 | elapsed time per iteration (s): 0.08 | learning rate: 2.291E-05 | global batch size: 
256 | lm loss: 4.493923E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.649 | TFLOPs: 11.98 | 7: iteration 159570/ 173500 | consumed samples: 40849920 | consumed tokens: 83660636160 | elapsed time per iteration (s): 0.08 | learning rate: 2.291E-05 | global batch size: 256 | lm loss: 4.510643E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.330 | TFLOPs: 11.70 | 7: iteration 159580/ 173500 | consumed samples: 40852480 | consumed tokens: 83665879040 | elapsed time per iteration (s): 0.08 | learning rate: 2.290E-05 | global batch size: 256 | lm loss: 4.492200E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.484 | TFLOPs: 11.67 | 7: iteration 159590/ 173500 | consumed samples: 40855040 | consumed tokens: 83671121920 | elapsed time per iteration (s): 0.08 | learning rate: 2.290E-05 | global batch size: 256 | lm loss: 4.513846E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.705 | TFLOPs: 11.94 | 7: iteration 159600/ 173500 | consumed samples: 40857600 | consumed tokens: 83676364800 | elapsed time per iteration (s): 0.09 | learning rate: 2.289E-05 | global batch size: 256 | lm loss: 4.508657E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.010 | TFLOPs: 11.15 | 7: iteration 159610/ 173500 | consumed samples: 40860160 | consumed tokens: 83681607680 | elapsed time per iteration (s): 0.08 | learning rate: 2.289E-05 | global batch size: 256 | lm loss: 4.521404E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3058.196 | TFLOPs: 11.38 | 7: iteration 159620/ 173500 | consumed samples: 40862720 | 
consumed tokens: 83686850560 | elapsed time per iteration (s): 0.08 | learning rate: 2.288E-05 | global batch size: 256 | lm loss: 4.499524E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.701 | TFLOPs: 11.91 | 7: iteration 159630/ 173500 | consumed samples: 40865280 | consumed tokens: 83692093440 | elapsed time per iteration (s): 0.08 | learning rate: 2.288E-05 | global batch size: 256 | lm loss: 4.503895E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.838 | TFLOPs: 11.95 | 7: iteration 159640/ 173500 | consumed samples: 40867840 | consumed tokens: 83697336320 | elapsed time per iteration (s): 0.08 | learning rate: 2.288E-05 | global batch size: 256 | lm loss: 4.511201E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.308 | TFLOPs: 11.43 | 7: iteration 159650/ 173500 | consumed samples: 40870400 | consumed tokens: 83702579200 | elapsed time per iteration (s): 0.08 | learning rate: 2.287E-05 | global batch size: 256 | lm loss: 4.500270E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.232 | TFLOPs: 11.91 | 7: iteration 159660/ 173500 | consumed samples: 40872960 | consumed tokens: 83707822080 | elapsed time per iteration (s): 0.08 | learning rate: 2.287E-05 | global batch size: 256 | lm loss: 4.515655E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.209 | TFLOPs: 11.94 | 7: iteration 159670/ 173500 | consumed samples: 40875520 | consumed tokens: 83713064960 | elapsed time per iteration (s): 0.08 | learning rate: 2.286E-05 | global batch size: 256 | lm loss: 4.485975E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3195.217 | TFLOPs: 11.88 | 7: iteration 159680/ 173500 | consumed samples: 40878080 | consumed tokens: 83718307840 | elapsed time per iteration (s): 0.08 | learning rate: 2.286E-05 | global batch size: 256 | lm loss: 4.512939E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.583 | TFLOPs: 11.86 | 7: iteration 159690/ 173500 | consumed samples: 40880640 | consumed tokens: 83723550720 | elapsed time per iteration (s): 0.08 | learning rate: 2.286E-05 | global batch size: 256 | lm loss: 4.501702E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.810 | TFLOPs: 11.91 | 7: iteration 159700/ 173500 | consumed samples: 40883200 | consumed tokens: 83728793600 | elapsed time per iteration (s): 0.08 | learning rate: 2.285E-05 | global batch size: 256 | lm loss: 4.500270E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.507 | TFLOPs: 12.03 | 7: iteration 159710/ 173500 | consumed samples: 40885760 | consumed tokens: 83734036480 | elapsed time per iteration (s): 0.08 | learning rate: 2.285E-05 | global batch size: 256 | lm loss: 4.516850E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.263 | TFLOPs: 12.03 | 7: iteration 159720/ 173500 | consumed samples: 40888320 | consumed tokens: 83739279360 | elapsed time per iteration (s): 0.08 | learning rate: 2.284E-05 | global batch size: 256 | lm loss: 4.507159E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.109 | TFLOPs: 12.02 | 7: iteration 159730/ 173500 | consumed samples: 40890880 | consumed tokens: 83744522240 | elapsed time per iteration (s): 0.08 | learning rate: 2.284E-05 | global batch size: 
256 | lm loss: 4.492633E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.729 | TFLOPs: 11.99 | 7: iteration 159740/ 173500 | consumed samples: 40893440 | consumed tokens: 83749765120 | elapsed time per iteration (s): 0.09 | learning rate: 2.284E-05 | global batch size: 256 | lm loss: 4.494133E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.417 | TFLOPs: 11.07 | 7: iteration 159750/ 173500 | consumed samples: 40896000 | consumed tokens: 83755008000 | elapsed time per iteration (s): 0.08 | learning rate: 2.283E-05 | global batch size: 256 | lm loss: 4.492228E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.016 | TFLOPs: 11.92 | 7: iteration 159760/ 173500 | consumed samples: 40898560 | consumed tokens: 83760250880 | elapsed time per iteration (s): 0.08 | learning rate: 2.283E-05 | global batch size: 256 | lm loss: 4.502972E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.113 | TFLOPs: 12.03 | 7: iteration 159770/ 173500 | consumed samples: 40901120 | consumed tokens: 83765493760 | elapsed time per iteration (s): 0.08 | learning rate: 2.282E-05 | global batch size: 256 | lm loss: 4.511257E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.730 | TFLOPs: 11.91 | 7: iteration 159780/ 173500 | consumed samples: 40903680 | consumed tokens: 83770736640 | elapsed time per iteration (s): 0.08 | learning rate: 2.282E-05 | global batch size: 256 | lm loss: 4.500228E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.520 | TFLOPs: 11.85 | 7: iteration 159790/ 173500 | consumed samples: 40906240 | 
consumed tokens: 83775979520 | elapsed time per iteration (s): 0.08 | learning rate: 2.281E-05 | global batch size: 256 | lm loss: 4.510361E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.657 | TFLOPs: 11.95 | 7: iteration 159800/ 173500 | consumed samples: 40908800 | consumed tokens: 83781222400 | elapsed time per iteration (s): 0.08 | learning rate: 2.281E-05 | global batch size: 256 | lm loss: 4.498492E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.082 | TFLOPs: 11.87 | 7: iteration 159810/ 173500 | consumed samples: 40911360 | consumed tokens: 83786465280 | elapsed time per iteration (s): 0.08 | learning rate: 2.281E-05 | global batch size: 256 | lm loss: 4.511685E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.793 | TFLOPs: 11.89 | 7: iteration 159820/ 173500 | consumed samples: 40913920 | consumed tokens: 83791708160 | elapsed time per iteration (s): 0.08 | learning rate: 2.280E-05 | global batch size: 256 | lm loss: 4.495684E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.638 | TFLOPs: 12.00 | 7: iteration 159830/ 173500 | consumed samples: 40916480 | consumed tokens: 83796951040 | elapsed time per iteration (s): 0.08 | learning rate: 2.280E-05 | global batch size: 256 | lm loss: 4.502152E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.693 | TFLOPs: 12.01 | 7: iteration 159840/ 173500 | consumed samples: 40919040 | consumed tokens: 83802193920 | elapsed time per iteration (s): 0.08 | learning rate: 2.279E-05 | global batch size: 256 | lm loss: 4.495039E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3215.820 | TFLOPs: 11.96 | 7: iteration 159850/ 173500 | consumed samples: 40921600 | consumed tokens: 83807436800 | elapsed time per iteration (s): 0.08 | learning rate: 2.279E-05 | global batch size: 256 | lm loss: 4.511377E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.281 | TFLOPs: 11.61 | 7: iteration 159860/ 173500 | consumed samples: 40924160 | consumed tokens: 83812679680 | elapsed time per iteration (s): 0.08 | learning rate: 2.279E-05 | global batch size: 256 | lm loss: 4.514559E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.572 | TFLOPs: 11.45 | 7: iteration 159870/ 173500 | consumed samples: 40926720 | consumed tokens: 83817922560 | elapsed time per iteration (s): 0.08 | learning rate: 2.278E-05 | global batch size: 256 | lm loss: 4.500647E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.485 | TFLOPs: 11.96 | 7: iteration 159880/ 173500 | consumed samples: 40929280 | consumed tokens: 83823165440 | elapsed time per iteration (s): 0.08 | learning rate: 2.278E-05 | global batch size: 256 | lm loss: 4.498933E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.334 | TFLOPs: 11.21 | 7: iteration 159890/ 173500 | consumed samples: 40931840 | consumed tokens: 83828408320 | elapsed time per iteration (s): 0.08 | learning rate: 2.277E-05 | global batch size: 256 | lm loss: 4.509228E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.332 | TFLOPs: 11.75 | 7: iteration 159900/ 173500 | consumed samples: 40934400 | consumed tokens: 83833651200 | elapsed time per iteration (s): 0.08 | learning rate: 2.277E-05 | global batch size: 
256 | lm loss: 4.497998E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.185 | TFLOPs: 11.22 | 7: iteration 159910/ 173500 | consumed samples: 40936960 | consumed tokens: 83838894080 | elapsed time per iteration (s): 0.08 | learning rate: 2.277E-05 | global batch size: 256 | lm loss: 4.499648E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3013.955 | TFLOPs: 11.21 | 7: iteration 159920/ 173500 | consumed samples: 40939520 | consumed tokens: 83844136960 | elapsed time per iteration (s): 0.09 | learning rate: 2.276E-05 | global batch size: 256 | lm loss: 4.512345E+00 | grad norm: 0.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2735.171 | TFLOPs: 10.17 | 7: iteration 159930/ 173500 | consumed samples: 40942080 | consumed tokens: 83849379840 | elapsed time per iteration (s): 0.08 | learning rate: 2.276E-05 | global batch size: 256 | lm loss: 4.504473E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.114 | TFLOPs: 11.96 | 7: iteration 159940/ 173500 | consumed samples: 40944640 | consumed tokens: 83854622720 | elapsed time per iteration (s): 0.08 | learning rate: 2.275E-05 | global batch size: 256 | lm loss: 4.509838E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.083 | TFLOPs: 11.95 | 7: iteration 159950/ 173500 | consumed samples: 40947200 | consumed tokens: 83859865600 | elapsed time per iteration (s): 0.08 | learning rate: 2.275E-05 | global batch size: 256 | lm loss: 4.511056E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.057 | TFLOPs: 11.88 | 7: iteration 159960/ 173500 | consumed samples: 40949760 | 
consumed tokens: 83865108480 | elapsed time per iteration (s): 0.08 | learning rate: 2.275E-05 | global batch size: 256 | lm loss: 4.500598E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.848 | TFLOPs: 11.72 | 7: iteration 159970/ 173500 | consumed samples: 40952320 | consumed tokens: 83870351360 | elapsed time per iteration (s): 0.08 | learning rate: 2.274E-05 | global batch size: 256 | lm loss: 4.504472E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.989 | TFLOPs: 11.85 | 7: iteration 159980/ 173500 | consumed samples: 40954880 | consumed tokens: 83875594240 | elapsed time per iteration (s): 0.08 | learning rate: 2.274E-05 | global batch size: 256 | lm loss: 4.514401E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.276 | TFLOPs: 11.86 | 7: iteration 159990/ 173500 | consumed samples: 40957440 | consumed tokens: 83880837120 | elapsed time per iteration (s): 0.08 | learning rate: 2.273E-05 | global batch size: 256 | lm loss: 4.501839E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.090 | TFLOPs: 11.83 | 0: [2023-03-17 04:10:04,193] [INFO] [logging.py:68:log_dist] [Rank 0] step=160000, skipped=0, lr=[2.2729831288017337e-05, 2.2729831288017337e-05, 2.2729831288017337e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 160000/ 173500 | consumed samples: 40960000 | consumed tokens: 83886080000 | elapsed time per iteration (s): 0.08 | learning rate: 2.273E-05 | global batch size: 256 | lm loss: 4.506290E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.570 | TFLOPs: 11.87 | 0: steps: 160000 loss: 4.5204 iter time (s): 0.083 samples/sec: 3078.980 7: 
------------------------------------------------------------------------------------------------- 7: validation loss at iteration 160000 | lm loss value: 4.398004E+00 | lm loss PPL: 8.128846E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 160000 to checkpoints_14m91b100m 0: [2023-03-17 04:10:04,250] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step160000 is begin to save! 0: [2023-03-17 04:10:04,253] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:10:04,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:10:04,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:10:04,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:10:04,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:10:04,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:10:04,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:10:04,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:10:04,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/layer_06-model_00-model_states.pt... 
0: [2023-03-17 04:10:04,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:10:04,291] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:10:04,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:10:04,292] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step160000/mp_rank_00_model_states.pt 0: [2023-03-17 04:10:04,292] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:10:04,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:10:04,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:10:04,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:10:04,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:10:04,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-03-17 04:10:04,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:10:04,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:10:04,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
5: [2023-03-17 04:10:04,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:10:04,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:10:04,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:10:04,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 1: [2023-03-17 04:10:04,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:10:04,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:10:04,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 0: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
4: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:10:04,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-03-17 04:10:04,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-03-17 04:10:04,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:10:04,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-03-17 04:10:04,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 04:10:04,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:10:04,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-03-17 04:10:04,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 6: [2023-03-17 04:10:04,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:10:04,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:10:04,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:10:04,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:10:04,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:10:04,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:10:04,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-03-17 04:10:04,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:10:04,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:10:04,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:10:04,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:10:04,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:10:04,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:10:04,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:10:04,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:10:04,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:10:04,321] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,321] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-03-17 04:10:04,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:10:04,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:10:04,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-03-17 04:10:04,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:10:04,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:10:04,322] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:10:04,322] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-03-17 04:10:04,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:10:04,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:10:04,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:10:04,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:10:04,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:10:04,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-03-17 04:10:04,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:10:04,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:10:04,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:10:04,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 04:10:04,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 0: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 7: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 3: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:10:04,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 2: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 1: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 5: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 6: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 
6: [2023-03-17 04:10:04,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step160000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 4: [2023-03-17 04:10:04,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step160000 is ready now! 0: successfully saved checkpoint at iteration 160000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.50 7: iteration 160010/ 173500 | consumed samples: 40962560 | consumed tokens: 83891322880 | elapsed time per iteration (s): 0.09 | learning rate: 2.273E-05 | global batch size: 256 | lm loss: 4.517637E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.070 | TFLOPs: 10.33 | 7: iteration 160020/ 173500 | consumed samples: 40965120 | consumed tokens: 83896565760 | elapsed time per iteration (s): 0.08 | learning rate: 2.272E-05 | global batch size: 256 | lm loss: 4.510600E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.234 | TFLOPs: 12.04 | 7: iteration 160030/ 173500 | consumed samples: 40967680 | consumed tokens: 83901808640 | elapsed time per iteration (s): 0.08 | learning rate: 2.272E-05 | global batch size: 256 | lm loss: 4.497744E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.850 | TFLOPs: 12.02 | 7: iteration 160040/ 173500 | consumed samples: 40970240 | consumed tokens: 83907051520 | elapsed time per iteration (s): 0.08 | learning rate: 2.271E-05 | global batch size: 256 | lm loss: 4.527056E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.038 | TFLOPs: 12.02 | 7: iteration 160050/ 173500 | 
consumed samples: 40972800 | consumed tokens: 83912294400 | elapsed time per iteration (s): 0.08 | learning rate: 2.271E-05 | global batch size: 256 | lm loss: 4.505069E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.671 | TFLOPs: 11.99 | 7: iteration 160060/ 173500 | consumed samples: 40975360 | consumed tokens: 83917537280 | elapsed time per iteration (s): 0.08 | learning rate: 2.271E-05 | global batch size: 256 | lm loss: 4.501542E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.340 | TFLOPs: 12.02 | 7: iteration 160070/ 173500 | consumed samples: 40977920 | consumed tokens: 83922780160 | elapsed time per iteration (s): 0.08 | learning rate: 2.270E-05 | global batch size: 256 | lm loss: 4.488546E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.021 | TFLOPs: 11.90 | 7: iteration 160080/ 173500 | consumed samples: 40980480 | consumed tokens: 83928023040 | elapsed time per iteration (s): 0.09 | learning rate: 2.270E-05 | global batch size: 256 | lm loss: 4.506233E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2890.368 | TFLOPs: 10.75 | 7: iteration 160090/ 173500 | consumed samples: 40983040 | consumed tokens: 83933265920 | elapsed time per iteration (s): 0.08 | learning rate: 2.269E-05 | global batch size: 256 | lm loss: 4.510413E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.987 | TFLOPs: 11.94 | 7: iteration 160100/ 173500 | consumed samples: 40985600 | consumed tokens: 83938508800 | elapsed time per iteration (s): 0.08 | learning rate: 2.269E-05 | global batch size: 256 | lm loss: 4.512187E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3191.404 | TFLOPs: 11.87 | 7: iteration 160110/ 173500 | consumed samples: 40988160 | consumed tokens: 83943751680 | elapsed time per iteration (s): 0.08 | learning rate: 2.269E-05 | global batch size: 256 | lm loss: 4.491237E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.611 | TFLOPs: 11.90 | 7: iteration 160120/ 173500 | consumed samples: 40990720 | consumed tokens: 83948994560 | elapsed time per iteration (s): 0.08 | learning rate: 2.268E-05 | global batch size: 256 | lm loss: 4.507029E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.429 | TFLOPs: 11.90 | 7: iteration 160130/ 173500 | consumed samples: 40993280 | consumed tokens: 83954237440 | elapsed time per iteration (s): 0.08 | learning rate: 2.268E-05 | global batch size: 256 | lm loss: 4.504296E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.420 | TFLOPs: 11.87 | 7: iteration 160140/ 173500 | consumed samples: 40995840 | consumed tokens: 83959480320 | elapsed time per iteration (s): 0.08 | learning rate: 2.267E-05 | global batch size: 256 | lm loss: 4.501157E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.964 | TFLOPs: 11.85 | 7: iteration 160150/ 173500 | consumed samples: 40998400 | consumed tokens: 83964723200 | elapsed time per iteration (s): 0.08 | learning rate: 2.267E-05 | global batch size: 256 | lm loss: 4.500483E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.051 | TFLOPs: 11.85 | 7: iteration 160160/ 173500 | consumed samples: 41000960 | consumed tokens: 83969966080 | elapsed time per iteration (s): 0.08 | learning rate: 
2.267E-05 | global batch size: 256 | lm loss: 4.504745E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.876 | TFLOPs: 11.59 | 7: iteration 160170/ 173500 | consumed samples: 41003520 | consumed tokens: 83975208960 | elapsed time per iteration (s): 0.08 | learning rate: 2.266E-05 | global batch size: 256 | lm loss: 4.502878E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.847 | TFLOPs: 11.88 | 7: iteration 160180/ 173500 | consumed samples: 41006080 | consumed tokens: 83980451840 | elapsed time per iteration (s): 0.08 | learning rate: 2.266E-05 | global batch size: 256 | lm loss: 4.514340E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.692 | TFLOPs: 11.88 | 7: iteration 160190/ 173500 | consumed samples: 41008640 | consumed tokens: 83985694720 | elapsed time per iteration (s): 0.08 | learning rate: 2.265E-05 | global batch size: 256 | lm loss: 4.514024E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.258 | TFLOPs: 11.89 | 7: iteration 160200/ 173500 | consumed samples: 41011200 | consumed tokens: 83990937600 | elapsed time per iteration (s): 0.08 | learning rate: 2.265E-05 | global batch size: 256 | lm loss: 4.502365E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.991 | TFLOPs: 11.90 | 7: iteration 160210/ 173500 | consumed samples: 41013760 | consumed tokens: 83996180480 | elapsed time per iteration (s): 0.08 | learning rate: 2.265E-05 | global batch size: 256 | lm loss: 4.502594E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.216 | TFLOPs: 11.87 | 7: iteration 160220/ 173500 | 
consumed samples: 41016320 | consumed tokens: 84001423360 | elapsed time per iteration (s): 0.08 | learning rate: 2.264E-05 | global batch size: 256 | lm loss: 4.502094E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.737 | TFLOPs: 11.91 | 7: iteration 160230/ 173500 | consumed samples: 41018880 | consumed tokens: 84006666240 | elapsed time per iteration (s): 0.08 | learning rate: 2.264E-05 | global batch size: 256 | lm loss: 4.497367E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.508 | TFLOPs: 11.89 | 7: iteration 160240/ 173500 | consumed samples: 41021440 | consumed tokens: 84011909120 | elapsed time per iteration (s): 0.08 | learning rate: 2.263E-05 | global batch size: 256 | lm loss: 4.511997E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.247 | TFLOPs: 11.89 | 7: iteration 160250/ 173500 | consumed samples: 41024000 | consumed tokens: 84017152000 | elapsed time per iteration (s): 0.08 | learning rate: 2.263E-05 | global batch size: 256 | lm loss: 4.504303E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.872 | TFLOPs: 11.84 | 7: iteration 160260/ 173500 | consumed samples: 41026560 | consumed tokens: 84022394880 | elapsed time per iteration (s): 0.08 | learning rate: 2.263E-05 | global batch size: 256 | lm loss: 4.502192E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.369 | TFLOPs: 11.83 | 7: iteration 160270/ 173500 | consumed samples: 41029120 | consumed tokens: 84027637760 | elapsed time per iteration (s): 0.08 | learning rate: 2.262E-05 | global batch size: 256 | lm loss: 4.503815E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3184.083 | TFLOPs: 11.84 | 7: iteration 160280/ 173500 | consumed samples: 41031680 | consumed tokens: 84032880640 | elapsed time per iteration (s): 0.08 | learning rate: 2.262E-05 | global batch size: 256 | lm loss: 4.510007E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.664 | TFLOPs: 11.86 | 7: iteration 160290/ 173500 | consumed samples: 41034240 | consumed tokens: 84038123520 | elapsed time per iteration (s): 0.08 | learning rate: 2.261E-05 | global batch size: 256 | lm loss: 4.516327E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.494 | TFLOPs: 11.85 | 7: iteration 160300/ 173500 | consumed samples: 41036800 | consumed tokens: 84043366400 | elapsed time per iteration (s): 0.08 | learning rate: 2.261E-05 | global batch size: 256 | lm loss: 4.496544E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.955 | TFLOPs: 11.81 | 7: iteration 160310/ 173500 | consumed samples: 41039360 | consumed tokens: 84048609280 | elapsed time per iteration (s): 0.08 | learning rate: 2.261E-05 | global batch size: 256 | lm loss: 4.506420E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.508 | TFLOPs: 11.83 | 7: iteration 160320/ 173500 | consumed samples: 41041920 | consumed tokens: 84053852160 | elapsed time per iteration (s): 0.08 | learning rate: 2.260E-05 | global batch size: 256 | lm loss: 4.506371E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.650 | TFLOPs: 11.69 | 7: iteration 160330/ 173500 | consumed samples: 41044480 | consumed tokens: 84059095040 | elapsed time per iteration (s): 0.08 | learning rate: 
2.260E-05 | global batch size: 256 | lm loss: 4.520335E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.468 | TFLOPs: 11.82 | 7: iteration 160340/ 173500 | consumed samples: 41047040 | consumed tokens: 84064337920 | elapsed time per iteration (s): 0.08 | learning rate: 2.259E-05 | global batch size: 256 | lm loss: 4.503887E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.402 | TFLOPs: 11.83 | 7: iteration 160350/ 173500 | consumed samples: 41049600 | consumed tokens: 84069580800 | elapsed time per iteration (s): 0.08 | learning rate: 2.259E-05 | global batch size: 256 | lm loss: 4.508144E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.597 | TFLOPs: 11.85 | 7: iteration 160360/ 173500 | consumed samples: 41052160 | consumed tokens: 84074823680 | elapsed time per iteration (s): 0.08 | learning rate: 2.259E-05 | global batch size: 256 | lm loss: 4.505740E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.463 | TFLOPs: 11.65 | 7: iteration 160370/ 173500 | consumed samples: 41054720 | consumed tokens: 84080066560 | elapsed time per iteration (s): 0.08 | learning rate: 2.258E-05 | global batch size: 256 | lm loss: 4.508263E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.850 | TFLOPs: 11.78 | 7: iteration 160380/ 173500 | consumed samples: 41057280 | consumed tokens: 84085309440 | elapsed time per iteration (s): 0.08 | learning rate: 2.258E-05 | global batch size: 256 | lm loss: 4.502170E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.098 | TFLOPs: 11.81 | 7: iteration 160390/ 173500 | 
consumed samples: 41059840 | consumed tokens: 84090552320 | elapsed time per iteration (s): 0.08 | learning rate: 2.258E-05 | global batch size: 256 | lm loss: 4.498363E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.000 | TFLOPs: 11.81 | 7: iteration 160400/ 173500 | consumed samples: 41062400 | consumed tokens: 84095795200 | elapsed time per iteration (s): 0.08 | learning rate: 2.257E-05 | global batch size: 256 | lm loss: 4.500064E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.746 | TFLOPs: 11.84 | 7: iteration 160410/ 173500 | consumed samples: 41064960 | consumed tokens: 84101038080 | elapsed time per iteration (s): 0.08 | learning rate: 2.257E-05 | global batch size: 256 | lm loss: 4.508197E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.754 | TFLOPs: 11.86 | 7: iteration 160420/ 173500 | consumed samples: 41067520 | consumed tokens: 84106280960 | elapsed time per iteration (s): 0.08 | learning rate: 2.256E-05 | global batch size: 256 | lm loss: 4.504401E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.858 | TFLOPs: 11.85 | 7: iteration 160430/ 173500 | consumed samples: 41070080 | consumed tokens: 84111523840 | elapsed time per iteration (s): 0.08 | learning rate: 2.256E-05 | global batch size: 256 | lm loss: 4.500399E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.314 | TFLOPs: 11.88 | 7: iteration 160440/ 173500 | consumed samples: 41072640 | consumed tokens: 84116766720 | elapsed time per iteration (s): 0.08 | learning rate: 2.256E-05 | global batch size: 256 | lm loss: 4.506189E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3183.083 | TFLOPs: 11.84 | 7: iteration 160450/ 173500 | consumed samples: 41075200 | consumed tokens: 84122009600 | elapsed time per iteration (s): 0.08 | learning rate: 2.255E-05 | global batch size: 256 | lm loss: 4.496856E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.755 | TFLOPs: 11.83 | 7: iteration 160460/ 173500 | consumed samples: 41077760 | consumed tokens: 84127252480 | elapsed time per iteration (s): 0.08 | learning rate: 2.255E-05 | global batch size: 256 | lm loss: 4.508360E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.169 | TFLOPs: 11.80 | 7: iteration 160470/ 173500 | consumed samples: 41080320 | consumed tokens: 84132495360 | elapsed time per iteration (s): 0.08 | learning rate: 2.254E-05 | global batch size: 256 | lm loss: 4.499984E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.347 | TFLOPs: 11.82 | 7: iteration 160480/ 173500 | consumed samples: 41082880 | consumed tokens: 84137738240 | elapsed time per iteration (s): 0.08 | learning rate: 2.254E-05 | global batch size: 256 | lm loss: 4.497527E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.017 | TFLOPs: 11.75 | 7: iteration 160490/ 173500 | consumed samples: 41085440 | consumed tokens: 84142981120 | elapsed time per iteration (s): 0.08 | learning rate: 2.254E-05 | global batch size: 256 | lm loss: 4.500376E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.223 | TFLOPs: 11.55 | 7: iteration 160500/ 173500 | consumed samples: 41088000 | consumed tokens: 84148224000 | elapsed time per iteration (s): 0.08 | learning rate: 
2.253E-05 | global batch size: 256 | lm loss: 4.508959E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3143.254 | TFLOPs: 11.69 | 7: iteration 160510/ 173500 | consumed samples: 41090560 | consumed tokens: 84153466880 | elapsed time per iteration (s): 0.08 | learning rate: 2.253E-05 | global batch size: 256 | lm loss: 4.505960E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.275 | TFLOPs: 11.82 | 7: iteration 160520/ 173500 | consumed samples: 41093120 | consumed tokens: 84158709760 | elapsed time per iteration (s): 0.08 | learning rate: 2.252E-05 | global batch size: 256 | lm loss: 4.511484E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.150 | TFLOPs: 11.81 | 7: iteration 160530/ 173500 | consumed samples: 41095680 | consumed tokens: 84163952640 | elapsed time per iteration (s): 0.08 | learning rate: 2.252E-05 | global batch size: 256 | lm loss: 4.497010E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.466 | TFLOPs: 11.84 | 7: iteration 160540/ 173500 | consumed samples: 41098240 | consumed tokens: 84169195520 | elapsed time per iteration (s): 0.08 | learning rate: 2.252E-05 | global batch size: 256 | lm loss: 4.505006E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.037 | TFLOPs: 11.85 | 7: iteration 160550/ 173500 | consumed samples: 41100800 | consumed tokens: 84174438400 | elapsed time per iteration (s): 0.08 | learning rate: 2.251E-05 | global batch size: 256 | lm loss: 4.500004E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.153 | TFLOPs: 11.85 | 7: iteration 160560/ 173500 | 
consumed samples: 41103360 | consumed tokens: 84179681280 | elapsed time per iteration (s): 0.08 | learning rate: 2.251E-05 | global batch size: 256 | lm loss: 4.507831E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.721 | TFLOPs: 11.85 | 7: iteration 160570/ 173500 | consumed samples: 41105920 | consumed tokens: 84184924160 | elapsed time per iteration (s): 0.08 | learning rate: 2.251E-05 | global batch size: 256 | lm loss: 4.491795E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.404 | TFLOPs: 11.83 | 7: iteration 160580/ 173500 | consumed samples: 41108480 | consumed tokens: 84190167040 | elapsed time per iteration (s): 0.08 | learning rate: 2.250E-05 | global batch size: 256 | lm loss: 4.495433E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.329 | TFLOPs: 11.86 | 7: iteration 160590/ 173500 | consumed samples: 41111040 | consumed tokens: 84195409920 | elapsed time per iteration (s): 0.08 | learning rate: 2.250E-05 | global batch size: 256 | lm loss: 4.515166E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.765 | TFLOPs: 11.86 | 7: iteration 160600/ 173500 | consumed samples: 41113600 | consumed tokens: 84200652800 | elapsed time per iteration (s): 0.08 | learning rate: 2.249E-05 | global batch size: 256 | lm loss: 4.516696E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.424 | TFLOPs: 11.85 | 7: iteration 160610/ 173500 | consumed samples: 41116160 | consumed tokens: 84205895680 | elapsed time per iteration (s): 0.08 | learning rate: 2.249E-05 | global batch size: 256 | lm loss: 4.497361E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3055.551 | TFLOPs: 11.37 | 7: iteration 160620/ 173500 | consumed samples: 41118720 | consumed tokens: 84211138560 | elapsed time per iteration (s): 0.09 | learning rate: 2.249E-05 | global batch size: 256 | lm loss: 4.512986E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2872.846 | TFLOPs: 10.69 | 7: iteration 160630/ 173500 | consumed samples: 41121280 | consumed tokens: 84216381440 | elapsed time per iteration (s): 0.09 | learning rate: 2.248E-05 | global batch size: 256 | lm loss: 4.502185E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2959.928 | TFLOPs: 11.01 | 7: iteration 160640/ 173500 | consumed samples: 41123840 | consumed tokens: 84221624320 | elapsed time per iteration (s): 0.08 | learning rate: 2.248E-05 | global batch size: 256 | lm loss: 4.507143E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.062 | TFLOPs: 11.86 | 7: iteration 160650/ 173500 | consumed samples: 41126400 | consumed tokens: 84226867200 | elapsed time per iteration (s): 0.08 | learning rate: 2.247E-05 | global batch size: 256 | lm loss: 4.507449E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.681 | TFLOPs: 11.81 | 7: iteration 160660/ 173500 | consumed samples: 41128960 | consumed tokens: 84232110080 | elapsed time per iteration (s): 0.08 | learning rate: 2.247E-05 | global batch size: 256 | lm loss: 4.505568E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.488 | TFLOPs: 11.81 | 7: iteration 160670/ 173500 | consumed samples: 41131520 | consumed tokens: 84237352960 | elapsed time per iteration (s): 0.08 | learning rate: 
2.247E-05 | global batch size: 256 | lm loss: 4.507911E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.911 | TFLOPs: 11.79 | 7: iteration 160680/ 173500 | consumed samples: 41134080 | consumed tokens: 84242595840 | elapsed time per iteration (s): 0.08 | learning rate: 2.246E-05 | global batch size: 256 | lm loss: 4.512505E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.889 | TFLOPs: 11.82 | 7: iteration 160690/ 173500 | consumed samples: 41136640 | consumed tokens: 84247838720 | elapsed time per iteration (s): 0.08 | learning rate: 2.246E-05 | global batch size: 256 | lm loss: 4.498587E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.060 | TFLOPs: 11.85 | 7: iteration 160700/ 173500 | consumed samples: 41139200 | consumed tokens: 84253081600 | elapsed time per iteration (s): 0.08 | learning rate: 2.246E-05 | global batch size: 256 | lm loss: 4.515944E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.186 | TFLOPs: 11.86 | 7: iteration 160710/ 173500 | consumed samples: 41141760 | consumed tokens: 84258324480 | elapsed time per iteration (s): 0.08 | learning rate: 2.245E-05 | global batch size: 256 | lm loss: 4.499460E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.955 | TFLOPs: 11.79 | 7: iteration 160720/ 173500 | consumed samples: 41144320 | consumed tokens: 84263567360 | elapsed time per iteration (s): 0.08 | learning rate: 2.245E-05 | global batch size: 256 | lm loss: 4.501833E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.120 | TFLOPs: 11.85 | 7: iteration 160730/ 173500 | 
consumed samples: 41146880 | consumed tokens: 84268810240 | elapsed time per iteration (s): 0.08 | learning rate: 2.244E-05 | global batch size: 256 | lm loss: 4.497970E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.039 | TFLOPs: 11.85 | 7: iteration 160740/ 173500 | consumed samples: 41149440 | consumed tokens: 84274053120 | elapsed time per iteration (s): 0.08 | learning rate: 2.244E-05 | global batch size: 256 | lm loss: 4.497739E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.988 | TFLOPs: 11.49 | 7: iteration 160750/ 173500 | consumed samples: 41152000 | consumed tokens: 84279296000 | elapsed time per iteration (s): 0.08 | learning rate: 2.244E-05 | global batch size: 256 | lm loss: 4.502638E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.867 | TFLOPs: 11.85 | 7: iteration 160760/ 173500 | consumed samples: 41154560 | consumed tokens: 84284538880 | elapsed time per iteration (s): 0.08 | learning rate: 2.243E-05 | global batch size: 256 | lm loss: 4.516412E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.130 | TFLOPs: 11.87 | 7: iteration 160770/ 173500 | consumed samples: 41157120 | consumed tokens: 84289781760 | elapsed time per iteration (s): 0.08 | learning rate: 2.243E-05 | global batch size: 256 | lm loss: 4.499469E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.943 | TFLOPs: 11.87 | 7: iteration 160780/ 173500 | consumed samples: 41159680 | consumed tokens: 84295024640 | elapsed time per iteration (s): 0.08 | learning rate: 2.242E-05 | global batch size: 256 | lm loss: 4.507395E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3180.765 | TFLOPs: 11.83 | 7: iteration 160790/ 173500 | consumed samples: 41162240 | consumed tokens: 84300267520 | elapsed time per iteration (s): 0.08 | learning rate: 2.242E-05 | global batch size: 256 | lm loss: 4.501648E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.486 | TFLOPs: 11.74 | 7: iteration 160800/ 173500 | consumed samples: 41164800 | consumed tokens: 84305510400 | elapsed time per iteration (s): 0.08 | learning rate: 2.242E-05 | global batch size: 256 | lm loss: 4.512470E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.698 | TFLOPs: 11.86 | 7: iteration 160810/ 173500 | consumed samples: 41167360 | consumed tokens: 84310753280 | elapsed time per iteration (s): 0.08 | learning rate: 2.241E-05 | global batch size: 256 | lm loss: 4.497967E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.550 | TFLOPs: 11.82 | 7: iteration 160820/ 173500 | consumed samples: 41169920 | consumed tokens: 84315996160 | elapsed time per iteration (s): 0.08 | learning rate: 2.241E-05 | global batch size: 256 | lm loss: 4.512360E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.594 | TFLOPs: 11.75 | 7: iteration 160830/ 173500 | consumed samples: 41172480 | consumed tokens: 84321239040 | elapsed time per iteration (s): 0.08 | learning rate: 2.241E-05 | global batch size: 256 | lm loss: 4.499780E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.499 | TFLOPs: 11.86 | 7: iteration 160840/ 173500 | consumed samples: 41175040 | consumed tokens: 84326481920 | elapsed time per iteration (s): 0.08 | learning rate: 
2.240E-05 | global batch size: 256 | lm loss: 4.496382E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.216 | TFLOPs: 11.80 | 7: iteration 160850/ 173500 | consumed samples: 41177600 | consumed tokens: 84331724800 | elapsed time per iteration (s): 0.08 | learning rate: 2.240E-05 | global batch size: 256 | lm loss: 4.502731E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.278 | TFLOPs: 11.87 | 7: iteration 160860/ 173500 | consumed samples: 41180160 | consumed tokens: 84336967680 | elapsed time per iteration (s): 0.08 | learning rate: 2.239E-05 | global batch size: 256 | lm loss: 4.512210E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.930 | TFLOPs: 11.85 | 7: iteration 160870/ 173500 | consumed samples: 41182720 | consumed tokens: 84342210560 | elapsed time per iteration (s): 0.08 | learning rate: 2.239E-05 | global batch size: 256 | lm loss: 4.490070E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.339 | TFLOPs: 11.79 | 7: iteration 160880/ 173500 | consumed samples: 41185280 | consumed tokens: 84347453440 | elapsed time per iteration (s): 0.10 | learning rate: 2.239E-05 | global batch size: 256 | lm loss: 4.515852E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2557.036 | TFLOPs: 9.51 | 7: iteration 160890/ 173500 | consumed samples: 41187840 | consumed tokens: 84352696320 | elapsed time per iteration (s): 0.09 | learning rate: 2.238E-05 | global batch size: 256 | lm loss: 4.489824E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.523 | TFLOPs: 10.54 | 7: iteration 160900/ 173500 | 
consumed samples: 41190400 | consumed tokens: 84357939200 | elapsed time per iteration (s): 0.09 | learning rate: 2.238E-05 | global batch size: 256 | lm loss: 4.509810E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2794.171 | TFLOPs: 10.39 | 7: iteration 160910/ 173500 | consumed samples: 41192960 | consumed tokens: 84363182080 | elapsed time per iteration (s): 0.11 | learning rate: 2.238E-05 | global batch size: 256 | lm loss: 4.500982E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2375.634 | TFLOPs: 8.84 | 7: iteration 160920/ 173500 | consumed samples: 41195520 | consumed tokens: 84368424960 | elapsed time per iteration (s): 0.08 | learning rate: 2.237E-05 | global batch size: 256 | lm loss: 4.512545E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.857 | TFLOPs: 11.67 | 7: iteration 160930/ 173500 | consumed samples: 41198080 | consumed tokens: 84373667840 | elapsed time per iteration (s): 0.08 | learning rate: 2.237E-05 | global batch size: 256 | lm loss: 4.491877E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.128 | TFLOPs: 11.53 | 7: iteration 160940/ 173500 | consumed samples: 41200640 | consumed tokens: 84378910720 | elapsed time per iteration (s): 0.08 | learning rate: 2.236E-05 | global batch size: 256 | lm loss: 4.497175E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.252 | TFLOPs: 11.64 | 7: iteration 160950/ 173500 | consumed samples: 41203200 | consumed tokens: 84384153600 | elapsed time per iteration (s): 0.08 | learning rate: 2.236E-05 | global batch size: 256 | lm loss: 4.500988E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3169.748 | TFLOPs: 11.79 | 7: iteration 160960/ 173500 | consumed samples: 41205760 | consumed tokens: 84389396480 | elapsed time per iteration (s): 0.08 | learning rate: 2.236E-05 | global batch size: 256 | lm loss: 4.501122E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.555 | TFLOPs: 11.74 | 7: iteration 160970/ 173500 | consumed samples: 41208320 | consumed tokens: 84394639360 | elapsed time per iteration (s): 0.09 | learning rate: 2.235E-05 | global batch size: 256 | lm loss: 4.511501E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2795.731 | TFLOPs: 10.40 | 7: iteration 160980/ 173500 | consumed samples: 41210880 | consumed tokens: 84399882240 | elapsed time per iteration (s): 0.10 | learning rate: 2.235E-05 | global batch size: 256 | lm loss: 4.513489E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2582.496 | TFLOPs: 9.61 | 7: iteration 160990/ 173500 | consumed samples: 41213440 | consumed tokens: 84405125120 | elapsed time per iteration (s): 0.10 | learning rate: 2.235E-05 | global batch size: 256 | lm loss: 4.508306E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2490.240 | TFLOPs: 9.26 | 7: iteration 161000/ 173500 | consumed samples: 41216000 | consumed tokens: 84410368000 | elapsed time per iteration (s): 0.10 | learning rate: 2.234E-05 | global batch size: 256 | lm loss: 4.491964E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2530.197 | TFLOPs: 9.41 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 161000 | lm loss value: 
4.391460E+00 | lm loss PPL: 8.075824E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 161000 to checkpoints_14m91b100m 0: [2023-03-17 04:11:26,530] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step161000 is begin to save! 0: [2023-03-17 04:11:26,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:11:26,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:11:26,558] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:11:26,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:11:26,563] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:11:26,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:11:26,566] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:11:26,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:11:26,569] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:11:26,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/layer_06-model_00-model_states.pt. 
0: [2023-03-17 04:11:26,572] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:11:26,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:11:26,573] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step161000/mp_rank_00_model_states.pt 0: [2023-03-17 04:11:26,573] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:11:26,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:11:26,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:11:26,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:11:26,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:11:26,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:11:26,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:11:26,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 1: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:11:26,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 6: [2023-03-17 04:11:26,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 3: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:11:26,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 4: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:11:26,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:11:26,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 0: [2023-03-17 04:11:26,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:11:26,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:11:26,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 6: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:11:26,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 1: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:11:26,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 4: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 0: [2023-03-17 04:11:26,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:11:26,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:11:26,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:11:26,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 1: [2023-03-17 04:11:26,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 3: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:11:26,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 6: [2023-03-17 04:11:26,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:11:26,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 0: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:11:26,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2023-03-17 04:11:26,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 4: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 3: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:11:26,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:11:26,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:11:26,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 6: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:11:26,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 1: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 0: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:11:26,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:11:26,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:11:26,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 4: [2023-03-17 04:11:26,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 1: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:11:26,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 3: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:11:26,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 6: [2023-03-17 04:11:26,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 3: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 6: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 0: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:11:26,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:11:26,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:11:26,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 4: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:11:26,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 3: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:11:26,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:11:26,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:11:26,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:11:26,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:11:26,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 
6: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:11:26,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:11:26,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:11:26,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 0: [2023-03-17 04:11:26,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 0: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 
1: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:11:26,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 4: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 1: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 
0: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 
7: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 5: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 3: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:11:26,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 4: [2023-03-17 04:11:26,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:11:26,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 2: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 4: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:11:26,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 3: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:11:26,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:11:26,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 7: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:11:26,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step161000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:11:26,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step161000 is ready now! 
0: successfully saved checkpoint at iteration 161000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 83.98 7: iteration 161010/ 173500 | consumed samples: 41218560 | consumed tokens: 84415610880 | elapsed time per iteration (s): 0.11 | learning rate: 2.234E-05 | global batch size: 256 | lm loss: 4.501433E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2329.353 | TFLOPs: 8.66 | 7: iteration 161020/ 173500 | consumed samples: 41221120 | consumed tokens: 84420853760 | elapsed time per iteration (s): 0.08 | learning rate: 2.233E-05 | global batch size: 256 | lm loss: 4.507340E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.263 | TFLOPs: 11.98 | 7: iteration 161030/ 173500 | consumed samples: 41223680 | consumed tokens: 84426096640 | elapsed time per iteration (s): 0.08 | learning rate: 2.233E-05 | global batch size: 256 | lm loss: 4.502832E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.924 | TFLOPs: 12.01 | 7: iteration 161040/ 173500 | consumed samples: 41226240 | consumed tokens: 84431339520 | elapsed time per iteration (s): 0.08 | learning rate: 2.233E-05 | global batch size: 256 | lm loss: 4.495792E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.373 | TFLOPs: 11.83 | 7: iteration 161050/ 173500 | consumed samples: 41228800 | consumed tokens: 84436582400 | elapsed time per iteration (s): 0.08 | learning rate: 2.232E-05 | global batch size: 256 | lm loss: 4.498973E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.801 | TFLOPs: 11.94 | 7: iteration 161060/ 173500 | consumed samples: 41231360 | consumed tokens: 84441825280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.232E-05 | global batch size: 256 | lm loss: 4.511829E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.142 | TFLOPs: 11.92 | 7: iteration 161070/ 173500 | consumed samples: 41233920 | consumed tokens: 84447068160 | elapsed time per iteration (s): 0.08 | learning rate: 2.232E-05 | global batch size: 256 | lm loss: 4.499263E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.542 | TFLOPs: 11.89 | 7: iteration 161080/ 173500 | consumed samples: 41236480 | consumed tokens: 84452311040 | elapsed time per iteration (s): 0.08 | learning rate: 2.231E-05 | global batch size: 256 | lm loss: 4.489803E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.297 | TFLOPs: 11.96 | 7: iteration 161090/ 173500 | consumed samples: 41239040 | consumed tokens: 84457553920 | elapsed time per iteration (s): 0.08 | learning rate: 2.231E-05 | global batch size: 256 | lm loss: 4.495292E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.051 | TFLOPs: 11.96 | 7: iteration 161100/ 173500 | consumed samples: 41241600 | consumed tokens: 84462796800 | elapsed time per iteration (s): 0.08 | learning rate: 2.230E-05 | global batch size: 256 | lm loss: 4.499933E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.390 | TFLOPs: 11.89 | 7: iteration 161110/ 173500 | consumed samples: 41244160 | consumed tokens: 84468039680 | elapsed time per iteration (s): 0.08 | learning rate: 2.230E-05 | global batch size: 256 | lm loss: 4.510486E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.070 | TFLOPs: 11.77 | 7: 
iteration 161120/ 173500 | consumed samples: 41246720 | consumed tokens: 84473282560 | elapsed time per iteration (s): 0.08 | learning rate: 2.230E-05 | global batch size: 256 | lm loss: 4.501000E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.316 | TFLOPs: 11.86 | 7: iteration 161130/ 173500 | consumed samples: 41249280 | consumed tokens: 84478525440 | elapsed time per iteration (s): 0.08 | learning rate: 2.229E-05 | global batch size: 256 | lm loss: 4.504282E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.604 | TFLOPs: 11.86 | 7: iteration 161140/ 173500 | consumed samples: 41251840 | consumed tokens: 84483768320 | elapsed time per iteration (s): 0.08 | learning rate: 2.229E-05 | global batch size: 256 | lm loss: 4.506400E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.514 | TFLOPs: 11.99 | 7: iteration 161150/ 173500 | consumed samples: 41254400 | consumed tokens: 84489011200 | elapsed time per iteration (s): 0.08 | learning rate: 2.229E-05 | global batch size: 256 | lm loss: 4.505860E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.836 | TFLOPs: 11.94 | 7: iteration 161160/ 173500 | consumed samples: 41256960 | consumed tokens: 84494254080 | elapsed time per iteration (s): 0.08 | learning rate: 2.228E-05 | global batch size: 256 | lm loss: 4.507121E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.662 | TFLOPs: 11.97 | 7: iteration 161170/ 173500 | consumed samples: 41259520 | consumed tokens: 84499496960 | elapsed time per iteration (s): 0.08 | learning rate: 2.228E-05 | global batch size: 256 | lm loss: 4.508538E+00 | grad norm: 0.369 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3160.549 | TFLOPs: 11.76 | 7: iteration 161180/ 173500 | consumed samples: 41262080 | consumed tokens: 84504739840 | elapsed time per iteration (s): 0.08 | learning rate: 2.228E-05 | global batch size: 256 | lm loss: 4.501632E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.883 | TFLOPs: 11.99 | 7: iteration 161190/ 173500 | consumed samples: 41264640 | consumed tokens: 84509982720 | elapsed time per iteration (s): 0.08 | learning rate: 2.227E-05 | global batch size: 256 | lm loss: 4.499098E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.240 | TFLOPs: 11.99 | 7: iteration 161200/ 173500 | consumed samples: 41267200 | consumed tokens: 84515225600 | elapsed time per iteration (s): 0.08 | learning rate: 2.227E-05 | global batch size: 256 | lm loss: 4.496876E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.098 | TFLOPs: 12.00 | 7: iteration 161210/ 173500 | consumed samples: 41269760 | consumed tokens: 84520468480 | elapsed time per iteration (s): 0.08 | learning rate: 2.226E-05 | global batch size: 256 | lm loss: 4.520766E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.023 | TFLOPs: 11.85 | 7: iteration 161220/ 173500 | consumed samples: 41272320 | consumed tokens: 84525711360 | elapsed time per iteration (s): 0.08 | learning rate: 2.226E-05 | global batch size: 256 | lm loss: 4.490427E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.843 | TFLOPs: 12.00 | 7: iteration 161230/ 173500 | consumed samples: 41274880 | consumed tokens: 84530954240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.226E-05 | global batch size: 256 | lm loss: 4.508270E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.676 | TFLOPs: 11.74 | 7: iteration 161240/ 173500 | consumed samples: 41277440 | consumed tokens: 84536197120 | elapsed time per iteration (s): 0.08 | learning rate: 2.225E-05 | global batch size: 256 | lm loss: 4.503778E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.034 | TFLOPs: 11.87 | 7: iteration 161250/ 173500 | consumed samples: 41280000 | consumed tokens: 84541440000 | elapsed time per iteration (s): 0.09 | learning rate: 2.225E-05 | global batch size: 256 | lm loss: 4.502647E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2967.948 | TFLOPs: 11.04 | 7: iteration 161260/ 173500 | consumed samples: 41282560 | consumed tokens: 84546682880 | elapsed time per iteration (s): 0.08 | learning rate: 2.225E-05 | global batch size: 256 | lm loss: 4.497041E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.676 | TFLOPs: 11.99 | 7: iteration 161270/ 173500 | consumed samples: 41285120 | consumed tokens: 84551925760 | elapsed time per iteration (s): 0.08 | learning rate: 2.224E-05 | global batch size: 256 | lm loss: 4.488553E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.532 | TFLOPs: 11.98 | 7: iteration 161280/ 173500 | consumed samples: 41287680 | consumed tokens: 84557168640 | elapsed time per iteration (s): 0.09 | learning rate: 2.224E-05 | global batch size: 256 | lm loss: 4.509741E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2928.852 | TFLOPs: 10.89 | 7: iteration 
161290/ 173500 | consumed samples: 41290240 | consumed tokens: 84562411520 | elapsed time per iteration (s): 0.09 | learning rate: 2.224E-05 | global batch size: 256 | lm loss: 4.506026E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2804.852 | TFLOPs: 10.43 | 7: iteration 161300/ 173500 | consumed samples: 41292800 | consumed tokens: 84567654400 | elapsed time per iteration (s): 0.09 | learning rate: 2.223E-05 | global batch size: 256 | lm loss: 4.513556E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2972.171 | TFLOPs: 11.06 | 7: iteration 161310/ 173500 | consumed samples: 41295360 | consumed tokens: 84572897280 | elapsed time per iteration (s): 0.08 | learning rate: 2.223E-05 | global batch size: 256 | lm loss: 4.505542E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.000 | TFLOPs: 11.85 | 7: iteration 161320/ 173500 | consumed samples: 41297920 | consumed tokens: 84578140160 | elapsed time per iteration (s): 0.08 | learning rate: 2.222E-05 | global batch size: 256 | lm loss: 4.508409E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.808 | TFLOPs: 11.80 | 7: iteration 161330/ 173500 | consumed samples: 41300480 | consumed tokens: 84583383040 | elapsed time per iteration (s): 0.08 | learning rate: 2.222E-05 | global batch size: 256 | lm loss: 4.498206E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.432 | TFLOPs: 11.84 | 7: iteration 161340/ 173500 | consumed samples: 41303040 | consumed tokens: 84588625920 | elapsed time per iteration (s): 0.11 | learning rate: 2.222E-05 | global batch size: 256 | lm loss: 4.505799E+00 | grad norm: 0.408 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2391.222 | TFLOPs: 8.89 | 7: iteration 161350/ 173500 | consumed samples: 41305600 | consumed tokens: 84593868800 | elapsed time per iteration (s): 0.13 | learning rate: 2.221E-05 | global batch size: 256 | lm loss: 4.511618E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1982.343 | TFLOPs: 7.37 | 7: iteration 161360/ 173500 | consumed samples: 41308160 | consumed tokens: 84599111680 | elapsed time per iteration (s): 0.08 | learning rate: 2.221E-05 | global batch size: 256 | lm loss: 4.506607E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.554 | TFLOPs: 11.30 | 7: iteration 161370/ 173500 | consumed samples: 41310720 | consumed tokens: 84604354560 | elapsed time per iteration (s): 0.08 | learning rate: 2.221E-05 | global batch size: 256 | lm loss: 4.493544E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.116 | TFLOPs: 12.04 | 7: iteration 161380/ 173500 | consumed samples: 41313280 | consumed tokens: 84609597440 | elapsed time per iteration (s): 0.08 | learning rate: 2.220E-05 | global batch size: 256 | lm loss: 4.495750E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.175 | TFLOPs: 12.02 | 7: iteration 161390/ 173500 | consumed samples: 41315840 | consumed tokens: 84614840320 | elapsed time per iteration (s): 0.08 | learning rate: 2.220E-05 | global batch size: 256 | lm loss: 4.500714E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.128 | TFLOPs: 12.01 | 7: iteration 161400/ 173500 | consumed samples: 41318400 | consumed tokens: 84620083200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.220E-05 | global batch size: 256 | lm loss: 4.509408E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.889 | TFLOPs: 12.01 | 7: iteration 161410/ 173500 | consumed samples: 41320960 | consumed tokens: 84625326080 | elapsed time per iteration (s): 0.08 | learning rate: 2.219E-05 | global batch size: 256 | lm loss: 4.510281E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3101.182 | TFLOPs: 11.54 | 7: iteration 161420/ 173500 | consumed samples: 41323520 | consumed tokens: 84630568960 | elapsed time per iteration (s): 0.08 | learning rate: 2.219E-05 | global batch size: 256 | lm loss: 4.498963E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.520 | TFLOPs: 11.95 | 7: iteration 161430/ 173500 | consumed samples: 41326080 | consumed tokens: 84635811840 | elapsed time per iteration (s): 0.08 | learning rate: 2.218E-05 | global batch size: 256 | lm loss: 4.510116E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.618 | TFLOPs: 11.97 | 7: iteration 161440/ 173500 | consumed samples: 41328640 | consumed tokens: 84641054720 | elapsed time per iteration (s): 0.08 | learning rate: 2.218E-05 | global batch size: 256 | lm loss: 4.513461E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.281 | TFLOPs: 11.93 | 7: iteration 161450/ 173500 | consumed samples: 41331200 | consumed tokens: 84646297600 | elapsed time per iteration (s): 0.08 | learning rate: 2.218E-05 | global batch size: 256 | lm loss: 4.504626E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.591 | TFLOPs: 11.95 | 7: iteration 161460/ 
173500 | consumed samples: 41333760 | consumed tokens: 84651540480 | elapsed time per iteration (s): 0.08 | learning rate: 2.217E-05 | global batch size: 256 | lm loss: 4.497645E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.332 | TFLOPs: 11.98 | 7: iteration 161470/ 173500 | consumed samples: 41336320 | consumed tokens: 84656783360 | elapsed time per iteration (s): 0.08 | learning rate: 2.217E-05 | global batch size: 256 | lm loss: 4.506335E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.588 | TFLOPs: 11.36 | 7: iteration 161480/ 173500 | consumed samples: 41338880 | consumed tokens: 84662026240 | elapsed time per iteration (s): 0.10 | learning rate: 2.217E-05 | global batch size: 256 | lm loss: 4.493678E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2471.792 | TFLOPs: 9.19 | 7: iteration 161490/ 173500 | consumed samples: 41341440 | consumed tokens: 84667269120 | elapsed time per iteration (s): 0.08 | learning rate: 2.216E-05 | global batch size: 256 | lm loss: 4.501813E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.471 | TFLOPs: 11.79 | 7: iteration 161500/ 173500 | consumed samples: 41344000 | consumed tokens: 84672512000 | elapsed time per iteration (s): 0.08 | learning rate: 2.216E-05 | global batch size: 256 | lm loss: 4.501445E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.551 | TFLOPs: 11.82 | 7: iteration 161510/ 173500 | consumed samples: 41346560 | consumed tokens: 84677754880 | elapsed time per iteration (s): 0.08 | learning rate: 2.216E-05 | global batch size: 256 | lm loss: 4.502982E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3164.427 | TFLOPs: 11.77 | 7: iteration 161520/ 173500 | consumed samples: 41349120 | consumed tokens: 84682997760 | elapsed time per iteration (s): 0.08 | learning rate: 2.215E-05 | global batch size: 256 | lm loss: 4.498886E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.663 | TFLOPs: 11.80 | 7: iteration 161530/ 173500 | consumed samples: 41351680 | consumed tokens: 84688240640 | elapsed time per iteration (s): 0.08 | learning rate: 2.215E-05 | global batch size: 256 | lm loss: 4.499797E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.482 | TFLOPs: 11.79 | 7: iteration 161540/ 173500 | consumed samples: 41354240 | consumed tokens: 84693483520 | elapsed time per iteration (s): 0.08 | learning rate: 2.214E-05 | global batch size: 256 | lm loss: 4.493610E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.565 | TFLOPs: 11.82 | 7: iteration 161550/ 173500 | consumed samples: 41356800 | consumed tokens: 84698726400 | elapsed time per iteration (s): 0.08 | learning rate: 2.214E-05 | global batch size: 256 | lm loss: 4.502171E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.908 | TFLOPs: 11.85 | 7: iteration 161560/ 173500 | consumed samples: 41359360 | consumed tokens: 84703969280 | elapsed time per iteration (s): 0.08 | learning rate: 2.214E-05 | global batch size: 256 | lm loss: 4.504506E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.102 | TFLOPs: 11.84 | 7: iteration 161570/ 173500 | consumed samples: 41361920 | consumed tokens: 84709212160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.213E-05 | global batch size: 256 | lm loss: 4.513292E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.436 | TFLOPs: 11.81 | 7: iteration 161580/ 173500 | consumed samples: 41364480 | consumed tokens: 84714455040 | elapsed time per iteration (s): 0.08 | learning rate: 2.213E-05 | global batch size: 256 | lm loss: 4.503737E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.334 | TFLOPs: 11.71 | 7: iteration 161590/ 173500 | consumed samples: 41367040 | consumed tokens: 84719697920 | elapsed time per iteration (s): 0.08 | learning rate: 2.213E-05 | global batch size: 256 | lm loss: 4.516255E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.095 | TFLOPs: 11.83 | 7: iteration 161600/ 173500 | consumed samples: 41369600 | consumed tokens: 84724940800 | elapsed time per iteration (s): 0.08 | learning rate: 2.212E-05 | global batch size: 256 | lm loss: 4.506376E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.717 | TFLOPs: 11.75 | 7: iteration 161610/ 173500 | consumed samples: 41372160 | consumed tokens: 84730183680 | elapsed time per iteration (s): 0.08 | learning rate: 2.212E-05 | global batch size: 256 | lm loss: 4.505671E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.020 | TFLOPs: 11.87 | 7: iteration 161620/ 173500 | consumed samples: 41374720 | consumed tokens: 84735426560 | elapsed time per iteration (s): 0.08 | learning rate: 2.212E-05 | global batch size: 256 | lm loss: 4.503519E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.730 | TFLOPs: 11.75 | 7: iteration 161630/ 173500 | 
consumed samples: 41377280 | consumed tokens: 84740669440 | elapsed time per iteration (s): 0.08 | learning rate: 2.211E-05 | global batch size: 256 | lm loss: 4.512440E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.923 | TFLOPs: 11.84 | 7: iteration 161640/ 173500 | consumed samples: 41379840 | consumed tokens: 84745912320 | elapsed time per iteration (s): 0.08 | learning rate: 2.211E-05 | global batch size: 256 | lm loss: 4.498588E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.453 | TFLOPs: 11.75 | 7: iteration 161650/ 173500 | consumed samples: 41382400 | consumed tokens: 84751155200 | elapsed time per iteration (s): 0.08 | learning rate: 2.211E-05 | global batch size: 256 | lm loss: 4.497193E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.975 | TFLOPs: 11.80 | 7: iteration 161660/ 173500 | consumed samples: 41384960 | consumed tokens: 84756398080 | elapsed time per iteration (s): 0.09 | learning rate: 2.210E-05 | global batch size: 256 | lm loss: 4.504797E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.315 | TFLOPs: 11.08 | 7: iteration 161670/ 173500 | consumed samples: 41387520 | consumed tokens: 84761640960 | elapsed time per iteration (s): 0.08 | learning rate: 2.210E-05 | global batch size: 256 | lm loss: 4.495729E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.132 | TFLOPs: 11.85 | 7: iteration 161680/ 173500 | consumed samples: 41390080 | consumed tokens: 84766883840 | elapsed time per iteration (s): 0.08 | learning rate: 2.210E-05 | global batch size: 256 | lm loss: 4.499407E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3190.331 | TFLOPs: 11.87 | 7: iteration 161690/ 173500 | consumed samples: 41392640 | consumed tokens: 84772126720 | elapsed time per iteration (s): 0.08 | learning rate: 2.209E-05 | global batch size: 256 | lm loss: 4.500687E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.271 | TFLOPs: 11.78 | 7: iteration 161700/ 173500 | consumed samples: 41395200 | consumed tokens: 84777369600 | elapsed time per iteration (s): 0.08 | learning rate: 2.209E-05 | global batch size: 256 | lm loss: 4.502752E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.530 | TFLOPs: 11.70 | 7: iteration 161710/ 173500 | consumed samples: 41397760 | consumed tokens: 84782612480 | elapsed time per iteration (s): 0.08 | learning rate: 2.208E-05 | global batch size: 256 | lm loss: 4.501122E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.464 | TFLOPs: 11.82 | 7: iteration 161720/ 173500 | consumed samples: 41400320 | consumed tokens: 84787855360 | elapsed time per iteration (s): 0.08 | learning rate: 2.208E-05 | global batch size: 256 | lm loss: 4.502407E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.351 | TFLOPs: 11.81 | 7: iteration 161730/ 173500 | consumed samples: 41402880 | consumed tokens: 84793098240 | elapsed time per iteration (s): 0.08 | learning rate: 2.208E-05 | global batch size: 256 | lm loss: 4.499068E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.199 | TFLOPs: 11.70 | 7: iteration 161740/ 173500 | consumed samples: 41405440 | consumed tokens: 84798341120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.207E-05 | global batch size: 256 | lm loss: 4.516652E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.661 | TFLOPs: 11.80 | 7: iteration 161750/ 173500 | consumed samples: 41408000 | consumed tokens: 84803584000 | elapsed time per iteration (s): 0.08 | learning rate: 2.207E-05 | global batch size: 256 | lm loss: 4.508662E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.169 | TFLOPs: 11.85 | 7: iteration 161760/ 173500 | consumed samples: 41410560 | consumed tokens: 84808826880 | elapsed time per iteration (s): 0.08 | learning rate: 2.207E-05 | global batch size: 256 | lm loss: 4.502658E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.106 | TFLOPs: 11.86 | 7: iteration 161770/ 173500 | consumed samples: 41413120 | consumed tokens: 84814069760 | elapsed time per iteration (s): 0.08 | learning rate: 2.206E-05 | global batch size: 256 | lm loss: 4.511293E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.157 | TFLOPs: 11.57 | 7: iteration 161780/ 173500 | consumed samples: 41415680 | consumed tokens: 84819312640 | elapsed time per iteration (s): 0.08 | learning rate: 2.206E-05 | global batch size: 256 | lm loss: 4.506486E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.368 | TFLOPs: 11.89 | 7: iteration 161790/ 173500 | consumed samples: 41418240 | consumed tokens: 84824555520 | elapsed time per iteration (s): 0.08 | learning rate: 2.206E-05 | global batch size: 256 | lm loss: 4.506314E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.411 | TFLOPs: 11.72 | 7: iteration 161800/ 173500 | 
consumed samples: 41420800 | consumed tokens: 84829798400 | elapsed time per iteration (s): 0.08 | learning rate: 2.205E-05 | global batch size: 256 | lm loss: 4.501778E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.546 | TFLOPs: 11.90 | 7: iteration 161810/ 173500 | consumed samples: 41423360 | consumed tokens: 84835041280 | elapsed time per iteration (s): 0.08 | learning rate: 2.205E-05 | global batch size: 256 | lm loss: 4.502948E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.394 | TFLOPs: 11.65 | 7: iteration 161820/ 173500 | consumed samples: 41425920 | consumed tokens: 84840284160 | elapsed time per iteration (s): 0.08 | learning rate: 2.205E-05 | global batch size: 256 | lm loss: 4.503289E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.190 | TFLOPs: 11.44 | 7: iteration 161830/ 173500 | consumed samples: 41428480 | consumed tokens: 84845527040 | elapsed time per iteration (s): 0.08 | learning rate: 2.204E-05 | global batch size: 256 | lm loss: 4.496700E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.861 | TFLOPs: 12.00 | 7: iteration 161840/ 173500 | consumed samples: 41431040 | consumed tokens: 84850769920 | elapsed time per iteration (s): 0.08 | learning rate: 2.204E-05 | global batch size: 256 | lm loss: 4.507094E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.355 | TFLOPs: 11.44 | 7: iteration 161850/ 173500 | consumed samples: 41433600 | consumed tokens: 84856012800 | elapsed time per iteration (s): 0.08 | learning rate: 2.204E-05 | global batch size: 256 | lm loss: 4.505249E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3058.865 | TFLOPs: 11.38 | 7: iteration 161860/ 173500 | consumed samples: 41436160 | consumed tokens: 84861255680 | elapsed time per iteration (s): 0.08 | learning rate: 2.203E-05 | global batch size: 256 | lm loss: 4.506829E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3081.599 | TFLOPs: 11.46 | 7: iteration 161870/ 173500 | consumed samples: 41438720 | consumed tokens: 84866498560 | elapsed time per iteration (s): 0.08 | learning rate: 2.203E-05 | global batch size: 256 | lm loss: 4.502504E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.661 | TFLOPs: 11.87 | 7: iteration 161880/ 173500 | consumed samples: 41441280 | consumed tokens: 84871741440 | elapsed time per iteration (s): 0.08 | learning rate: 2.203E-05 | global batch size: 256 | lm loss: 4.504721E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.360 | TFLOPs: 11.85 | 7: iteration 161890/ 173500 | consumed samples: 41443840 | consumed tokens: 84876984320 | elapsed time per iteration (s): 0.09 | learning rate: 2.202E-05 | global batch size: 256 | lm loss: 4.500827E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2714.575 | TFLOPs: 10.10 | 7: iteration 161900/ 173500 | consumed samples: 41446400 | consumed tokens: 84882227200 | elapsed time per iteration (s): 0.09 | learning rate: 2.202E-05 | global batch size: 256 | lm loss: 4.486909E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2922.981 | TFLOPs: 10.87 | 7: iteration 161910/ 173500 | consumed samples: 41448960 | consumed tokens: 84887470080 | elapsed time per iteration (s): 0.09 | learning rate: 
2.201E-05 | global batch size: 256 | lm loss: 4.511892E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2987.267 | TFLOPs: 11.11 | 7: iteration 161920/ 173500 | consumed samples: 41451520 | consumed tokens: 84892712960 | elapsed time per iteration (s): 0.08 | learning rate: 2.201E-05 | global batch size: 256 | lm loss: 4.491550E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.661 | TFLOPs: 11.95 | 7: iteration 161930/ 173500 | consumed samples: 41454080 | consumed tokens: 84897955840 | elapsed time per iteration (s): 0.10 | learning rate: 2.201E-05 | global batch size: 256 | lm loss: 4.515013E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2671.230 | TFLOPs: 9.94 | 7: iteration 161940/ 173500 | consumed samples: 41456640 | consumed tokens: 84903198720 | elapsed time per iteration (s): 0.09 | learning rate: 2.200E-05 | global batch size: 256 | lm loss: 4.514989E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.581 | TFLOPs: 10.50 | 7: iteration 161950/ 173500 | consumed samples: 41459200 | consumed tokens: 84908441600 | elapsed time per iteration (s): 0.08 | learning rate: 2.200E-05 | global batch size: 256 | lm loss: 4.501273E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3042.055 | TFLOPs: 11.32 | 7: iteration 161960/ 173500 | consumed samples: 41461760 | consumed tokens: 84913684480 | elapsed time per iteration (s): 0.09 | learning rate: 2.200E-05 | global batch size: 256 | lm loss: 4.496616E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.900 | TFLOPs: 10.82 | 7: iteration 161970/ 173500 | 
consumed samples: 41464320 | consumed tokens: 84918927360 | elapsed time per iteration (s): 0.08 | learning rate: 2.199E-05 | global batch size: 256 | lm loss: 4.495377E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3077.976 | TFLOPs: 11.45 | 7: iteration 161980/ 173500 | consumed samples: 41466880 | consumed tokens: 84924170240 | elapsed time per iteration (s): 0.08 | learning rate: 2.199E-05 | global batch size: 256 | lm loss: 4.491759E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.201 | TFLOPs: 11.62 | 7: iteration 161990/ 173500 | consumed samples: 41469440 | consumed tokens: 84929413120 | elapsed time per iteration (s): 0.08 | learning rate: 2.199E-05 | global batch size: 256 | lm loss: 4.507200E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3038.116 | TFLOPs: 11.30 | 0: [2023-03-17 04:12:49,359] [INFO] [logging.py:68:log_dist] [Rank 0] step=162000, skipped=0, lr=[2.1983700493183342e-05, 2.1983700493183342e-05, 2.1983700493183342e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 162000/ 173500 | consumed samples: 41472000 | consumed tokens: 84934656000 | elapsed time per iteration (s): 0.08 | learning rate: 2.198E-05 | global batch size: 256 | lm loss: 4.507229E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.457 | TFLOPs: 11.48 | 0: steps: 162000 loss: 4.5180 iter time (s): 0.082 samples/sec: 3126.569 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 162000 | lm loss value: 4.395362E+00 | lm loss PPL: 8.107400E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at 
iteration 162000 to checkpoints_14m91b100m 0: [2023-03-17 04:12:49,417] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step162000 is begin to save! 0: [2023-03-17 04:12:49,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:12:49,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:12:49,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:12:49,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:12:49,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:12:49,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:12:49,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:12:49,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:12:49,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:12:49,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:12:49,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 04:12:49,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:12:49,465] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step162000/mp_rank_00_model_states.pt 0: [2023-03-17 04:12:49,465] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:12:49,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:12:49,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:12:49,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:12:49,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2023-03-17 04:12:49,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:12:49,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 2: [2023-03-17 04:12:49,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:12:49,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:12:49,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 3: [2023-03-17 04:12:49,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:12:49,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:12:49,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 2: [2023-03-17 04:12:49,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:12:49,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:12:49,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 6: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:12:49,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 5: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:12:49,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 4: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:12:49,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 0: [2023-03-17 04:12:49,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 7: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:12:49,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 0: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:12:49,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:12:49,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 5: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:12:49,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:12:49,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 6: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:12:49,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 2: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:12:49,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 3: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:12:49,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 4: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:12:49,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:12:49,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 7: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 0: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:12:49,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:12:49,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2023-03-17 04:12:49,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 5: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 2: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:12:49,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2023-03-17 04:12:49,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2023-03-17 04:12:49,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 2: [2023-03-17 04:12:49,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 6: [2023-03-17 04:12:49,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:12:49,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 3: [2023-03-17 04:12:49,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:12:49,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:12:49,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 0: [2023-03-17 04:12:49,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:12:49,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:12:49,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 5: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:12:49,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 04:12:49,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 2: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 
2: [2023-03-17 04:12:49,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 4: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:12:49,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 3: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:12:49,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 
6: [2023-03-17 04:12:49,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 7: [2023-03-17 04:12:49,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:12:49,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 1: [2023-03-17 04:12:49,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 2: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:12:49,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 6: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:12:49,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 5: [2023-03-17 04:12:49,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:12:49,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 4: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:12:49,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 7: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:12:49,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 3: [2023-03-17 04:12:49,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 0: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:12:49,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:12:49,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 
2: [2023-03-17 04:12:49,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 5: [2023-03-17 04:12:49,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 5: [2023-03-17 04:12:49,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 6: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:12:49,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 3: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:12:49,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 4: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:12:49,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 0: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:12:49,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 7: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 5: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 1: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 2: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 
7: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 3: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 3: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 6: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 
3: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 7: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:12:49,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 5: [2023-03-17 04:12:49,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:12:49,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:12:49,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 6: [2023-03-17 04:12:49,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:12:49,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 4: [2023-03-17 04:12:49,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:12:49,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step162000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:12:49,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step162000 is ready now! 0: successfully saved checkpoint at iteration 162000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 86.17 7: iteration 162010/ 173500 | consumed samples: 41474560 | consumed tokens: 84939898880 | elapsed time per iteration (s): 0.09 | learning rate: 2.198E-05 | global batch size: 256 | lm loss: 4.497869E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2724.329 | TFLOPs: 10.13 | 7: iteration 162020/ 173500 | consumed samples: 41477120 | consumed tokens: 84945141760 | elapsed time per iteration (s): 0.08 | learning rate: 2.198E-05 | global batch size: 256 | lm loss: 4.519694E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.346 | TFLOPs: 11.78 | 7: iteration 162030/ 173500 | consumed samples: 41479680 | consumed tokens: 84950384640 | elapsed time per iteration (s): 0.09 | learning rate: 2.197E-05 | global batch size: 256 | lm loss: 4.504594E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2771.032 | TFLOPs: 10.31 | 7: iteration 162040/ 173500 | consumed samples: 41482240 | consumed tokens: 84955627520 | elapsed time per iteration (s): 0.11 | learning rate: 2.197E-05 | global batch size: 256 | lm loss: 4.508950E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2423.124 | TFLOPs: 9.01 | 7: iteration 162050/ 173500 | consumed samples: 41484800 | consumed tokens: 84960870400 | elapsed time per iteration (s): 0.08 | learning rate: 2.197E-05 | 
global batch size: 256 | lm loss: 4.498849E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.356 | TFLOPs: 11.88 | 7: iteration 162060/ 173500 | consumed samples: 41487360 | consumed tokens: 84966113280 | elapsed time per iteration (s): 0.08 | learning rate: 2.196E-05 | global batch size: 256 | lm loss: 4.515503E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.767 | TFLOPs: 11.88 | 7: iteration 162070/ 173500 | consumed samples: 41489920 | consumed tokens: 84971356160 | elapsed time per iteration (s): 0.09 | learning rate: 2.196E-05 | global batch size: 256 | lm loss: 4.498573E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2858.301 | TFLOPs: 10.63 | 7: iteration 162080/ 173500 | consumed samples: 41492480 | consumed tokens: 84976599040 | elapsed time per iteration (s): 0.08 | learning rate: 2.196E-05 | global batch size: 256 | lm loss: 4.507457E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.041 | TFLOPs: 11.83 | 7: iteration 162090/ 173500 | consumed samples: 41495040 | consumed tokens: 84981841920 | elapsed time per iteration (s): 0.12 | learning rate: 2.195E-05 | global batch size: 256 | lm loss: 4.494820E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2199.533 | TFLOPs: 8.18 | 7: iteration 162100/ 173500 | consumed samples: 41497600 | consumed tokens: 84987084800 | elapsed time per iteration (s): 0.11 | learning rate: 2.195E-05 | global batch size: 256 | lm loss: 4.493304E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2270.543 | TFLOPs: 8.45 | 7: iteration 162110/ 173500 | consumed 
samples: 41500160 | consumed tokens: 84992327680 | elapsed time per iteration (s): 0.10 | learning rate: 2.195E-05 | global batch size: 256 | lm loss: 4.494571E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2627.056 | TFLOPs: 9.77 | 7: iteration 162120/ 173500 | consumed samples: 41502720 | consumed tokens: 84997570560 | elapsed time per iteration (s): 0.14 | learning rate: 2.194E-05 | global batch size: 256 | lm loss: 4.501804E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1883.490 | TFLOPs: 7.01 | 7: iteration 162130/ 173500 | consumed samples: 41505280 | consumed tokens: 85002813440 | elapsed time per iteration (s): 0.14 | learning rate: 2.194E-05 | global batch size: 256 | lm loss: 4.499814E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1865.320 | TFLOPs: 6.94 | 7: iteration 162140/ 173500 | consumed samples: 41507840 | consumed tokens: 85008056320 | elapsed time per iteration (s): 0.12 | learning rate: 2.194E-05 | global batch size: 256 | lm loss: 4.501965E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.312 | TFLOPs: 7.63 | 7: iteration 162150/ 173500 | consumed samples: 41510400 | consumed tokens: 85013299200 | elapsed time per iteration (s): 0.10 | learning rate: 2.193E-05 | global batch size: 256 | lm loss: 4.506470E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2485.145 | TFLOPs: 9.24 | 7: iteration 162160/ 173500 | consumed samples: 41512960 | consumed tokens: 85018542080 | elapsed time per iteration (s): 0.08 | learning rate: 2.193E-05 | global batch size: 256 | lm loss: 4.485831E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 3226.464 | TFLOPs: 12.00 | 7: iteration 162170/ 173500 | consumed samples: 41515520 | consumed tokens: 85023784960 | elapsed time per iteration (s): 0.08 | learning rate: 2.193E-05 | global batch size: 256 | lm loss: 4.498176E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.456 | TFLOPs: 12.03 | 7: iteration 162180/ 173500 | consumed samples: 41518080 | consumed tokens: 85029027840 | elapsed time per iteration (s): 0.08 | learning rate: 2.192E-05 | global batch size: 256 | lm loss: 4.503268E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3082.419 | TFLOPs: 11.47 | 7: iteration 162190/ 173500 | consumed samples: 41520640 | consumed tokens: 85034270720 | elapsed time per iteration (s): 0.09 | learning rate: 2.192E-05 | global batch size: 256 | lm loss: 4.506876E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2830.225 | TFLOPs: 10.53 | 7: iteration 162200/ 173500 | consumed samples: 41523200 | consumed tokens: 85039513600 | elapsed time per iteration (s): 0.08 | learning rate: 2.192E-05 | global batch size: 256 | lm loss: 4.503734E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.409 | TFLOPs: 11.94 | 7: iteration 162210/ 173500 | consumed samples: 41525760 | consumed tokens: 85044756480 | elapsed time per iteration (s): 0.08 | learning rate: 2.191E-05 | global batch size: 256 | lm loss: 4.497895E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.748 | TFLOPs: 11.89 | 7: iteration 162220/ 173500 | consumed samples: 41528320 | consumed tokens: 85049999360 | elapsed time per iteration (s): 0.08 | learning rate: 2.191E-05 | global batch 
size: 256 | lm loss: 4.494891E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.959 | TFLOPs: 11.99 | 7: iteration 162230/ 173500 | consumed samples: 41530880 | consumed tokens: 85055242240 | elapsed time per iteration (s): 0.08 | learning rate: 2.191E-05 | global batch size: 256 | lm loss: 4.495841E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.605 | TFLOPs: 11.62 | 7: iteration 162240/ 173500 | consumed samples: 41533440 | consumed tokens: 85060485120 | elapsed time per iteration (s): 0.09 | learning rate: 2.190E-05 | global batch size: 256 | lm loss: 4.509573E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2878.993 | TFLOPs: 10.71 | 7: iteration 162250/ 173500 | consumed samples: 41536000 | consumed tokens: 85065728000 | elapsed time per iteration (s): 0.08 | learning rate: 2.190E-05 | global batch size: 256 | lm loss: 4.505473E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.496 | TFLOPs: 11.99 | 7: iteration 162260/ 173500 | consumed samples: 41538560 | consumed tokens: 85070970880 | elapsed time per iteration (s): 0.09 | learning rate: 2.190E-05 | global batch size: 256 | lm loss: 4.510551E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.674 | TFLOPs: 10.82 | 7: iteration 162270/ 173500 | consumed samples: 41541120 | consumed tokens: 85076213760 | elapsed time per iteration (s): 0.08 | learning rate: 2.189E-05 | global batch size: 256 | lm loss: 4.513181E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3017.091 | TFLOPs: 11.22 | 7: iteration 162280/ 173500 | consumed samples: 41543680 
| consumed tokens: 85081456640 | elapsed time per iteration (s): 0.08 | learning rate: 2.189E-05 | global batch size: 256 | lm loss: 4.516887E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3012.489 | TFLOPs: 11.21 | 7: iteration 162290/ 173500 | consumed samples: 41546240 | consumed tokens: 85086699520 | elapsed time per iteration (s): 0.09 | learning rate: 2.189E-05 | global batch size: 256 | lm loss: 4.507315E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2778.493 | TFLOPs: 10.33 | 7: iteration 162300/ 173500 | consumed samples: 41548800 | consumed tokens: 85091942400 | elapsed time per iteration (s): 0.09 | learning rate: 2.188E-05 | global batch size: 256 | lm loss: 4.498430E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2907.954 | TFLOPs: 10.82 | 7: iteration 162310/ 173500 | consumed samples: 41551360 | consumed tokens: 85097185280 | elapsed time per iteration (s): 0.13 | learning rate: 2.188E-05 | global batch size: 256 | lm loss: 4.496758E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2013.220 | TFLOPs: 7.49 | 7: iteration 162320/ 173500 | consumed samples: 41553920 | consumed tokens: 85102428160 | elapsed time per iteration (s): 0.09 | learning rate: 2.188E-05 | global batch size: 256 | lm loss: 4.505921E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2901.625 | TFLOPs: 10.79 | 7: iteration 162330/ 173500 | consumed samples: 41556480 | consumed tokens: 85107671040 | elapsed time per iteration (s): 0.08 | learning rate: 2.187E-05 | global batch size: 256 | lm loss: 4.496609E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 3213.506 | TFLOPs: 11.95 | 7: iteration 162340/ 173500 | consumed samples: 41559040 | consumed tokens: 85112913920 | elapsed time per iteration (s): 0.12 | learning rate: 2.187E-05 | global batch size: 256 | lm loss: 4.494118E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2054.059 | TFLOPs: 7.64 | 7: iteration 162350/ 173500 | consumed samples: 41561600 | consumed tokens: 85118156800 | elapsed time per iteration (s): 0.13 | learning rate: 2.187E-05 | global batch size: 256 | lm loss: 4.505431E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1997.463 | TFLOPs: 7.43 | 7: iteration 162360/ 173500 | consumed samples: 41564160 | consumed tokens: 85123399680 | elapsed time per iteration (s): 0.16 | learning rate: 2.186E-05 | global batch size: 256 | lm loss: 4.509152E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1558.952 | TFLOPs: 5.80 | 7: iteration 162370/ 173500 | consumed samples: 41566720 | consumed tokens: 85128642560 | elapsed time per iteration (s): 0.15 | learning rate: 2.186E-05 | global batch size: 256 | lm loss: 4.490310E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1714.995 | TFLOPs: 6.38 | 7: iteration 162380/ 173500 | consumed samples: 41569280 | consumed tokens: 85133885440 | elapsed time per iteration (s): 0.12 | learning rate: 2.186E-05 | global batch size: 256 | lm loss: 4.520633E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2126.591 | TFLOPs: 7.91 | 7: iteration 162390/ 173500 | consumed samples: 41571840 | consumed tokens: 85139128320 | elapsed time per iteration (s): 0.13 | learning rate: 2.185E-05 | global batch size: 256 | 
lm loss: 4.491571E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1987.291 | TFLOPs: 7.39 | 7: iteration 162400/ 173500 | consumed samples: 41574400 | consumed tokens: 85144371200 | elapsed time per iteration (s): 0.13 | learning rate: 2.185E-05 | global batch size: 256 | lm loss: 4.513984E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.508 | TFLOPs: 7.35 | 7: iteration 162410/ 173500 | consumed samples: 41576960 | consumed tokens: 85149614080 | elapsed time per iteration (s): 0.12 | learning rate: 2.185E-05 | global batch size: 256 | lm loss: 4.502489E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2051.357 | TFLOPs: 7.63 | 7: iteration 162420/ 173500 | consumed samples: 41579520 | consumed tokens: 85154856960 | elapsed time per iteration (s): 0.13 | learning rate: 2.184E-05 | global batch size: 256 | lm loss: 4.499517E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.184 | TFLOPs: 7.35 | 7: iteration 162430/ 173500 | consumed samples: 41582080 | consumed tokens: 85160099840 | elapsed time per iteration (s): 0.09 | learning rate: 2.184E-05 | global batch size: 256 | lm loss: 4.499471E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2772.615 | TFLOPs: 10.31 | 7: iteration 162440/ 173500 | consumed samples: 41584640 | consumed tokens: 85165342720 | elapsed time per iteration (s): 0.08 | learning rate: 2.184E-05 | global batch size: 256 | lm loss: 4.514891E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.815 | TFLOPs: 11.94 | 7: iteration 162450/ 173500 | consumed samples: 41587200 | consumed 
tokens: 85170585600 | elapsed time per iteration (s): 0.08 | learning rate: 2.183E-05 | global batch size: 256 | lm loss: 4.507780E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.700 | TFLOPs: 11.85 | 7: iteration 162460/ 173500 | consumed samples: 41589760 | consumed tokens: 85175828480 | elapsed time per iteration (s): 0.08 | learning rate: 2.183E-05 | global batch size: 256 | lm loss: 4.509069E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.256 | TFLOPs: 11.89 | 7: iteration 162470/ 173500 | consumed samples: 41592320 | consumed tokens: 85181071360 | elapsed time per iteration (s): 0.08 | learning rate: 2.183E-05 | global batch size: 256 | lm loss: 4.494795E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.516 | TFLOPs: 11.85 | 7: iteration 162480/ 173500 | consumed samples: 41594880 | consumed tokens: 85186314240 | elapsed time per iteration (s): 0.08 | learning rate: 2.182E-05 | global batch size: 256 | lm loss: 4.493426E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.688 | TFLOPs: 11.84 | 7: iteration 162490/ 173500 | consumed samples: 41597440 | consumed tokens: 85191557120 | elapsed time per iteration (s): 0.08 | learning rate: 2.182E-05 | global batch size: 256 | lm loss: 4.496156E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.912 | TFLOPs: 11.85 | 7: iteration 162500/ 173500 | consumed samples: 41600000 | consumed tokens: 85196800000 | elapsed time per iteration (s): 0.08 | learning rate: 2.182E-05 | global batch size: 256 | lm loss: 4.513279E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 3195.474 | TFLOPs: 11.89 | 7: iteration 162510/ 173500 | consumed samples: 41602560 | consumed tokens: 85202042880 | elapsed time per iteration (s): 0.08 | learning rate: 2.181E-05 | global batch size: 256 | lm loss: 4.503669E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.999 | TFLOPs: 11.88 | 7: iteration 162520/ 173500 | consumed samples: 41605120 | consumed tokens: 85207285760 | elapsed time per iteration (s): 0.08 | learning rate: 2.181E-05 | global batch size: 256 | lm loss: 4.508192E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.356 | TFLOPs: 11.89 | 7: iteration 162530/ 173500 | consumed samples: 41607680 | consumed tokens: 85212528640 | elapsed time per iteration (s): 0.08 | learning rate: 2.181E-05 | global batch size: 256 | lm loss: 4.509400E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.152 | TFLOPs: 11.88 | 7: iteration 162540/ 173500 | consumed samples: 41610240 | consumed tokens: 85217771520 | elapsed time per iteration (s): 0.08 | learning rate: 2.180E-05 | global batch size: 256 | lm loss: 4.496072E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.647 | TFLOPs: 11.88 | 7: iteration 162550/ 173500 | consumed samples: 41612800 | consumed tokens: 85223014400 | elapsed time per iteration (s): 0.08 | learning rate: 2.180E-05 | global batch size: 256 | lm loss: 4.511855E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.776 | TFLOPs: 11.88 | 7: iteration 162560/ 173500 | consumed samples: 41615360 | consumed tokens: 85228257280 | elapsed time per iteration (s): 0.08 | learning rate: 2.180E-05 | global batch size: 256 | lm loss: 
4.506895E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.457 | TFLOPs: 11.90 | 7: iteration 162570/ 173500 | consumed samples: 41617920 | consumed tokens: 85233500160 | elapsed time per iteration (s): 0.08 | learning rate: 2.179E-05 | global batch size: 256 | lm loss: 4.496362E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.447 | TFLOPs: 11.86 | 7: iteration 162580/ 173500 | consumed samples: 41620480 | consumed tokens: 85238743040 | elapsed time per iteration (s): 0.09 | learning rate: 2.179E-05 | global batch size: 256 | lm loss: 4.505011E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2988.396 | TFLOPs: 11.12 | 7: iteration 162590/ 173500 | consumed samples: 41623040 | consumed tokens: 85243985920 | elapsed time per iteration (s): 0.08 | learning rate: 2.179E-05 | global batch size: 256 | lm loss: 4.494795E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.175 | TFLOPs: 11.79 | 7: iteration 162600/ 173500 | consumed samples: 41625600 | consumed tokens: 85249228800 | elapsed time per iteration (s): 0.10 | learning rate: 2.178E-05 | global batch size: 256 | lm loss: 4.499052E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2593.994 | TFLOPs: 9.65 | 7: iteration 162610/ 173500 | consumed samples: 41628160 | consumed tokens: 85254471680 | elapsed time per iteration (s): 0.09 | learning rate: 2.178E-05 | global batch size: 256 | lm loss: 4.488333E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.800 | TFLOPs: 11.15 | 7: iteration 162620/ 173500 | consumed samples: 41630720 | consumed tokens: 
85259714560 | elapsed time per iteration (s): 0.08 | learning rate: 2.178E-05 | global batch size: 256 | lm loss: 4.494896E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.564 | TFLOPs: 11.70 | 7: iteration 162630/ 173500 | consumed samples: 41633280 | consumed tokens: 85264957440 | elapsed time per iteration (s): 0.08 | learning rate: 2.177E-05 | global batch size: 256 | lm loss: 4.502821E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.482 | TFLOPs: 11.81 | 7: iteration 162640/ 173500 | consumed samples: 41635840 | consumed tokens: 85270200320 | elapsed time per iteration (s): 0.08 | learning rate: 2.177E-05 | global batch size: 256 | lm loss: 4.495744E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.407 | TFLOPs: 11.92 | 7: iteration 162650/ 173500 | consumed samples: 41638400 | consumed tokens: 85275443200 | elapsed time per iteration (s): 0.08 | learning rate: 2.177E-05 | global batch size: 256 | lm loss: 4.509482E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.351 | TFLOPs: 11.88 | 7: iteration 162660/ 173500 | consumed samples: 41640960 | consumed tokens: 85280686080 | elapsed time per iteration (s): 0.08 | learning rate: 2.176E-05 | global batch size: 256 | lm loss: 4.512031E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.951 | TFLOPs: 11.78 | 7: iteration 162670/ 173500 | consumed samples: 41643520 | consumed tokens: 85285928960 | elapsed time per iteration (s): 0.08 | learning rate: 2.176E-05 | global batch size: 256 | lm loss: 4.504122E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3209.578 | TFLOPs: 11.94 | 7: iteration 162680/ 173500 | consumed samples: 41646080 | consumed tokens: 85291171840 | elapsed time per iteration (s): 0.08 | learning rate: 2.176E-05 | global batch size: 256 | lm loss: 4.519588E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.149 | TFLOPs: 11.93 | 7: iteration 162690/ 173500 | consumed samples: 41648640 | consumed tokens: 85296414720 | elapsed time per iteration (s): 0.08 | learning rate: 2.175E-05 | global batch size: 256 | lm loss: 4.497940E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.497 | TFLOPs: 11.85 | 7: iteration 162700/ 173500 | consumed samples: 41651200 | consumed tokens: 85301657600 | elapsed time per iteration (s): 0.08 | learning rate: 2.175E-05 | global batch size: 256 | lm loss: 4.511552E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.177 | TFLOPs: 11.79 | 7: iteration 162710/ 173500 | consumed samples: 41653760 | consumed tokens: 85306900480 | elapsed time per iteration (s): 0.08 | learning rate: 2.175E-05 | global batch size: 256 | lm loss: 4.501266E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.224 | TFLOPs: 11.36 | 7: iteration 162720/ 173500 | consumed samples: 41656320 | consumed tokens: 85312143360 | elapsed time per iteration (s): 0.09 | learning rate: 2.174E-05 | global batch size: 256 | lm loss: 4.509053E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2830.935 | TFLOPs: 10.53 | 7: iteration 162730/ 173500 | consumed samples: 41658880 | consumed tokens: 85317386240 | elapsed time per iteration (s): 0.09 | learning rate: 2.174E-05 | global batch size: 256 | lm loss: 
4.504383E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2978.408 | TFLOPs: 11.08 | 7: iteration 162740/ 173500 | consumed samples: 41661440 | consumed tokens: 85322629120 | elapsed time per iteration (s): 0.08 | learning rate: 2.174E-05 | global batch size: 256 | lm loss: 4.493462E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.810 | TFLOPs: 11.66 | 7: iteration 162750/ 173500 | consumed samples: 41664000 | consumed tokens: 85327872000 | elapsed time per iteration (s): 0.08 | learning rate: 2.173E-05 | global batch size: 256 | lm loss: 4.508780E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.074 | TFLOPs: 11.98 | 7: iteration 162760/ 173500 | consumed samples: 41666560 | consumed tokens: 85333114880 | elapsed time per iteration (s): 0.08 | learning rate: 2.173E-05 | global batch size: 256 | lm loss: 4.511563E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.319 | TFLOPs: 11.84 | 7: iteration 162770/ 173500 | consumed samples: 41669120 | consumed tokens: 85338357760 | elapsed time per iteration (s): 0.08 | learning rate: 2.173E-05 | global batch size: 256 | lm loss: 4.511640E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.728 | TFLOPs: 11.85 | 7: iteration 162780/ 173500 | consumed samples: 41671680 | consumed tokens: 85343600640 | elapsed time per iteration (s): 0.08 | learning rate: 2.172E-05 | global batch size: 256 | lm loss: 4.500184E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.043 | TFLOPs: 11.68 | 7: iteration 162790/ 173500 | consumed samples: 41674240 | consumed tokens: 
85348843520 | elapsed time per iteration (s): 0.08 | learning rate: 2.172E-05 | global batch size: 256 | lm loss: 4.509590E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.384 | TFLOPs: 11.97 | 7: iteration 162800/ 173500 | consumed samples: 41676800 | consumed tokens: 85354086400 | elapsed time per iteration (s): 0.08 | learning rate: 2.172E-05 | global batch size: 256 | lm loss: 4.509848E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.042 | TFLOPs: 11.88 | 7: iteration 162810/ 173500 | consumed samples: 41679360 | consumed tokens: 85359329280 | elapsed time per iteration (s): 0.08 | learning rate: 2.171E-05 | global batch size: 256 | lm loss: 4.503181E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.516 | TFLOPs: 11.95 | 7: iteration 162820/ 173500 | consumed samples: 41681920 | consumed tokens: 85364572160 | elapsed time per iteration (s): 0.08 | learning rate: 2.171E-05 | global batch size: 256 | lm loss: 4.512920E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.303 | TFLOPs: 11.97 | 7: iteration 162830/ 173500 | consumed samples: 41684480 | consumed tokens: 85369815040 | elapsed time per iteration (s): 0.08 | learning rate: 2.171E-05 | global batch size: 256 | lm loss: 4.505699E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.591 | TFLOPs: 11.90 | 7: iteration 162840/ 173500 | consumed samples: 41687040 | consumed tokens: 85375057920 | elapsed time per iteration (s): 0.08 | learning rate: 2.171E-05 | global batch size: 256 | lm loss: 4.504450E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3125.675 | TFLOPs: 11.63 | 7: iteration 162850/ 173500 | consumed samples: 41689600 | consumed tokens: 85380300800 | elapsed time per iteration (s): 0.08 | learning rate: 2.170E-05 | global batch size: 256 | lm loss: 4.502718E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.934 | TFLOPs: 11.64 | 7: iteration 162860/ 173500 | consumed samples: 41692160 | consumed tokens: 85385543680 | elapsed time per iteration (s): 0.08 | learning rate: 2.170E-05 | global batch size: 256 | lm loss: 4.505571E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.155 | TFLOPs: 11.92 | 7: iteration 162870/ 173500 | consumed samples: 41694720 | consumed tokens: 85390786560 | elapsed time per iteration (s): 0.08 | learning rate: 2.170E-05 | global batch size: 256 | lm loss: 4.499108E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3111.890 | TFLOPs: 11.57 | 7: iteration 162880/ 173500 | consumed samples: 41697280 | consumed tokens: 85396029440 | elapsed time per iteration (s): 0.09 | learning rate: 2.169E-05 | global batch size: 256 | lm loss: 4.507708E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2994.356 | TFLOPs: 11.14 | 7: iteration 162890/ 173500 | consumed samples: 41699840 | consumed tokens: 85401272320 | elapsed time per iteration (s): 0.09 | learning rate: 2.169E-05 | global batch size: 256 | lm loss: 4.506379E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.834 | TFLOPs: 11.08 | 7: iteration 162900/ 173500 | consumed samples: 41702400 | consumed tokens: 85406515200 | elapsed time per iteration (s): 0.08 | learning rate: 2.169E-05 | global batch size: 256 | lm loss: 
4.507308E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.479 | TFLOPs: 11.71 | 7: iteration 162910/ 173500 | consumed samples: 41704960 | consumed tokens: 85411758080 | elapsed time per iteration (s): 0.11 | learning rate: 2.168E-05 | global batch size: 256 | lm loss: 4.497277E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2326.483 | TFLOPs: 8.65 | 7: iteration 162920/ 173500 | consumed samples: 41707520 | consumed tokens: 85417000960 | elapsed time per iteration (s): 0.09 | learning rate: 2.168E-05 | global batch size: 256 | lm loss: 4.509636E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2848.597 | TFLOPs: 10.60 | 7: iteration 162930/ 173500 | consumed samples: 41710080 | consumed tokens: 85422243840 | elapsed time per iteration (s): 0.09 | learning rate: 2.168E-05 | global batch size: 256 | lm loss: 4.496438E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3006.613 | TFLOPs: 11.18 | 7: iteration 162940/ 173500 | consumed samples: 41712640 | consumed tokens: 85427486720 | elapsed time per iteration (s): 0.08 | learning rate: 2.167E-05 | global batch size: 256 | lm loss: 4.512746E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.724 | TFLOPs: 11.63 | 7: iteration 162950/ 173500 | consumed samples: 41715200 | consumed tokens: 85432729600 | elapsed time per iteration (s): 0.08 | learning rate: 2.167E-05 | global batch size: 256 | lm loss: 4.505314E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3062.533 | TFLOPs: 11.39 | 7: iteration 162960/ 173500 | consumed samples: 41717760 | consumed tokens: 
85437972480 | elapsed time per iteration (s): 0.08 | learning rate: 2.167E-05 | global batch size: 256 | lm loss: 4.501705E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.642 | TFLOPs: 11.61 | 7: iteration 162970/ 173500 | consumed samples: 41720320 | consumed tokens: 85443215360 | elapsed time per iteration (s): 0.08 | learning rate: 2.166E-05 | global batch size: 256 | lm loss: 4.503038E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.554 | TFLOPs: 11.91 | 7: iteration 162980/ 173500 | consumed samples: 41722880 | consumed tokens: 85448458240 | elapsed time per iteration (s): 0.08 | learning rate: 2.166E-05 | global batch size: 256 | lm loss: 4.505231E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.998 | TFLOPs: 11.92 | 7: iteration 162990/ 173500 | consumed samples: 41725440 | consumed tokens: 85453701120 | elapsed time per iteration (s): 0.08 | learning rate: 2.166E-05 | global batch size: 256 | lm loss: 4.507327E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3060.030 | TFLOPs: 11.38 | 7: iteration 163000/ 173500 | consumed samples: 41728000 | consumed tokens: 85458944000 | elapsed time per iteration (s): 0.09 | learning rate: 2.165E-05 | global batch size: 256 | lm loss: 4.507344E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.731 | TFLOPs: 11.15 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 163000 | lm loss value: 4.384087E+00 | lm loss PPL: 8.016497E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving 
checkpoint at iteration 163000 to checkpoints_14m91b100m 0: [2023-03-17 04:14:20,163] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step163000 is begin to save! 0: [2023-03-17 04:14:20,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:14:20,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:14:20,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:14:20,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:14:20,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:14:20,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:14:20,201] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:14:20,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:14:20,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:14:20,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:14:20,208] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 04:14:20,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:14:20,209] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step163000/mp_rank_00_model_states.pt 0: [2023-03-17 04:14:20,209] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:14:20,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:14:20,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:14:20,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:14:20,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:14:20,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:14:20,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 1: [2023-03-17 04:14:20,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:14:20,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:14:20,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 7: [2023-03-17 04:14:20,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:14:20,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 6: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:14:20,233] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:14:20,233] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,233] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 1: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:14:20,233] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 7: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:14:20,233] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,233] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 0: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:14:20,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 4: [2023-03-17 04:14:20,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 3: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:14:20,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 6: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:14:20,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:14:20,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 7: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:14:20,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:14:20,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 04:14:20,234] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 1: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:14:20,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 0: [2023-03-17 04:14:20,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 4: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:14:20,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 3: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:14:20,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 6: [2023-03-17 04:14:20,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:14:20,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 7: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:14:20,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 1: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:14:20,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:14:20,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 4: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:14:20,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:14:20,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 6: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:14:20,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 3: [2023-03-17 04:14:20,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 3: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 1: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:14:20,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 0: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:14:20,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 7: [2023-03-17 04:14:20,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:14:20,237] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 0: [2023-03-17 04:14:20,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:14:20,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 4: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:14:20,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 3: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:14:20,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 6: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:14:20,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:14:20,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 1: [2023-03-17 04:14:20,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:14:20,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 7: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:14:20,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:14:20,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 0: [2023-03-17 04:14:20,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 4: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:14:20,239] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:14:20,239] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 6: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:14:20,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 3: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:14:20,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:14:20,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 1: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:14:20,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 7: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:14:20,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 0: [2023-03-17 04:14:20,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 4: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 4: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 6: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 3: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 4: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 3: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 2: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 1: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:14:20,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 04:14:20,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 2: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:14:20,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 5: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 3: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:14:20,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:14:20,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 6: [2023-03-17 04:14:20,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:14:20,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step163000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:14:20,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step163000 is ready now! 
0: successfully saved checkpoint at iteration 163000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.59 7: iteration 163010/ 173500 | consumed samples: 41730560 | consumed tokens: 85464186880 | elapsed time per iteration (s): 0.09 | learning rate: 2.165E-05 | global batch size: 256 | lm loss: 4.505290E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2779.537 | TFLOPs: 10.34 | 7: iteration 163020/ 173500 | consumed samples: 41733120 | consumed tokens: 85469429760 | elapsed time per iteration (s): 0.08 | learning rate: 2.165E-05 | global batch size: 256 | lm loss: 4.510306E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.385 | TFLOPs: 11.65 | 7: iteration 163030/ 173500 | consumed samples: 41735680 | consumed tokens: 85474672640 | elapsed time per iteration (s): 0.09 | learning rate: 2.165E-05 | global batch size: 256 | lm loss: 4.503637E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2884.037 | TFLOPs: 10.73 | 7: iteration 163040/ 173500 | consumed samples: 41738240 | consumed tokens: 85479915520 | elapsed time per iteration (s): 0.08 | learning rate: 2.164E-05 | global batch size: 256 | lm loss: 4.500552E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.464 | TFLOPs: 11.97 | 7: iteration 163050/ 173500 | consumed samples: 41740800 | consumed tokens: 85485158400 | elapsed time per iteration (s): 0.08 | learning rate: 2.164E-05 | global batch size: 256 | lm loss: 4.504240E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.386 | TFLOPs: 11.97 | 7: iteration 163060/ 173500 | consumed samples: 41743360 | consumed tokens: 85490401280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.164E-05 | global batch size: 256 | lm loss: 4.508737E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.747 | TFLOPs: 11.95 | 7: iteration 163070/ 173500 | consumed samples: 41745920 | consumed tokens: 85495644160 | elapsed time per iteration (s): 0.09 | learning rate: 2.163E-05 | global batch size: 256 | lm loss: 4.505607E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.320 | TFLOPs: 10.16 | 7: iteration 163080/ 173500 | consumed samples: 41748480 | consumed tokens: 85500887040 | elapsed time per iteration (s): 0.09 | learning rate: 2.163E-05 | global batch size: 256 | lm loss: 4.505886E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2830.280 | TFLOPs: 10.53 | 7: iteration 163090/ 173500 | consumed samples: 41751040 | consumed tokens: 85506129920 | elapsed time per iteration (s): 0.08 | learning rate: 2.163E-05 | global batch size: 256 | lm loss: 4.518219E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3131.242 | TFLOPs: 11.65 | 7: iteration 163100/ 173500 | consumed samples: 41753600 | consumed tokens: 85511372800 | elapsed time per iteration (s): 0.08 | learning rate: 2.162E-05 | global batch size: 256 | lm loss: 4.497802E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.623 | TFLOPs: 11.93 | 7: iteration 163110/ 173500 | consumed samples: 41756160 | consumed tokens: 85516615680 | elapsed time per iteration (s): 0.08 | learning rate: 2.162E-05 | global batch size: 256 | lm loss: 4.489880E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.339 | TFLOPs: 11.96 | 7: 
iteration 163120/ 173500 | consumed samples: 41758720 | consumed tokens: 85521858560 | elapsed time per iteration (s): 0.08 | learning rate: 2.162E-05 | global batch size: 256 | lm loss: 4.516108E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.246 | TFLOPs: 11.99 | 7: iteration 163130/ 173500 | consumed samples: 41761280 | consumed tokens: 85527101440 | elapsed time per iteration (s): 0.09 | learning rate: 2.161E-05 | global batch size: 256 | lm loss: 4.503170E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.470 | TFLOPs: 10.82 | 7: iteration 163140/ 173500 | consumed samples: 41763840 | consumed tokens: 85532344320 | elapsed time per iteration (s): 0.10 | learning rate: 2.161E-05 | global batch size: 256 | lm loss: 4.490877E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2450.997 | TFLOPs: 9.12 | 7: iteration 163150/ 173500 | consumed samples: 41766400 | consumed tokens: 85537587200 | elapsed time per iteration (s): 0.08 | learning rate: 2.161E-05 | global batch size: 256 | lm loss: 4.513185E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.581 | TFLOPs: 11.99 | 7: iteration 163160/ 173500 | consumed samples: 41768960 | consumed tokens: 85542830080 | elapsed time per iteration (s): 0.08 | learning rate: 2.160E-05 | global batch size: 256 | lm loss: 4.506250E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.715 | TFLOPs: 11.97 | 7: iteration 163170/ 173500 | consumed samples: 41771520 | consumed tokens: 85548072960 | elapsed time per iteration (s): 0.10 | learning rate: 2.160E-05 | global batch size: 256 | lm loss: 4.501460E+00 | grad norm: 0.386 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2497.989 | TFLOPs: 9.29 | 7: iteration 163180/ 173500 | consumed samples: 41774080 | consumed tokens: 85553315840 | elapsed time per iteration (s): 0.10 | learning rate: 2.160E-05 | global batch size: 256 | lm loss: 4.499734E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2582.147 | TFLOPs: 9.60 | 7: iteration 163190/ 173500 | consumed samples: 41776640 | consumed tokens: 85558558720 | elapsed time per iteration (s): 0.08 | learning rate: 2.160E-05 | global batch size: 256 | lm loss: 4.496425E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.404 | TFLOPs: 11.89 | 7: iteration 163200/ 173500 | consumed samples: 41779200 | consumed tokens: 85563801600 | elapsed time per iteration (s): 0.10 | learning rate: 2.159E-05 | global batch size: 256 | lm loss: 4.502447E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2660.216 | TFLOPs: 9.89 | 7: iteration 163210/ 173500 | consumed samples: 41781760 | consumed tokens: 85569044480 | elapsed time per iteration (s): 0.08 | learning rate: 2.159E-05 | global batch size: 256 | lm loss: 4.520975E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.973 | TFLOPs: 11.62 | 7: iteration 163220/ 173500 | consumed samples: 41784320 | consumed tokens: 85574287360 | elapsed time per iteration (s): 0.08 | learning rate: 2.159E-05 | global batch size: 256 | lm loss: 4.494786E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.798 | TFLOPs: 11.62 | 7: iteration 163230/ 173500 | consumed samples: 41786880 | consumed tokens: 85579530240 | elapsed time per iteration (s): 0.10 | 
learning rate: 2.158E-05 | global batch size: 256 | lm loss: 4.493914E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2667.684 | TFLOPs: 9.92 | 7: iteration 163240/ 173500 | consumed samples: 41789440 | consumed tokens: 85584773120 | elapsed time per iteration (s): 0.09 | learning rate: 2.158E-05 | global batch size: 256 | lm loss: 4.511676E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2857.423 | TFLOPs: 10.63 | 7: iteration 163250/ 173500 | consumed samples: 41792000 | consumed tokens: 85590016000 | elapsed time per iteration (s): 0.09 | learning rate: 2.158E-05 | global batch size: 256 | lm loss: 4.501018E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2916.817 | TFLOPs: 10.85 | 7: iteration 163260/ 173500 | consumed samples: 41794560 | consumed tokens: 85595258880 | elapsed time per iteration (s): 0.09 | learning rate: 2.157E-05 | global batch size: 256 | lm loss: 4.511625E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2979.751 | TFLOPs: 11.08 | 7: iteration 163270/ 173500 | consumed samples: 41797120 | consumed tokens: 85600501760 | elapsed time per iteration (s): 0.08 | learning rate: 2.157E-05 | global batch size: 256 | lm loss: 4.493592E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3061.895 | TFLOPs: 11.39 | 7: iteration 163280/ 173500 | consumed samples: 41799680 | consumed tokens: 85605744640 | elapsed time per iteration (s): 0.10 | learning rate: 2.157E-05 | global batch size: 256 | lm loss: 4.488577E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2510.168 | TFLOPs: 9.34 | 7: iteration 
163290/ 173500 | consumed samples: 41802240 | consumed tokens: 85610987520 | elapsed time per iteration (s): 0.09 | learning rate: 2.156E-05 | global batch size: 256 | lm loss: 4.510863E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2928.338 | TFLOPs: 10.89 | 7: iteration 163300/ 173500 | consumed samples: 41804800 | consumed tokens: 85616230400 | elapsed time per iteration (s): 0.09 | learning rate: 2.156E-05 | global batch size: 256 | lm loss: 4.500599E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2902.639 | TFLOPs: 10.80 | 7: iteration 163310/ 173500 | consumed samples: 41807360 | consumed tokens: 85621473280 | elapsed time per iteration (s): 0.10 | learning rate: 2.156E-05 | global batch size: 256 | lm loss: 4.509555E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2597.670 | TFLOPs: 9.66 | 7: iteration 163320/ 173500 | consumed samples: 41809920 | consumed tokens: 85626716160 | elapsed time per iteration (s): 0.09 | learning rate: 2.156E-05 | global batch size: 256 | lm loss: 4.502549E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2837.424 | TFLOPs: 10.55 | 7: iteration 163330/ 173500 | consumed samples: 41812480 | consumed tokens: 85631959040 | elapsed time per iteration (s): 0.08 | learning rate: 2.155E-05 | global batch size: 256 | lm loss: 4.499490E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3100.388 | TFLOPs: 11.53 | 7: iteration 163340/ 173500 | consumed samples: 41815040 | consumed tokens: 85637201920 | elapsed time per iteration (s): 0.09 | learning rate: 2.155E-05 | global batch size: 256 | lm loss: 4.504194E+00 | grad norm: 0.376 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2977.630 | TFLOPs: 11.08 | 7: iteration 163350/ 173500 | consumed samples: 41817600 | consumed tokens: 85642444800 | elapsed time per iteration (s): 0.08 | learning rate: 2.155E-05 | global batch size: 256 | lm loss: 4.492100E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.155 | TFLOPs: 11.32 | 7: iteration 163360/ 173500 | consumed samples: 41820160 | consumed tokens: 85647687680 | elapsed time per iteration (s): 0.08 | learning rate: 2.154E-05 | global batch size: 256 | lm loss: 4.500463E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.912 | TFLOPs: 11.30 | 7: iteration 163370/ 173500 | consumed samples: 41822720 | consumed tokens: 85652930560 | elapsed time per iteration (s): 0.09 | learning rate: 2.154E-05 | global batch size: 256 | lm loss: 4.507342E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.395 | TFLOPs: 10.54 | 7: iteration 163380/ 173500 | consumed samples: 41825280 | consumed tokens: 85658173440 | elapsed time per iteration (s): 0.10 | learning rate: 2.154E-05 | global batch size: 256 | lm loss: 4.513765E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2674.746 | TFLOPs: 9.95 | 7: iteration 163390/ 173500 | consumed samples: 41827840 | consumed tokens: 85663416320 | elapsed time per iteration (s): 0.09 | learning rate: 2.153E-05 | global batch size: 256 | lm loss: 4.512008E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2788.867 | TFLOPs: 10.37 | 7: iteration 163400/ 173500 | consumed samples: 41830400 | consumed tokens: 85668659200 | elapsed time per iteration (s): 0.09 | learning 
rate: 2.153E-05 | global batch size: 256 | lm loss: 4.500672E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2906.361 | TFLOPs: 10.81 | 7: iteration 163410/ 173500 | consumed samples: 41832960 | consumed tokens: 85673902080 | elapsed time per iteration (s): 0.09 | learning rate: 2.153E-05 | global batch size: 256 | lm loss: 4.508448E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2984.005 | TFLOPs: 11.10 | 7: iteration 163420/ 173500 | consumed samples: 41835520 | consumed tokens: 85679144960 | elapsed time per iteration (s): 0.09 | learning rate: 2.153E-05 | global batch size: 256 | lm loss: 4.492868E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2877.776 | TFLOPs: 10.70 | 7: iteration 163430/ 173500 | consumed samples: 41838080 | consumed tokens: 85684387840 | elapsed time per iteration (s): 0.09 | learning rate: 2.152E-05 | global batch size: 256 | lm loss: 4.503004E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2790.047 | TFLOPs: 10.38 | 7: iteration 163440/ 173500 | consumed samples: 41840640 | consumed tokens: 85689630720 | elapsed time per iteration (s): 0.08 | learning rate: 2.152E-05 | global batch size: 256 | lm loss: 4.498662E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.743 | TFLOPs: 11.34 | 7: iteration 163450/ 173500 | consumed samples: 41843200 | consumed tokens: 85694873600 | elapsed time per iteration (s): 0.09 | learning rate: 2.152E-05 | global batch size: 256 | lm loss: 4.513355E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.413 | TFLOPs: 10.82 | 7: iteration 163460/ 
173500 | consumed samples: 41845760 | consumed tokens: 85700116480 | elapsed time per iteration (s): 0.08 | learning rate: 2.151E-05 | global batch size: 256 | lm loss: 4.505462E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3041.152 | TFLOPs: 11.31 | 7: iteration 163470/ 173500 | consumed samples: 41848320 | consumed tokens: 85705359360 | elapsed time per iteration (s): 0.09 | learning rate: 2.151E-05 | global batch size: 256 | lm loss: 4.497417E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2788.568 | TFLOPs: 10.37 | 7: iteration 163480/ 173500 | consumed samples: 41850880 | consumed tokens: 85710602240 | elapsed time per iteration (s): 0.09 | learning rate: 2.151E-05 | global batch size: 256 | lm loss: 4.502186E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.849 | TFLOPs: 10.82 | 7: iteration 163490/ 173500 | consumed samples: 41853440 | consumed tokens: 85715845120 | elapsed time per iteration (s): 0.09 | learning rate: 2.150E-05 | global batch size: 256 | lm loss: 4.501480E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2800.961 | TFLOPs: 10.42 | 7: iteration 163500/ 173500 | consumed samples: 41856000 | consumed tokens: 85721088000 | elapsed time per iteration (s): 0.10 | learning rate: 2.150E-05 | global batch size: 256 | lm loss: 4.490708E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2647.210 | TFLOPs: 9.85 | 7: iteration 163510/ 173500 | consumed samples: 41858560 | consumed tokens: 85726330880 | elapsed time per iteration (s): 0.09 | learning rate: 2.150E-05 | global batch size: 256 | lm loss: 4.503727E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2844.739 | TFLOPs: 10.58 | 7: iteration 163520/ 173500 | consumed samples: 41861120 | consumed tokens: 85731573760 | elapsed time per iteration (s): 0.08 | learning rate: 2.150E-05 | global batch size: 256 | lm loss: 4.512701E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3053.189 | TFLOPs: 11.36 | 7: iteration 163530/ 173500 | consumed samples: 41863680 | consumed tokens: 85736816640 | elapsed time per iteration (s): 0.09 | learning rate: 2.149E-05 | global batch size: 256 | lm loss: 4.505631E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2955.048 | TFLOPs: 10.99 | 7: iteration 163540/ 173500 | consumed samples: 41866240 | consumed tokens: 85742059520 | elapsed time per iteration (s): 0.09 | learning rate: 2.149E-05 | global batch size: 256 | lm loss: 4.497773E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2847.495 | TFLOPs: 10.59 | 7: iteration 163550/ 173500 | consumed samples: 41868800 | consumed tokens: 85747302400 | elapsed time per iteration (s): 0.09 | learning rate: 2.149E-05 | global batch size: 256 | lm loss: 4.502637E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2786.451 | TFLOPs: 10.36 | 7: iteration 163560/ 173500 | consumed samples: 41871360 | consumed tokens: 85752545280 | elapsed time per iteration (s): 0.09 | learning rate: 2.148E-05 | global batch size: 256 | lm loss: 4.506478E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2728.308 | TFLOPs: 10.15 | 7: iteration 163570/ 173500 | consumed samples: 41873920 | consumed tokens: 85757788160 | elapsed time per iteration (s): 0.09 | learning rate: 
2.148E-05 | global batch size: 256 | lm loss: 4.523190E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2981.702 | TFLOPs: 11.09 | 7: iteration 163580/ 173500 | consumed samples: 41876480 | consumed tokens: 85763031040 | elapsed time per iteration (s): 0.09 | learning rate: 2.148E-05 | global batch size: 256 | lm loss: 4.505075E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.426 | TFLOPs: 10.57 | 7: iteration 163590/ 173500 | consumed samples: 41879040 | consumed tokens: 85768273920 | elapsed time per iteration (s): 0.08 | learning rate: 2.147E-05 | global batch size: 256 | lm loss: 4.507985E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.157 | TFLOPs: 11.34 | 7: iteration 163600/ 173500 | consumed samples: 41881600 | consumed tokens: 85773516800 | elapsed time per iteration (s): 0.09 | learning rate: 2.147E-05 | global batch size: 256 | lm loss: 4.507366E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2967.500 | TFLOPs: 11.04 | 7: iteration 163610/ 173500 | consumed samples: 41884160 | consumed tokens: 85778759680 | elapsed time per iteration (s): 0.09 | learning rate: 2.147E-05 | global batch size: 256 | lm loss: 4.498890E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.574 | TFLOPs: 11.11 | 7: iteration 163620/ 173500 | consumed samples: 41886720 | consumed tokens: 85784002560 | elapsed time per iteration (s): 0.09 | learning rate: 2.147E-05 | global batch size: 256 | lm loss: 4.505018E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2988.835 | TFLOPs: 11.12 | 7: iteration 163630/ 173500 | 
consumed samples: 41889280 | consumed tokens: 85789245440 | elapsed time per iteration (s): 0.08 | learning rate: 2.146E-05 | global batch size: 256 | lm loss: 4.497314E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.311 | TFLOPs: 11.96 | 7: iteration 163640/ 173500 | consumed samples: 41891840 | consumed tokens: 85794488320 | elapsed time per iteration (s): 0.08 | learning rate: 2.146E-05 | global batch size: 256 | lm loss: 4.488805E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.514 | TFLOPs: 11.90 | 7: iteration 163650/ 173500 | consumed samples: 41894400 | consumed tokens: 85799731200 | elapsed time per iteration (s): 0.08 | learning rate: 2.146E-05 | global batch size: 256 | lm loss: 4.509864E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.213 | TFLOPs: 11.86 | 7: iteration 163660/ 173500 | consumed samples: 41896960 | consumed tokens: 85804974080 | elapsed time per iteration (s): 0.08 | learning rate: 2.145E-05 | global batch size: 256 | lm loss: 4.508356E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.565 | TFLOPs: 11.87 | 7: iteration 163670/ 173500 | consumed samples: 41899520 | consumed tokens: 85810216960 | elapsed time per iteration (s): 0.08 | learning rate: 2.145E-05 | global batch size: 256 | lm loss: 4.509433E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.914 | TFLOPs: 11.85 | 7: iteration 163680/ 173500 | consumed samples: 41902080 | consumed tokens: 85815459840 | elapsed time per iteration (s): 0.08 | learning rate: 2.145E-05 | global batch size: 256 | lm loss: 4.505861E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3188.968 | TFLOPs: 11.86 | 7: iteration 163690/ 173500 | consumed samples: 41904640 | consumed tokens: 85820702720 | elapsed time per iteration (s): 0.08 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 4.510593E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.318 | TFLOPs: 11.79 | 7: iteration 163700/ 173500 | consumed samples: 41907200 | consumed tokens: 85825945600 | elapsed time per iteration (s): 0.08 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 4.507758E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.317 | TFLOPs: 11.87 | 7: iteration 163710/ 173500 | consumed samples: 41909760 | consumed tokens: 85831188480 | elapsed time per iteration (s): 0.08 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 4.511291E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.930 | TFLOPs: 11.82 | 7: iteration 163720/ 173500 | consumed samples: 41912320 | consumed tokens: 85836431360 | elapsed time per iteration (s): 0.08 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 4.510427E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.985 | TFLOPs: 11.84 | 7: iteration 163730/ 173500 | consumed samples: 41914880 | consumed tokens: 85841674240 | elapsed time per iteration (s): 0.08 | learning rate: 2.143E-05 | global batch size: 256 | lm loss: 4.498274E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.958 | TFLOPs: 11.34 | 7: iteration 163740/ 173500 | consumed samples: 41917440 | consumed tokens: 85846917120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.143E-05 | global batch size: 256 | lm loss: 4.507747E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3057.373 | TFLOPs: 11.37 | 7: iteration 163750/ 173500 | consumed samples: 41920000 | consumed tokens: 85852160000 | elapsed time per iteration (s): 0.08 | learning rate: 2.143E-05 | global batch size: 256 | lm loss: 4.511891E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3114.758 | TFLOPs: 11.59 | 7: iteration 163760/ 173500 | consumed samples: 41922560 | consumed tokens: 85857402880 | elapsed time per iteration (s): 0.08 | learning rate: 2.142E-05 | global batch size: 256 | lm loss: 4.511703E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.126 | TFLOPs: 11.85 | 7: iteration 163770/ 173500 | consumed samples: 41925120 | consumed tokens: 85862645760 | elapsed time per iteration (s): 0.09 | learning rate: 2.142E-05 | global batch size: 256 | lm loss: 4.498659E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2993.040 | TFLOPs: 11.13 | 7: iteration 163780/ 173500 | consumed samples: 41927680 | consumed tokens: 85867888640 | elapsed time per iteration (s): 0.08 | learning rate: 2.142E-05 | global batch size: 256 | lm loss: 4.480854E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.024 | TFLOPs: 11.70 | 7: iteration 163790/ 173500 | consumed samples: 41930240 | consumed tokens: 85873131520 | elapsed time per iteration (s): 0.08 | learning rate: 2.142E-05 | global batch size: 256 | lm loss: 4.519071E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.034 | TFLOPs: 11.79 | 7: iteration 163800/ 173500 | 
consumed samples: 41932800 | consumed tokens: 85878374400 | elapsed time per iteration (s): 0.08 | learning rate: 2.141E-05 | global batch size: 256 | lm loss: 4.496814E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3120.683 | TFLOPs: 11.61 | 7: iteration 163810/ 173500 | consumed samples: 41935360 | consumed tokens: 85883617280 | elapsed time per iteration (s): 0.08 | learning rate: 2.141E-05 | global batch size: 256 | lm loss: 4.496420E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.295 | TFLOPs: 11.41 | 7: iteration 163820/ 173500 | consumed samples: 41937920 | consumed tokens: 85888860160 | elapsed time per iteration (s): 0.08 | learning rate: 2.141E-05 | global batch size: 256 | lm loss: 4.504596E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3048.818 | TFLOPs: 11.34 | 7: iteration 163830/ 173500 | consumed samples: 41940480 | consumed tokens: 85894103040 | elapsed time per iteration (s): 0.08 | learning rate: 2.140E-05 | global batch size: 256 | lm loss: 4.498328E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.044 | TFLOPs: 11.91 | 7: iteration 163840/ 173500 | consumed samples: 41943040 | consumed tokens: 85899345920 | elapsed time per iteration (s): 0.08 | learning rate: 2.140E-05 | global batch size: 256 | lm loss: 4.496485E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3095.168 | TFLOPs: 11.51 | 7: iteration 163850/ 173500 | consumed samples: 41945600 | consumed tokens: 85904588800 | elapsed time per iteration (s): 0.08 | learning rate: 2.140E-05 | global batch size: 256 | lm loss: 4.498317E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3180.404 | TFLOPs: 11.83 | 7: iteration 163860/ 173500 | consumed samples: 41948160 | consumed tokens: 85909831680 | elapsed time per iteration (s): 0.08 | learning rate: 2.140E-05 | global batch size: 256 | lm loss: 4.503198E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.878 | TFLOPs: 11.87 | 7: iteration 163870/ 173500 | consumed samples: 41950720 | consumed tokens: 85915074560 | elapsed time per iteration (s): 0.08 | learning rate: 2.139E-05 | global batch size: 256 | lm loss: 4.513216E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.683 | TFLOPs: 11.34 | 7: iteration 163880/ 173500 | consumed samples: 41953280 | consumed tokens: 85920317440 | elapsed time per iteration (s): 0.08 | learning rate: 2.139E-05 | global batch size: 256 | lm loss: 4.493136E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.245 | TFLOPs: 11.58 | 7: iteration 163890/ 173500 | consumed samples: 41955840 | consumed tokens: 85925560320 | elapsed time per iteration (s): 0.09 | learning rate: 2.139E-05 | global batch size: 256 | lm loss: 4.510427E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2777.321 | TFLOPs: 10.33 | 7: iteration 163900/ 173500 | consumed samples: 41958400 | consumed tokens: 85930803200 | elapsed time per iteration (s): 0.08 | learning rate: 2.138E-05 | global batch size: 256 | lm loss: 4.495486E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3107.054 | TFLOPs: 11.56 | 7: iteration 163910/ 173500 | consumed samples: 41960960 | consumed tokens: 85936046080 | elapsed time per iteration (s): 0.08 | learning rate: 
2.138E-05 | global batch size: 256 | lm loss: 4.500937E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.691 | TFLOPs: 11.28 | 7: iteration 163920/ 173500 | consumed samples: 41963520 | consumed tokens: 85941288960 | elapsed time per iteration (s): 0.10 | learning rate: 2.138E-05 | global batch size: 256 | lm loss: 4.498197E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.882 | TFLOPs: 9.34 | 7: iteration 163930/ 173500 | consumed samples: 41966080 | consumed tokens: 85946531840 | elapsed time per iteration (s): 0.10 | learning rate: 2.138E-05 | global batch size: 256 | lm loss: 4.494595E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.555 | TFLOPs: 9.33 | 7: iteration 163940/ 173500 | consumed samples: 41968640 | consumed tokens: 85951774720 | elapsed time per iteration (s): 0.09 | learning rate: 2.137E-05 | global batch size: 256 | lm loss: 4.513715E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.658 | TFLOPs: 11.16 | 7: iteration 163950/ 173500 | consumed samples: 41971200 | consumed tokens: 85957017600 | elapsed time per iteration (s): 0.08 | learning rate: 2.137E-05 | global batch size: 256 | lm loss: 4.508231E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.365 | TFLOPs: 11.90 | 7: iteration 163960/ 173500 | consumed samples: 41973760 | consumed tokens: 85962260480 | elapsed time per iteration (s): 0.08 | learning rate: 2.137E-05 | global batch size: 256 | lm loss: 4.495036E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.164 | TFLOPs: 11.85 | 7: iteration 163970/ 173500 | 
consumed samples: 41976320 | consumed tokens: 85967503360 | elapsed time per iteration (s): 0.08 | learning rate: 2.136E-05 | global batch size: 256 | lm loss: 4.499004E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.171 | TFLOPs: 11.55 | 7: iteration 163980/ 173500 | consumed samples: 41978880 | consumed tokens: 85972746240 | elapsed time per iteration (s): 0.08 | learning rate: 2.136E-05 | global batch size: 256 | lm loss: 4.497412E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.777 | TFLOPs: 11.84 | 7: iteration 163990/ 173500 | consumed samples: 41981440 | consumed tokens: 85977989120 | elapsed time per iteration (s): 0.08 | learning rate: 2.136E-05 | global batch size: 256 | lm loss: 4.501368E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.027 | TFLOPs: 11.85 | 0: [2023-03-17 04:15:46,257] [INFO] [logging.py:68:log_dist] [Rank 0] step=164000, skipped=0, lr=[2.1355330909017464e-05, 2.1355330909017464e-05, 2.1355330909017464e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 164000/ 173500 | consumed samples: 41984000 | consumed tokens: 85983232000 | elapsed time per iteration (s): 0.09 | learning rate: 2.136E-05 | global batch size: 256 | lm loss: 4.488492E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2995.823 | TFLOPs: 11.14 | 0: steps: 164000 loss: 4.4650 iter time (s): 0.087 samples/sec: 2926.154 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 164000 | lm loss value: 4.349153E+00 | lm loss PPL: 7.741287E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at 
iteration 164000 to checkpoints_14m91b100m 0: [2023-03-17 04:15:46,315] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step164000 is begin to save! 0: [2023-03-17 04:15:46,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:15:46,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:15:46,343] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:15:46,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:15:46,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:15:46,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:15:46,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:15:46,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:15:46,355] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:15:46,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:15:46,357] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 04:15:46,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:15:46,359] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step164000/mp_rank_00_model_states.pt 0: [2023-03-17 04:15:46,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:15:46,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:15:46,377] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:15:46,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 1: [2023-03-17 04:15:46,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:15:46,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:15:46,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 2: [2023-03-17 04:15:46,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 7: [2023-03-17 04:15:46,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:15:46,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:15:46,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 3: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 0: [2023-03-17 04:15:46,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 
6: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 1: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:15:46,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 0: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:15:46,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 
2: [2023-03-17 04:15:46,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 4: [2023-03-17 04:15:46,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:15:46,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 1: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:15:46,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 3: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:15:46,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 6: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 7: [2023-03-17 04:15:46,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 0: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:15:46,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:15:46,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 4: [2023-03-17 04:15:46,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:15:46,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 2: [2023-03-17 04:15:46,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:15:46,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 7: [2023-03-17 04:15:46,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:15:46,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:15:46,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 1: [2023-03-17 04:15:46,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 6: [2023-03-17 04:15:46,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 0: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:15:46,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 
2: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:15:46,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 3: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:15:46,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 4: [2023-03-17 04:15:46,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 4: [2023-03-17 04:15:46,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 1: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:15:46,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 7: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:15:46,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 6: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 6: [2023-03-17 04:15:46,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 0: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:15:46,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 2: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:15:46,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:15:46,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 04:15:46,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 4: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 3: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:15:46,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 1: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:15:46,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 3: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:15:46,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 7: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:15:46,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 04:15:46,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 0: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 6: [2023-03-17 04:15:46,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:15:46,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 4: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:15:46,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 2: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 3: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:15:46,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 1: [2023-03-17 04:15:46,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:15:46,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 0: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:15:46,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 7: [2023-03-17 04:15:46,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 04:15:46,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 2: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 2: [2023-03-17 04:15:46,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 4: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:15:46,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 3: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 0: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 0: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 7: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 2: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 4: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 6: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 5: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 
3: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 7: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:15:46,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:15:46,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 1: [2023-03-17 04:15:46,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:15:46,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step164000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:15:46,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step164000 is ready now! 
0: successfully saved checkpoint at iteration 164000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 81.53 7: iteration 164010/ 173500 | consumed samples: 41986560 | consumed tokens: 85988474880 | elapsed time per iteration (s): 0.09 | learning rate: 2.135E-05 | global batch size: 256 | lm loss: 4.511133E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2737.035 | TFLOPs: 10.18 | 7: iteration 164020/ 173500 | consumed samples: 41989120 | consumed tokens: 85993717760 | elapsed time per iteration (s): 0.08 | learning rate: 2.135E-05 | global batch size: 256 | lm loss: 4.501661E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.690 | TFLOPs: 11.89 | 7: iteration 164030/ 173500 | consumed samples: 41991680 | consumed tokens: 85998960640 | elapsed time per iteration (s): 0.08 | learning rate: 2.135E-05 | global batch size: 256 | lm loss: 4.507328E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.170 | TFLOPs: 11.87 | 7: iteration 164040/ 173500 | consumed samples: 41994240 | consumed tokens: 86004203520 | elapsed time per iteration (s): 0.08 | learning rate: 2.134E-05 | global batch size: 256 | lm loss: 4.506666E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.020 | TFLOPs: 11.54 | 7: iteration 164050/ 173500 | consumed samples: 41996800 | consumed tokens: 86009446400 | elapsed time per iteration (s): 0.09 | learning rate: 2.134E-05 | global batch size: 256 | lm loss: 4.504055E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.546 | TFLOPs: 10.60 | 7: iteration 164060/ 173500 | consumed samples: 41999360 | consumed tokens: 86014689280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.134E-05 | global batch size: 256 | lm loss: 4.518368E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.357 | TFLOPs: 11.32 | 7: iteration 164070/ 173500 | consumed samples: 42001920 | consumed tokens: 86019932160 | elapsed time per iteration (s): 0.08 | learning rate: 2.134E-05 | global batch size: 256 | lm loss: 4.515034E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3115.724 | TFLOPs: 11.59 | 7: iteration 164080/ 173500 | consumed samples: 42004480 | consumed tokens: 86025175040 | elapsed time per iteration (s): 0.08 | learning rate: 2.133E-05 | global batch size: 256 | lm loss: 4.512955E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.099 | TFLOPs: 11.56 | 7: iteration 164090/ 173500 | consumed samples: 42007040 | consumed tokens: 86030417920 | elapsed time per iteration (s): 0.08 | learning rate: 2.133E-05 | global batch size: 256 | lm loss: 4.502176E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.565 | TFLOPs: 11.58 | 7: iteration 164100/ 173500 | consumed samples: 42009600 | consumed tokens: 86035660800 | elapsed time per iteration (s): 0.08 | learning rate: 2.133E-05 | global batch size: 256 | lm loss: 4.505809E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.013 | TFLOPs: 11.98 | 7: iteration 164110/ 173500 | consumed samples: 42012160 | consumed tokens: 86040903680 | elapsed time per iteration (s): 0.08 | learning rate: 2.132E-05 | global batch size: 256 | lm loss: 4.497664E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.906 | TFLOPs: 11.99 | 7: 
iteration 164120/ 173500 | consumed samples: 42014720 | consumed tokens: 86046146560 | elapsed time per iteration (s): 0.08 | learning rate: 2.132E-05 | global batch size: 256 | lm loss: 4.496824E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.994 | TFLOPs: 11.98 | 7: iteration 164130/ 173500 | consumed samples: 42017280 | consumed tokens: 86051389440 | elapsed time per iteration (s): 0.08 | learning rate: 2.132E-05 | global batch size: 256 | lm loss: 4.500066E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.142 | TFLOPs: 11.95 | 7: iteration 164140/ 173500 | consumed samples: 42019840 | consumed tokens: 86056632320 | elapsed time per iteration (s): 0.08 | learning rate: 2.132E-05 | global batch size: 256 | lm loss: 4.500909E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.665 | TFLOPs: 11.93 | 7: iteration 164150/ 173500 | consumed samples: 42022400 | consumed tokens: 86061875200 | elapsed time per iteration (s): 0.08 | learning rate: 2.131E-05 | global batch size: 256 | lm loss: 4.509154E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.392 | TFLOPs: 11.80 | 7: iteration 164160/ 173500 | consumed samples: 42024960 | consumed tokens: 86067118080 | elapsed time per iteration (s): 0.08 | learning rate: 2.131E-05 | global batch size: 256 | lm loss: 4.519051E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.461 | TFLOPs: 11.82 | 7: iteration 164170/ 173500 | consumed samples: 42027520 | consumed tokens: 86072360960 | elapsed time per iteration (s): 0.08 | learning rate: 2.131E-05 | global batch size: 256 | lm loss: 4.506688E+00 | grad norm: 0.387 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.173 | TFLOPs: 11.86 | 7: iteration 164180/ 173500 | consumed samples: 42030080 | consumed tokens: 86077603840 | elapsed time per iteration (s): 0.08 | learning rate: 2.130E-05 | global batch size: 256 | lm loss: 4.504837E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.431 | TFLOPs: 11.67 | 7: iteration 164190/ 173500 | consumed samples: 42032640 | consumed tokens: 86082846720 | elapsed time per iteration (s): 0.08 | learning rate: 2.130E-05 | global batch size: 256 | lm loss: 4.497736E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.759 | TFLOPs: 11.48 | 7: iteration 164200/ 173500 | consumed samples: 42035200 | consumed tokens: 86088089600 | elapsed time per iteration (s): 0.08 | learning rate: 2.130E-05 | global batch size: 256 | lm loss: 4.505058E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.581 | TFLOPs: 11.79 | 7: iteration 164210/ 173500 | consumed samples: 42037760 | consumed tokens: 86093332480 | elapsed time per iteration (s): 0.08 | learning rate: 2.130E-05 | global batch size: 256 | lm loss: 4.509132E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3169.558 | TFLOPs: 11.79 | 7: iteration 164220/ 173500 | consumed samples: 42040320 | consumed tokens: 86098575360 | elapsed time per iteration (s): 0.08 | learning rate: 2.129E-05 | global batch size: 256 | lm loss: 4.503724E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3155.352 | TFLOPs: 11.74 | 7: iteration 164230/ 173500 | consumed samples: 42042880 | consumed tokens: 86103818240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.129E-05 | global batch size: 256 | lm loss: 4.508360E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.956 | TFLOPs: 11.79 | 7: iteration 164240/ 173500 | consumed samples: 42045440 | consumed tokens: 86109061120 | elapsed time per iteration (s): 0.08 | learning rate: 2.129E-05 | global batch size: 256 | lm loss: 4.502042E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3125.844 | TFLOPs: 11.63 | 7: iteration 164250/ 173500 | consumed samples: 42048000 | consumed tokens: 86114304000 | elapsed time per iteration (s): 0.08 | learning rate: 2.129E-05 | global batch size: 256 | lm loss: 4.514719E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.067 | TFLOPs: 11.83 | 7: iteration 164260/ 173500 | consumed samples: 42050560 | consumed tokens: 86119546880 | elapsed time per iteration (s): 0.08 | learning rate: 2.128E-05 | global batch size: 256 | lm loss: 4.504281E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.197 | TFLOPs: 11.81 | 7: iteration 164270/ 173500 | consumed samples: 42053120 | consumed tokens: 86124789760 | elapsed time per iteration (s): 0.08 | learning rate: 2.128E-05 | global batch size: 256 | lm loss: 4.515108E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.308 | TFLOPs: 11.80 | 7: iteration 164280/ 173500 | consumed samples: 42055680 | consumed tokens: 86130032640 | elapsed time per iteration (s): 0.08 | learning rate: 2.128E-05 | global batch size: 256 | lm loss: 4.504829E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.526 | TFLOPs: 11.78 | 7: iteration 
164290/ 173500 | consumed samples: 42058240 | consumed tokens: 86135275520 | elapsed time per iteration (s): 0.08 | learning rate: 2.127E-05 | global batch size: 256 | lm loss: 4.498870E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.556 | TFLOPs: 11.83 | 7: iteration 164300/ 173500 | consumed samples: 42060800 | consumed tokens: 86140518400 | elapsed time per iteration (s): 0.08 | learning rate: 2.127E-05 | global batch size: 256 | lm loss: 4.508437E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.336 | TFLOPs: 11.82 | 7: iteration 164310/ 173500 | consumed samples: 42063360 | consumed tokens: 86145761280 | elapsed time per iteration (s): 0.08 | learning rate: 2.127E-05 | global batch size: 256 | lm loss: 4.507422E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.708 | TFLOPs: 11.82 | 7: iteration 164320/ 173500 | consumed samples: 42065920 | consumed tokens: 86151004160 | elapsed time per iteration (s): 0.08 | learning rate: 2.127E-05 | global batch size: 256 | lm loss: 4.502466E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.276 | TFLOPs: 11.80 | 7: iteration 164330/ 173500 | consumed samples: 42068480 | consumed tokens: 86156247040 | elapsed time per iteration (s): 0.08 | learning rate: 2.126E-05 | global batch size: 256 | lm loss: 4.497490E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.328 | TFLOPs: 11.76 | 7: iteration 164340/ 173500 | consumed samples: 42071040 | consumed tokens: 86161489920 | elapsed time per iteration (s): 0.08 | learning rate: 2.126E-05 | global batch size: 256 | lm loss: 4.505873E+00 | grad norm: 0.378 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.022 | TFLOPs: 11.83 | 7: iteration 164350/ 173500 | consumed samples: 42073600 | consumed tokens: 86166732800 | elapsed time per iteration (s): 0.08 | learning rate: 2.126E-05 | global batch size: 256 | lm loss: 4.515347E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.460 | TFLOPs: 11.81 | 7: iteration 164360/ 173500 | consumed samples: 42076160 | consumed tokens: 86171975680 | elapsed time per iteration (s): 0.08 | learning rate: 2.125E-05 | global batch size: 256 | lm loss: 4.508838E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.411 | TFLOPs: 11.68 | 7: iteration 164370/ 173500 | consumed samples: 42078720 | consumed tokens: 86177218560 | elapsed time per iteration (s): 0.08 | learning rate: 2.125E-05 | global batch size: 256 | lm loss: 4.486837E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.996 | TFLOPs: 11.73 | 7: iteration 164380/ 173500 | consumed samples: 42081280 | consumed tokens: 86182461440 | elapsed time per iteration (s): 0.08 | learning rate: 2.125E-05 | global batch size: 256 | lm loss: 4.503900E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.317 | TFLOPs: 11.77 | 7: iteration 164390/ 173500 | consumed samples: 42083840 | consumed tokens: 86187704320 | elapsed time per iteration (s): 0.08 | learning rate: 2.125E-05 | global batch size: 256 | lm loss: 4.505394E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.179 | TFLOPs: 11.80 | 7: iteration 164400/ 173500 | consumed samples: 42086400 | consumed tokens: 86192947200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.124E-05 | global batch size: 256 | lm loss: 4.506112E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.409 | TFLOPs: 11.86 | 7: iteration 164410/ 173500 | consumed samples: 42088960 | consumed tokens: 86198190080 | elapsed time per iteration (s): 0.08 | learning rate: 2.124E-05 | global batch size: 256 | lm loss: 4.487792E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.754 | TFLOPs: 11.94 | 7: iteration 164420/ 173500 | consumed samples: 42091520 | consumed tokens: 86203432960 | elapsed time per iteration (s): 0.08 | learning rate: 2.124E-05 | global batch size: 256 | lm loss: 4.502055E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.101 | TFLOPs: 11.64 | 7: iteration 164430/ 173500 | consumed samples: 42094080 | consumed tokens: 86208675840 | elapsed time per iteration (s): 0.09 | learning rate: 2.124E-05 | global batch size: 256 | lm loss: 4.499685E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.965 | TFLOPs: 10.85 | 7: iteration 164440/ 173500 | consumed samples: 42096640 | consumed tokens: 86213918720 | elapsed time per iteration (s): 0.08 | learning rate: 2.123E-05 | global batch size: 256 | lm loss: 4.501784E+00 | grad norm: 0.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.155 | TFLOPs: 11.96 | 7: iteration 164450/ 173500 | consumed samples: 42099200 | consumed tokens: 86219161600 | elapsed time per iteration (s): 0.08 | learning rate: 2.123E-05 | global batch size: 256 | lm loss: 4.506705E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.892 | TFLOPs: 11.94 | 7: iteration 164460/ 
173500 | consumed samples: 42101760 | consumed tokens: 86224404480 | elapsed time per iteration (s): 0.08 | learning rate: 2.123E-05 | global batch size: 256 | lm loss: 4.519117E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.837 | TFLOPs: 11.97 | 7: iteration 164470/ 173500 | consumed samples: 42104320 | consumed tokens: 86229647360 | elapsed time per iteration (s): 0.09 | learning rate: 2.122E-05 | global batch size: 256 | lm loss: 4.513473E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.996 | TFLOPs: 11.15 | 7: iteration 164480/ 173500 | consumed samples: 42106880 | consumed tokens: 86234890240 | elapsed time per iteration (s): 0.08 | learning rate: 2.122E-05 | global batch size: 256 | lm loss: 4.519225E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.569 | TFLOPs: 11.95 | 7: iteration 164490/ 173500 | consumed samples: 42109440 | consumed tokens: 86240133120 | elapsed time per iteration (s): 0.08 | learning rate: 2.122E-05 | global batch size: 256 | lm loss: 4.499431E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3072.320 | TFLOPs: 11.43 | 7: iteration 164500/ 173500 | consumed samples: 42112000 | consumed tokens: 86245376000 | elapsed time per iteration (s): 0.08 | learning rate: 2.122E-05 | global batch size: 256 | lm loss: 4.504045E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.829 | TFLOPs: 11.85 | 7: iteration 164510/ 173500 | consumed samples: 42114560 | consumed tokens: 86250618880 | elapsed time per iteration (s): 0.08 | learning rate: 2.121E-05 | global batch size: 256 | lm loss: 4.509268E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3212.247 | TFLOPs: 11.95 | 7: iteration 164520/ 173500 | consumed samples: 42117120 | consumed tokens: 86255861760 | elapsed time per iteration (s): 0.08 | learning rate: 2.121E-05 | global batch size: 256 | lm loss: 4.503637E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.951 | TFLOPs: 11.90 | 7: iteration 164530/ 173500 | consumed samples: 42119680 | consumed tokens: 86261104640 | elapsed time per iteration (s): 0.08 | learning rate: 2.121E-05 | global batch size: 256 | lm loss: 4.502715E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.360 | TFLOPs: 11.96 | 7: iteration 164540/ 173500 | consumed samples: 42122240 | consumed tokens: 86266347520 | elapsed time per iteration (s): 0.08 | learning rate: 2.121E-05 | global batch size: 256 | lm loss: 4.495336E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.496 | TFLOPs: 11.91 | 7: iteration 164550/ 173500 | consumed samples: 42124800 | consumed tokens: 86271590400 | elapsed time per iteration (s): 0.08 | learning rate: 2.120E-05 | global batch size: 256 | lm loss: 4.491920E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.380 | TFLOPs: 11.88 | 7: iteration 164560/ 173500 | consumed samples: 42127360 | consumed tokens: 86276833280 | elapsed time per iteration (s): 0.08 | learning rate: 2.120E-05 | global batch size: 256 | lm loss: 4.497858E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.338 | TFLOPs: 11.95 | 7: iteration 164570/ 173500 | consumed samples: 42129920 | consumed tokens: 86282076160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.120E-05 | global batch size: 256 | lm loss: 4.511700E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.347 | TFLOPs: 11.95 | 7: iteration 164580/ 173500 | consumed samples: 42132480 | consumed tokens: 86287319040 | elapsed time per iteration (s): 0.08 | learning rate: 2.120E-05 | global batch size: 256 | lm loss: 4.508068E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.945 | TFLOPs: 11.90 | 7: iteration 164590/ 173500 | consumed samples: 42135040 | consumed tokens: 86292561920 | elapsed time per iteration (s): 0.08 | learning rate: 2.119E-05 | global batch size: 256 | lm loss: 4.511273E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.177 | TFLOPs: 11.93 | 7: iteration 164600/ 173500 | consumed samples: 42137600 | consumed tokens: 86297804800 | elapsed time per iteration (s): 0.08 | learning rate: 2.119E-05 | global batch size: 256 | lm loss: 4.509272E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.948 | TFLOPs: 11.88 | 7: iteration 164610/ 173500 | consumed samples: 42140160 | consumed tokens: 86303047680 | elapsed time per iteration (s): 0.08 | learning rate: 2.119E-05 | global batch size: 256 | lm loss: 4.504745E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.136 | TFLOPs: 11.88 | 7: iteration 164620/ 173500 | consumed samples: 42142720 | consumed tokens: 86308290560 | elapsed time per iteration (s): 0.08 | learning rate: 2.118E-05 | global batch size: 256 | lm loss: 4.510593E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.614 | TFLOPs: 11.98 | 7: iteration 164630/ 173500 | 
consumed samples: 42145280 | consumed tokens: 86313533440 | elapsed time per iteration (s): 0.08 | learning rate: 2.118E-05 | global batch size: 256 | lm loss: 4.504712E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.292 | TFLOPs: 11.94 | 7: iteration 164640/ 173500 | consumed samples: 42147840 | consumed tokens: 86318776320 | elapsed time per iteration (s): 0.08 | learning rate: 2.118E-05 | global batch size: 256 | lm loss: 4.518673E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3165.254 | TFLOPs: 11.77 | 7: iteration 164650/ 173500 | consumed samples: 42150400 | consumed tokens: 86324019200 | elapsed time per iteration (s): 0.08 | learning rate: 2.118E-05 | global batch size: 256 | lm loss: 4.509260E+00 | grad norm: 0.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.820 | TFLOPs: 11.87 | 7: iteration 164660/ 173500 | consumed samples: 42152960 | consumed tokens: 86329262080 | elapsed time per iteration (s): 0.08 | learning rate: 2.117E-05 | global batch size: 256 | lm loss: 4.508167E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.191 | TFLOPs: 11.90 | 7: iteration 164670/ 173500 | consumed samples: 42155520 | consumed tokens: 86334504960 | elapsed time per iteration (s): 0.08 | learning rate: 2.117E-05 | global batch size: 256 | lm loss: 4.505181E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.404 | TFLOPs: 11.91 | 7: iteration 164680/ 173500 | consumed samples: 42158080 | consumed tokens: 86339747840 | elapsed time per iteration (s): 0.08 | learning rate: 2.117E-05 | global batch size: 256 | lm loss: 4.502288E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3183.605 | TFLOPs: 11.84 | 7: iteration 164690/ 173500 | consumed samples: 42160640 | consumed tokens: 86344990720 | elapsed time per iteration (s): 0.08 | learning rate: 2.117E-05 | global batch size: 256 | lm loss: 4.502041E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.596 | TFLOPs: 11.92 | 7: iteration 164700/ 173500 | consumed samples: 42163200 | consumed tokens: 86350233600 | elapsed time per iteration (s): 0.08 | learning rate: 2.116E-05 | global batch size: 256 | lm loss: 4.503872E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.020 | TFLOPs: 11.92 | 7: iteration 164710/ 173500 | consumed samples: 42165760 | consumed tokens: 86355476480 | elapsed time per iteration (s): 0.08 | learning rate: 2.116E-05 | global batch size: 256 | lm loss: 4.507632E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.124 | TFLOPs: 11.94 | 7: iteration 164720/ 173500 | consumed samples: 42168320 | consumed tokens: 86360719360 | elapsed time per iteration (s): 0.08 | learning rate: 2.116E-05 | global batch size: 256 | lm loss: 4.498115E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.168 | TFLOPs: 11.86 | 7: iteration 164730/ 173500 | consumed samples: 42170880 | consumed tokens: 86365962240 | elapsed time per iteration (s): 0.08 | learning rate: 2.116E-05 | global batch size: 256 | lm loss: 4.515540E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.038 | TFLOPs: 11.94 | 7: iteration 164740/ 173500 | consumed samples: 42173440 | consumed tokens: 86371205120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.115E-05 | global batch size: 256 | lm loss: 4.493975E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.806 | TFLOPs: 11.99 | 7: iteration 164750/ 173500 | consumed samples: 42176000 | consumed tokens: 86376448000 | elapsed time per iteration (s): 0.08 | learning rate: 2.115E-05 | global batch size: 256 | lm loss: 4.501949E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.935 | TFLOPs: 11.91 | 7: iteration 164760/ 173500 | consumed samples: 42178560 | consumed tokens: 86381690880 | elapsed time per iteration (s): 0.08 | learning rate: 2.115E-05 | global batch size: 256 | lm loss: 4.518874E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.250 | TFLOPs: 11.91 | 7: iteration 164770/ 173500 | consumed samples: 42181120 | consumed tokens: 86386933760 | elapsed time per iteration (s): 0.08 | learning rate: 2.114E-05 | global batch size: 256 | lm loss: 4.507768E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.834 | TFLOPs: 11.96 | 7: iteration 164780/ 173500 | consumed samples: 42183680 | consumed tokens: 86392176640 | elapsed time per iteration (s): 0.08 | learning rate: 2.114E-05 | global batch size: 256 | lm loss: 4.503265E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.000 | TFLOPs: 11.89 | 7: iteration 164790/ 173500 | consumed samples: 42186240 | consumed tokens: 86397419520 | elapsed time per iteration (s): 0.08 | learning rate: 2.114E-05 | global batch size: 256 | lm loss: 4.495564E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.772 | TFLOPs: 11.93 | 7: iteration 164800/ 173500 | 
consumed samples: 42188800 | consumed tokens: 86402662400 | elapsed time per iteration (s): 0.08 | learning rate: 2.114E-05 | global batch size: 256 | lm loss: 4.507950E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.042 | TFLOPs: 11.86 | 7: iteration 164810/ 173500 | consumed samples: 42191360 | consumed tokens: 86407905280 | elapsed time per iteration (s): 0.08 | learning rate: 2.113E-05 | global batch size: 256 | lm loss: 4.510059E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.576 | TFLOPs: 11.89 | 7: iteration 164820/ 173500 | consumed samples: 42193920 | consumed tokens: 86413148160 | elapsed time per iteration (s): 0.08 | learning rate: 2.113E-05 | global batch size: 256 | lm loss: 4.506667E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.992 | TFLOPs: 11.90 | 7: iteration 164830/ 173500 | consumed samples: 42196480 | consumed tokens: 86418391040 | elapsed time per iteration (s): 0.08 | learning rate: 2.113E-05 | global batch size: 256 | lm loss: 4.508296E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.406 | TFLOPs: 11.90 | 7: iteration 164840/ 173500 | consumed samples: 42199040 | consumed tokens: 86423633920 | elapsed time per iteration (s): 0.08 | learning rate: 2.113E-05 | global batch size: 256 | lm loss: 4.508148E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.777 | TFLOPs: 11.90 | 7: iteration 164850/ 173500 | consumed samples: 42201600 | consumed tokens: 86428876800 | elapsed time per iteration (s): 0.08 | learning rate: 2.112E-05 | global batch size: 256 | lm loss: 4.501062E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3186.256 | TFLOPs: 11.85 | 7: iteration 164860/ 173500 | consumed samples: 42204160 | consumed tokens: 86434119680 | elapsed time per iteration (s): 0.08 | learning rate: 2.112E-05 | global batch size: 256 | lm loss: 4.506469E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.841 | TFLOPs: 11.85 | 7: iteration 164870/ 173500 | consumed samples: 42206720 | consumed tokens: 86439362560 | elapsed time per iteration (s): 0.08 | learning rate: 2.112E-05 | global batch size: 256 | lm loss: 4.493912E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.171 | TFLOPs: 11.91 | 7: iteration 164880/ 173500 | consumed samples: 42209280 | consumed tokens: 86444605440 | elapsed time per iteration (s): 0.08 | learning rate: 2.112E-05 | global batch size: 256 | lm loss: 4.497910E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.046 | TFLOPs: 11.90 | 7: iteration 164890/ 173500 | consumed samples: 42211840 | consumed tokens: 86449848320 | elapsed time per iteration (s): 0.08 | learning rate: 2.111E-05 | global batch size: 256 | lm loss: 4.498594E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.649 | TFLOPs: 11.92 | 7: iteration 164900/ 173500 | consumed samples: 42214400 | consumed tokens: 86455091200 | elapsed time per iteration (s): 0.08 | learning rate: 2.111E-05 | global batch size: 256 | lm loss: 4.497690E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.839 | TFLOPs: 11.85 | 7: iteration 164910/ 173500 | consumed samples: 42216960 | consumed tokens: 86460334080 | elapsed time per iteration (s): 0.08 | learning rate: 
2.111E-05 | global batch size: 256 | lm loss: 4.516994E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.371 | TFLOPs: 11.95 | 7: iteration 164920/ 173500 | consumed samples: 42219520 | consumed tokens: 86465576960 | elapsed time per iteration (s): 0.08 | learning rate: 2.111E-05 | global batch size: 256 | lm loss: 4.492368E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.463 | TFLOPs: 11.93 | 7: iteration 164930/ 173500 | consumed samples: 42222080 | consumed tokens: 86470819840 | elapsed time per iteration (s): 0.08 | learning rate: 2.110E-05 | global batch size: 256 | lm loss: 4.503217E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.948 | TFLOPs: 11.98 | 7: iteration 164940/ 173500 | consumed samples: 42224640 | consumed tokens: 86476062720 | elapsed time per iteration (s): 0.08 | learning rate: 2.110E-05 | global batch size: 256 | lm loss: 4.509692E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.257 | TFLOPs: 12.04 | 7: iteration 164950/ 173500 | consumed samples: 42227200 | consumed tokens: 86481305600 | elapsed time per iteration (s): 0.08 | learning rate: 2.110E-05 | global batch size: 256 | lm loss: 4.516793E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.242 | TFLOPs: 12.04 | 7: iteration 164960/ 173500 | consumed samples: 42229760 | consumed tokens: 86486548480 | elapsed time per iteration (s): 0.08 | learning rate: 2.110E-05 | global batch size: 256 | lm loss: 4.507783E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.416 | TFLOPs: 12.02 | 7: iteration 164970/ 173500 | 
consumed samples: 42232320 | consumed tokens: 86491791360 | elapsed time per iteration (s): 0.08 | learning rate: 2.109E-05 | global batch size: 256 | lm loss: 4.504716E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.266 | TFLOPs: 12.03 |
7: iteration 164980/ 173500 | consumed samples: 42234880 | consumed tokens: 86497034240 | elapsed time per iteration (s): 0.08 | learning rate: 2.109E-05 | global batch size: 256 | lm loss: 4.514976E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.865 | TFLOPs: 12.05 |
7: iteration 164990/ 173500 | consumed samples: 42237440 | consumed tokens: 86502277120 | elapsed time per iteration (s): 0.08 | learning rate: 2.109E-05 | global batch size: 256 | lm loss: 4.505263E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.880 | TFLOPs: 12.05 |
7: iteration 165000/ 173500 | consumed samples: 42240000 | consumed tokens: 86507520000 | elapsed time per iteration (s): 0.08 | learning rate: 2.109E-05 | global batch size: 256 | lm loss: 4.500311E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.647 | TFLOPs: 12.06 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 165000 | lm loss value: 4.406357E+00 | lm loss PPL: 8.197029E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 165000 to checkpoints_14m91b100m
0: [2023-03-17 04:17:06,965] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step165000 is begin to save!
0: [2023-03-17 04:17:06,968] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/layer_01-model_00-model_states.pt...
0: [2023-03-17 04:17:06,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/layer_01-model_00-model_states.pt.
0: [2023-03-17 04:17:06,995] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/layer_03-model_00-model_states.pt...
0: [2023-03-17 04:17:06,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/layer_03-model_00-model_states.pt.
0: [2023-03-17 04:17:06,998] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/layer_04-model_00-model_states.pt...
0: [2023-03-17 04:17:07,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/layer_04-model_00-model_states.pt.
0: [2023-03-17 04:17:07,001] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/layer_05-model_00-model_states.pt...
0: [2023-03-17 04:17:07,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/layer_05-model_00-model_states.pt.
0: [2023-03-17 04:17:07,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/layer_06-model_00-model_states.pt...
0: [2023-03-17 04:17:07,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/layer_06-model_00-model_states.pt.
0: [2023-03-17 04:17:07,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/layer_08-model_00-model_states.pt...
0: [2023-03-17 04:17:07,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/layer_08-model_00-model_states.pt.
0: [2023-03-17 04:17:07,008] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step165000/mp_rank_00_model_states.pt
0: [2023-03-17 04:17:07,008] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/mp_rank_00_model_states.pt...
0: [2023-03-17 04:17:07,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/mp_rank_00_model_states.pt.
0: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
2: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
2: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
2: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:17:07,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:17:07,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:17:07,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:17:07,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:17:07,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 2: [2023-03-17 04:17:07,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 
3: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:17:07,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 6: [2023-03-17 04:17:07,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 0: [2023-03-17 04:17:07,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 5: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:17:07,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 2: [2023-03-17 04:17:07,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:17:07,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 0: [2023-03-17 04:17:07,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:17:07,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:17:07,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 3: [2023-03-17 04:17:07,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:17:07,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:17:07,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 6: [2023-03-17 04:17:07,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:17:07,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 5: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 2: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:17:07,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:17:07,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:17:07,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:17:07,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 4: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 
4: [2023-03-17 04:17:07,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:17:07,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 4: [2023-03-17 04:17:07,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 6: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 0: [2023-03-17 04:17:07,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 3: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:17:07,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:17:07,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 5: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 2: [2023-03-17 04:17:07,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:17:07,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 0: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:17:07,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 6: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 6: [2023-03-17 04:17:07,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 4: [2023-03-17 04:17:07,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 3: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:17:07,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:17:07,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2023-03-17 04:17:07,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 5: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 2: [2023-03-17 04:17:07,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:17:07,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 0: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:17:07,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 4: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:17:07,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 6: [2023-03-17 04:17:07,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 
3: [2023-03-17 04:17:07,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 5: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:17:07,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 04:17:07,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 2: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:17:07,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 4: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:17:07,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 6: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 0: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:17:07,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 5: [2023-03-17 04:17:07,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:17:07,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 3: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:17:07,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:17:07,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:17:07,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 4: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:17:07,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 0: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:17:07,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:17:07,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 6: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 5: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 0: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 1: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 4: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 
3: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 3: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 6: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 3: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 04:17:07,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step165000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 3: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 7: [2023-03-17 04:17:07,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step165000 is ready now! 
0: successfully saved checkpoint at iteration 165000 to checkpoints_14m91b100m
7: time (ms) | save-checkpoint: 80.04
7: iteration 165010/ 173500 | consumed samples: 42242560 | consumed tokens: 86512762880 | elapsed time per iteration (s): 0.09 | learning rate: 2.108E-05 | global batch size: 256 | lm loss: 4.502037E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2803.121 | TFLOPs: 10.43 |
7: iteration 165020/ 173500 | consumed samples: 42245120 | consumed tokens: 86518005760 | elapsed time per iteration (s): 0.08 | learning rate: 2.108E-05 | global batch size: 256 | lm loss: 4.496880E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.804 | TFLOPs: 11.96 |
7: iteration 165030/ 173500 | consumed samples: 42247680 | consumed tokens: 86523248640 | elapsed time per iteration (s): 0.08 | learning rate: 2.108E-05 | global batch size: 256 | lm loss: 4.504143E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.560 | TFLOPs: 11.97 |
7: iteration 165040/ 173500 | consumed samples: 42250240 | consumed tokens: 86528491520 | elapsed time per iteration (s): 0.08 | learning rate: 2.108E-05 | global batch size: 256 | lm loss: 4.504615E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.892 | TFLOPs: 11.86 |
7: iteration 165050/ 173500 | consumed samples: 42252800 | consumed tokens: 86533734400 | elapsed time per iteration (s): 0.08 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 4.509238E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.347 | TFLOPs: 11.89 |
7: iteration 165060/ 173500 | consumed samples: 42255360 | consumed tokens: 86538977280 | elapsed time per iteration (s): 0.08 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 4.503431E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.607 | TFLOPs: 11.89 |
7: iteration 165070/ 173500 | consumed samples: 42257920 | consumed tokens: 86544220160 | elapsed time per iteration (s): 0.08 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 4.517503E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.975 | TFLOPs: 11.87 |
7: iteration 165080/ 173500 | consumed samples: 42260480 | consumed tokens: 86549463040 | elapsed time per iteration (s): 0.08 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 4.505891E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.557 | TFLOPs: 11.85 |
7: iteration 165090/ 173500 | consumed samples: 42263040 | consumed tokens: 86554705920 | elapsed time per iteration (s): 0.08 | learning rate: 2.106E-05 | global batch size: 256 | lm loss: 4.507150E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.780 | TFLOPs: 11.83 |
7: iteration 165100/ 173500 | consumed samples: 42265600 | consumed tokens: 86559948800 | elapsed time per iteration (s): 0.08 | learning rate: 2.106E-05 | global batch size: 256 | lm loss: 4.508167E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.997 | TFLOPs: 11.91 |
7: iteration 165110/ 173500 | consumed samples: 42268160 | consumed tokens: 86565191680 | elapsed time per iteration (s): 0.10 | learning rate: 2.106E-05 | global batch size: 256 | lm loss: 4.498932E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2662.439 | TFLOPs: 9.90 |
7: iteration 165120/ 173500 | consumed samples: 42270720 | consumed tokens: 86570434560 | elapsed time per iteration (s): 0.10 | learning rate: 2.106E-05 | global batch size: 256 | lm loss: 4.498523E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2483.076 | TFLOPs: 9.24 |
7: iteration 165130/ 173500 | consumed samples: 42273280 | consumed tokens: 86575677440 | elapsed time per iteration (s): 0.08 | learning rate: 2.105E-05 | global batch size: 256 | lm loss: 4.511123E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.834 | TFLOPs: 11.95 |
7: iteration 165140/ 173500 | consumed samples: 42275840 | consumed tokens: 86580920320 | elapsed time per iteration (s): 0.08 | learning rate: 2.105E-05 | global batch size: 256 | lm loss: 4.505208E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.801 | TFLOPs: 11.96 |
7: iteration 165150/ 173500 | consumed samples: 42278400 | consumed tokens: 86586163200 | elapsed time per iteration (s): 0.08 | learning rate: 2.105E-05 | global batch size: 256 | lm loss: 4.512096E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.429 | TFLOPs: 11.93 |
7: iteration 165160/ 173500 | consumed samples: 42280960 | consumed tokens: 86591406080 | elapsed time per iteration (s): 0.08 | learning rate: 2.105E-05 | global batch size: 256 | lm loss: 4.494751E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.421 | TFLOPs: 11.98 |
7: iteration 165170/ 173500 | consumed samples: 42283520 | consumed tokens: 86596648960 | elapsed time per iteration (s): 0.08 | learning rate: 2.104E-05 | global batch size: 256 | lm loss: 4.510794E+00 | grad norm: 0.369 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.113 | TFLOPs: 11.98 | 7: iteration 165180/ 173500 | consumed samples: 42286080 | consumed tokens: 86601891840 | elapsed time per iteration (s): 0.08 | learning rate: 2.104E-05 | global batch size: 256 | lm loss: 4.496077E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.523 | TFLOPs: 12.04 | 7: iteration 165190/ 173500 | consumed samples: 42288640 | consumed tokens: 86607134720 | elapsed time per iteration (s): 0.08 | learning rate: 2.104E-05 | global batch size: 256 | lm loss: 4.503726E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.263 | TFLOPs: 12.03 | 7: iteration 165200/ 173500 | consumed samples: 42291200 | consumed tokens: 86612377600 | elapsed time per iteration (s): 0.08 | learning rate: 2.104E-05 | global batch size: 256 | lm loss: 4.500983E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.743 | TFLOPs: 11.96 | 7: iteration 165210/ 173500 | consumed samples: 42293760 | consumed tokens: 86617620480 | elapsed time per iteration (s): 0.08 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 4.501366E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.361 | TFLOPs: 12.02 | 7: iteration 165220/ 173500 | consumed samples: 42296320 | consumed tokens: 86622863360 | elapsed time per iteration (s): 0.08 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 4.500124E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.887 | TFLOPs: 11.88 | 7: iteration 165230/ 173500 | consumed samples: 42298880 | consumed tokens: 86628106240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.103E-05 | global batch size: 256 | lm loss: 4.501505E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.516 | TFLOPs: 11.80 | 7: iteration 165240/ 173500 | consumed samples: 42301440 | consumed tokens: 86633349120 | elapsed time per iteration (s): 0.08 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 4.494025E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3170.892 | TFLOPs: 11.79 | 7: iteration 165250/ 173500 | consumed samples: 42304000 | consumed tokens: 86638592000 | elapsed time per iteration (s): 0.08 | learning rate: 2.102E-05 | global batch size: 256 | lm loss: 4.506509E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.086 | TFLOPs: 11.91 | 7: iteration 165260/ 173500 | consumed samples: 42306560 | consumed tokens: 86643834880 | elapsed time per iteration (s): 0.08 | learning rate: 2.102E-05 | global batch size: 256 | lm loss: 4.501494E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.572 | TFLOPs: 11.82 | 7: iteration 165270/ 173500 | consumed samples: 42309120 | consumed tokens: 86649077760 | elapsed time per iteration (s): 0.08 | learning rate: 2.102E-05 | global batch size: 256 | lm loss: 4.503550E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.531 | TFLOPs: 11.73 | 7: iteration 165280/ 173500 | consumed samples: 42311680 | consumed tokens: 86654320640 | elapsed time per iteration (s): 0.08 | learning rate: 2.102E-05 | global batch size: 256 | lm loss: 4.498251E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.139 | TFLOPs: 11.85 | 7: iteration 
165290/ 173500 | consumed samples: 42314240 | consumed tokens: 86659563520 | elapsed time per iteration (s): 0.08 | learning rate: 2.101E-05 | global batch size: 256 | lm loss: 4.507804E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.807 | TFLOPs: 11.29 | 7: iteration 165300/ 173500 | consumed samples: 42316800 | consumed tokens: 86664806400 | elapsed time per iteration (s): 0.08 | learning rate: 2.101E-05 | global batch size: 256 | lm loss: 4.511693E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.865 | TFLOPs: 11.78 | 7: iteration 165310/ 173500 | consumed samples: 42319360 | consumed tokens: 86670049280 | elapsed time per iteration (s): 0.08 | learning rate: 2.101E-05 | global batch size: 256 | lm loss: 4.503104E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3163.657 | TFLOPs: 11.77 | 7: iteration 165320/ 173500 | consumed samples: 42321920 | consumed tokens: 86675292160 | elapsed time per iteration (s): 0.08 | learning rate: 2.101E-05 | global batch size: 256 | lm loss: 4.508840E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.983 | TFLOPs: 11.85 | 7: iteration 165330/ 173500 | consumed samples: 42324480 | consumed tokens: 86680535040 | elapsed time per iteration (s): 0.08 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 4.508904E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.035 | TFLOPs: 11.81 | 7: iteration 165340/ 173500 | consumed samples: 42327040 | consumed tokens: 86685777920 | elapsed time per iteration (s): 0.08 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 4.504115E+00 | grad norm: 0.353 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.822 | TFLOPs: 11.82 | 7: iteration 165350/ 173500 | consumed samples: 42329600 | consumed tokens: 86691020800 | elapsed time per iteration (s): 0.08 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 4.503154E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.164 | TFLOPs: 11.83 | 7: iteration 165360/ 173500 | consumed samples: 42332160 | consumed tokens: 86696263680 | elapsed time per iteration (s): 0.08 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 4.495164E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.847 | TFLOPs: 11.84 | 7: iteration 165370/ 173500 | consumed samples: 42334720 | consumed tokens: 86701506560 | elapsed time per iteration (s): 0.08 | learning rate: 2.099E-05 | global batch size: 256 | lm loss: 4.516156E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.730 | TFLOPs: 11.83 | 7: iteration 165380/ 173500 | consumed samples: 42337280 | consumed tokens: 86706749440 | elapsed time per iteration (s): 0.08 | learning rate: 2.099E-05 | global batch size: 256 | lm loss: 4.498345E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.946 | TFLOPs: 11.74 | 7: iteration 165390/ 173500 | consumed samples: 42339840 | consumed tokens: 86711992320 | elapsed time per iteration (s): 0.08 | learning rate: 2.099E-05 | global batch size: 256 | lm loss: 4.504845E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.748 | TFLOPs: 11.82 | 7: iteration 165400/ 173500 | consumed samples: 42342400 | consumed tokens: 86717235200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.099E-05 | global batch size: 256 | lm loss: 4.499330E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.829 | TFLOPs: 11.63 | 7: iteration 165410/ 173500 | consumed samples: 42344960 | consumed tokens: 86722478080 | elapsed time per iteration (s): 0.08 | learning rate: 2.098E-05 | global batch size: 256 | lm loss: 4.491081E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.124 | TFLOPs: 11.82 | 7: iteration 165420/ 173500 | consumed samples: 42347520 | consumed tokens: 86727720960 | elapsed time per iteration (s): 0.08 | learning rate: 2.098E-05 | global batch size: 256 | lm loss: 4.497542E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.430 | TFLOPs: 11.81 | 7: iteration 165430/ 173500 | consumed samples: 42350080 | consumed tokens: 86732963840 | elapsed time per iteration (s): 0.08 | learning rate: 2.098E-05 | global batch size: 256 | lm loss: 4.514891E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.030 | TFLOPs: 11.87 | 7: iteration 165440/ 173500 | consumed samples: 42352640 | consumed tokens: 86738206720 | elapsed time per iteration (s): 0.08 | learning rate: 2.098E-05 | global batch size: 256 | lm loss: 4.504529E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.594 | TFLOPs: 11.86 | 7: iteration 165450/ 173500 | consumed samples: 42355200 | consumed tokens: 86743449600 | elapsed time per iteration (s): 0.08 | learning rate: 2.097E-05 | global batch size: 256 | lm loss: 4.504487E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.093 | TFLOPs: 11.91 | 7: iteration 165460/ 
173500 | consumed samples: 42357760 | consumed tokens: 86748692480 | elapsed time per iteration (s): 0.08 | learning rate: 2.097E-05 | global batch size: 256 | lm loss: 4.491408E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.770 | TFLOPs: 11.90 | 7: iteration 165470/ 173500 | consumed samples: 42360320 | consumed tokens: 86753935360 | elapsed time per iteration (s): 0.08 | learning rate: 2.097E-05 | global batch size: 256 | lm loss: 4.517540E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.710 | TFLOPs: 11.79 | 7: iteration 165480/ 173500 | consumed samples: 42362880 | consumed tokens: 86759178240 | elapsed time per iteration (s): 0.08 | learning rate: 2.097E-05 | global batch size: 256 | lm loss: 4.509551E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.700 | TFLOPs: 11.88 | 7: iteration 165490/ 173500 | consumed samples: 42365440 | consumed tokens: 86764421120 | elapsed time per iteration (s): 0.08 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 4.507325E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.674 | TFLOPs: 11.78 | 7: iteration 165500/ 173500 | consumed samples: 42368000 | consumed tokens: 86769664000 | elapsed time per iteration (s): 0.08 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 4.495847E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.596 | TFLOPs: 11.83 | 7: iteration 165510/ 173500 | consumed samples: 42370560 | consumed tokens: 86774906880 | elapsed time per iteration (s): 0.08 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 4.489070E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3187.856 | TFLOPs: 11.86 | 7: iteration 165520/ 173500 | consumed samples: 42373120 | consumed tokens: 86780149760 | elapsed time per iteration (s): 0.08 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 4.507788E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.746 | TFLOPs: 11.88 | 7: iteration 165530/ 173500 | consumed samples: 42375680 | consumed tokens: 86785392640 | elapsed time per iteration (s): 0.08 | learning rate: 2.095E-05 | global batch size: 256 | lm loss: 4.520344E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.815 | TFLOPs: 11.90 | 7: iteration 165540/ 173500 | consumed samples: 42378240 | consumed tokens: 86790635520 | elapsed time per iteration (s): 0.08 | learning rate: 2.095E-05 | global batch size: 256 | lm loss: 4.500984E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.000 | TFLOPs: 11.94 | 7: iteration 165550/ 173500 | consumed samples: 42380800 | consumed tokens: 86795878400 | elapsed time per iteration (s): 0.08 | learning rate: 2.095E-05 | global batch size: 256 | lm loss: 4.501210E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.898 | TFLOPs: 11.94 | 7: iteration 165560/ 173500 | consumed samples: 42383360 | consumed tokens: 86801121280 | elapsed time per iteration (s): 0.08 | learning rate: 2.095E-05 | global batch size: 256 | lm loss: 4.505624E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.347 | TFLOPs: 11.93 | 7: iteration 165570/ 173500 | consumed samples: 42385920 | consumed tokens: 86806364160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.095E-05 | global batch size: 256 | lm loss: 4.499610E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.972 | TFLOPs: 11.89 | 7: iteration 165580/ 173500 | consumed samples: 42388480 | consumed tokens: 86811607040 | elapsed time per iteration (s): 0.08 | learning rate: 2.094E-05 | global batch size: 256 | lm loss: 4.513522E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.896 | TFLOPs: 11.93 | 7: iteration 165590/ 173500 | consumed samples: 42391040 | consumed tokens: 86816849920 | elapsed time per iteration (s): 0.08 | learning rate: 2.094E-05 | global batch size: 256 | lm loss: 4.500428E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.045 | TFLOPs: 11.94 | 7: iteration 165600/ 173500 | consumed samples: 42393600 | consumed tokens: 86822092800 | elapsed time per iteration (s): 0.08 | learning rate: 2.094E-05 | global batch size: 256 | lm loss: 4.514680E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.272 | TFLOPs: 11.97 | 7: iteration 165610/ 173500 | consumed samples: 42396160 | consumed tokens: 86827335680 | elapsed time per iteration (s): 0.08 | learning rate: 2.094E-05 | global batch size: 256 | lm loss: 4.508635E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.410 | TFLOPs: 11.95 | 7: iteration 165620/ 173500 | consumed samples: 42398720 | consumed tokens: 86832578560 | elapsed time per iteration (s): 0.08 | learning rate: 2.093E-05 | global batch size: 256 | lm loss: 4.489282E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.200 | TFLOPs: 11.91 | 7: iteration 165630/ 173500 | 
consumed samples: 42401280 | consumed tokens: 86837821440 | elapsed time per iteration (s): 0.08 | learning rate: 2.093E-05 | global batch size: 256 | lm loss: 4.515651E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.001 | TFLOPs: 11.96 | 7: iteration 165640/ 173500 | consumed samples: 42403840 | consumed tokens: 86843064320 | elapsed time per iteration (s): 0.08 | learning rate: 2.093E-05 | global batch size: 256 | lm loss: 4.516616E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.761 | TFLOPs: 11.89 | 7: iteration 165650/ 173500 | consumed samples: 42406400 | consumed tokens: 86848307200 | elapsed time per iteration (s): 0.09 | learning rate: 2.093E-05 | global batch size: 256 | lm loss: 4.495838E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2960.433 | TFLOPs: 11.01 | 7: iteration 165660/ 173500 | consumed samples: 42408960 | consumed tokens: 86853550080 | elapsed time per iteration (s): 0.08 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 4.502697E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.998 | TFLOPs: 11.84 | 7: iteration 165670/ 173500 | consumed samples: 42411520 | consumed tokens: 86858792960 | elapsed time per iteration (s): 0.08 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 4.500938E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.121 | TFLOPs: 11.84 | 7: iteration 165680/ 173500 | consumed samples: 42414080 | consumed tokens: 86864035840 | elapsed time per iteration (s): 0.08 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 4.505151E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3106.885 | TFLOPs: 11.56 | 7: iteration 165690/ 173500 | consumed samples: 42416640 | consumed tokens: 86869278720 | elapsed time per iteration (s): 0.08 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 4.510779E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.935 | TFLOPs: 11.22 | 7: iteration 165700/ 173500 | consumed samples: 42419200 | consumed tokens: 86874521600 | elapsed time per iteration (s): 0.08 | learning rate: 2.091E-05 | global batch size: 256 | lm loss: 4.503367E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.539 | TFLOPs: 11.80 | 7: iteration 165710/ 173500 | consumed samples: 42421760 | consumed tokens: 86879764480 | elapsed time per iteration (s): 0.08 | learning rate: 2.091E-05 | global batch size: 256 | lm loss: 4.495584E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3105.014 | TFLOPs: 11.55 | 7: iteration 165720/ 173500 | consumed samples: 42424320 | consumed tokens: 86885007360 | elapsed time per iteration (s): 0.08 | learning rate: 2.091E-05 | global batch size: 256 | lm loss: 4.498885E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.729 | TFLOPs: 11.91 | 7: iteration 165730/ 173500 | consumed samples: 42426880 | consumed tokens: 86890250240 | elapsed time per iteration (s): 0.08 | learning rate: 2.091E-05 | global batch size: 256 | lm loss: 4.516746E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.031 | TFLOPs: 11.91 | 7: iteration 165740/ 173500 | consumed samples: 42429440 | consumed tokens: 86895493120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.091E-05 | global batch size: 256 | lm loss: 4.505965E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.521 | TFLOPs: 11.90 | 7: iteration 165750/ 173500 | consumed samples: 42432000 | consumed tokens: 86900736000 | elapsed time per iteration (s): 0.08 | learning rate: 2.090E-05 | global batch size: 256 | lm loss: 4.512967E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.175 | TFLOPs: 11.90 | 7: iteration 165760/ 173500 | consumed samples: 42434560 | consumed tokens: 86905978880 | elapsed time per iteration (s): 0.08 | learning rate: 2.090E-05 | global batch size: 256 | lm loss: 4.499909E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.810 | TFLOPs: 11.95 | 7: iteration 165770/ 173500 | consumed samples: 42437120 | consumed tokens: 86911221760 | elapsed time per iteration (s): 0.08 | learning rate: 2.090E-05 | global batch size: 256 | lm loss: 4.494584E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.927 | TFLOPs: 11.93 | 7: iteration 165780/ 173500 | consumed samples: 42439680 | consumed tokens: 86916464640 | elapsed time per iteration (s): 0.08 | learning rate: 2.090E-05 | global batch size: 256 | lm loss: 4.501561E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.017 | TFLOPs: 11.92 | 7: iteration 165790/ 173500 | consumed samples: 42442240 | consumed tokens: 86921707520 | elapsed time per iteration (s): 0.08 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 4.503122E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.280 | TFLOPs: 11.95 | 7: iteration 165800/ 173500 | 
consumed samples: 42444800 | consumed tokens: 86926950400 | elapsed time per iteration (s): 0.08 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 4.522344E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.718 | TFLOPs: 12.01 | 7: iteration 165810/ 173500 | consumed samples: 42447360 | consumed tokens: 86932193280 | elapsed time per iteration (s): 0.08 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 4.482920E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.473 | TFLOPs: 12.00 | 7: iteration 165820/ 173500 | consumed samples: 42449920 | consumed tokens: 86937436160 | elapsed time per iteration (s): 0.08 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 4.492236E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.993 | TFLOPs: 12.00 | 7: iteration 165830/ 173500 | consumed samples: 42452480 | consumed tokens: 86942679040 | elapsed time per iteration (s): 0.08 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 4.503720E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.433 | TFLOPs: 11.99 | 7: iteration 165840/ 173500 | consumed samples: 42455040 | consumed tokens: 86947921920 | elapsed time per iteration (s): 0.08 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 4.511831E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.150 | TFLOPs: 11.91 | 7: iteration 165850/ 173500 | consumed samples: 42457600 | consumed tokens: 86953164800 | elapsed time per iteration (s): 0.08 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 4.500126E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3218.919 | TFLOPs: 11.97 | 7: iteration 165860/ 173500 | consumed samples: 42460160 | consumed tokens: 86958407680 | elapsed time per iteration (s): 0.08 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 4.500217E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.822 | TFLOPs: 11.94 | 7: iteration 165870/ 173500 | consumed samples: 42462720 | consumed tokens: 86963650560 | elapsed time per iteration (s): 0.08 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 4.508781E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.235 | TFLOPs: 12.02 | 7: iteration 165880/ 173500 | consumed samples: 42465280 | consumed tokens: 86968893440 | elapsed time per iteration (s): 0.08 | learning rate: 2.087E-05 | global batch size: 256 | lm loss: 4.496953E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.133 | TFLOPs: 12.03 | 7: iteration 165890/ 173500 | consumed samples: 42467840 | consumed tokens: 86974136320 | elapsed time per iteration (s): 0.08 | learning rate: 2.087E-05 | global batch size: 256 | lm loss: 4.493476E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.451 | TFLOPs: 11.98 | 7: iteration 165900/ 173500 | consumed samples: 42470400 | consumed tokens: 86979379200 | elapsed time per iteration (s): 0.08 | learning rate: 2.087E-05 | global batch size: 256 | lm loss: 4.491423E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.563 | TFLOPs: 12.02 | 7: iteration 165910/ 173500 | consumed samples: 42472960 | consumed tokens: 86984622080 | elapsed time per iteration (s): 0.08 | learning rate: 
2.087E-05 | global batch size: 256 | lm loss: 4.499994E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.786 | TFLOPs: 11.98 | 7: iteration 165920/ 173500 | consumed samples: 42475520 | consumed tokens: 86989864960 | elapsed time per iteration (s): 0.08 | learning rate: 2.086E-05 | global batch size: 256 | lm loss: 4.511647E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.212 | TFLOPs: 12.02 | 7: iteration 165930/ 173500 | consumed samples: 42478080 | consumed tokens: 86995107840 | elapsed time per iteration (s): 0.08 | learning rate: 2.086E-05 | global batch size: 256 | lm loss: 4.499681E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.183 | TFLOPs: 12.03 | 7: iteration 165940/ 173500 | consumed samples: 42480640 | consumed tokens: 87000350720 | elapsed time per iteration (s): 0.08 | learning rate: 2.086E-05 | global batch size: 256 | lm loss: 4.524307E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.020 | TFLOPs: 12.03 | 7: iteration 165950/ 173500 | consumed samples: 42483200 | consumed tokens: 87005593600 | elapsed time per iteration (s): 0.08 | learning rate: 2.086E-05 | global batch size: 256 | lm loss: 4.507619E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.464 | TFLOPs: 12.02 | 7: iteration 165960/ 173500 | consumed samples: 42485760 | consumed tokens: 87010836480 | elapsed time per iteration (s): 0.08 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 4.512540E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.279 | TFLOPs: 12.02 | 7: iteration 165970/ 173500 | 
consumed samples: 42488320 | consumed tokens: 87016079360 | elapsed time per iteration (s): 0.08 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 4.499088E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.236 | TFLOPs: 11.99 |
7: iteration 165980/ 173500 | consumed samples: 42490880 | consumed tokens: 87021322240 | elapsed time per iteration (s): 0.08 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 4.501498E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.443 | TFLOPs: 11.98 |
7: iteration 165990/ 173500 | consumed samples: 42493440 | consumed tokens: 87026565120 | elapsed time per iteration (s): 0.08 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 4.518430E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.953 | TFLOPs: 11.99 |
0: [2023-03-17 04:18:27,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=166000, skipped=0, lr=[2.0845563261196566e-05, 2.0845563261196566e-05, 2.0845563261196566e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 166000/ 173500 | consumed samples: 42496000 | consumed tokens: 87031808000 | elapsed time per iteration (s): 0.08 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 4.498255E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.398 | TFLOPs: 11.95 |
0: steps: 166000 loss: 4.5325 iter time (s): 0.080 samples/sec: 3201.004
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 166000 | lm loss value: 4.432772E+00 | lm loss PPL: 8.416437E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 166000 to checkpoints_14m91b100m
0: [2023-03-17 04:18:27,674] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step166000 is begin to save!
0: [2023-03-17 04:18:27,677] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/layer_01-model_00-model_states.pt...
0: [2023-03-17 04:18:27,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/layer_01-model_00-model_states.pt.
0: [2023-03-17 04:18:27,702] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/layer_03-model_00-model_states.pt...
0: [2023-03-17 04:18:27,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/layer_03-model_00-model_states.pt.
0: [2023-03-17 04:18:27,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/layer_04-model_00-model_states.pt...
0: [2023-03-17 04:18:27,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/layer_04-model_00-model_states.pt.
0: [2023-03-17 04:18:27,709] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/layer_05-model_00-model_states.pt...
0: [2023-03-17 04:18:27,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/layer_05-model_00-model_states.pt.
0: [2023-03-17 04:18:27,712] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/layer_06-model_00-model_states.pt...
0: [2023-03-17 04:18:27,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/layer_06-model_00-model_states.pt.
0: [2023-03-17 04:18:27,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/layer_08-model_00-model_states.pt...
0: [2023-03-17 04:18:27,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/layer_08-model_00-model_states.pt.
0: [2023-03-17 04:18:27,716] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step166000/mp_rank_00_model_states.pt
0: [2023-03-17 04:18:27,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/mp_rank_00_model_states.pt...
0: [2023-03-17 04:18:27,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/mp_rank_00_model_states.pt.
0: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt...
7: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:18:27,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:18:27,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:18:27,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:18:27,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:18:27,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:18:27,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 4: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:18:27,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 4: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:18:27,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 2: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:18:27,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 1: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:18:27,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:18:27,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 7: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 7: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:18:27,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 4: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:18:27,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2023-03-17 04:18:27,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 5: [2023-03-17 04:18:27,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 3: [2023-03-17 04:18:27,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 1: [2023-03-17 04:18:27,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:18:27,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2023-03-17 04:18:27,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 
1: [2023-03-17 04:18:27,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:18:27,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 7: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:18:27,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 
6: [2023-03-17 04:18:27,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 4: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:18:27,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 5: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:18:27,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 3: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:18:27,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 1: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:18:27,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 2: [2023-03-17 04:18:27,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 5: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:18:27,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 5: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 4: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:18:27,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 3: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:18:27,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 2: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:18:27,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:18:27,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 7: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 1: [2023-03-17 04:18:27,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 5: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:18:27,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:18:27,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 3: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:18:27,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 2: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:18:27,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 04:18:27,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 1: [2023-03-17 04:18:27,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 7: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:18:27,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 4: [2023-03-17 04:18:27,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 5: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:18:27,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 3: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:18:27,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 2: [2023-03-17 04:18:27,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:18:27,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 1: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:18:27,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 7: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 5: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:18:27,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 6: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:18:27,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 4: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:18:27,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 3: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:18:27,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 5: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 2: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 3: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 
3: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 5: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 0: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 7: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 7: [2023-03-17 04:18:27,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:18:27,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 1: [2023-03-17 04:18:27,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:18:27,751] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step166000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:18:27,751] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step166000 is ready now! 
0: successfully saved checkpoint at iteration 166000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.49 7: iteration 166010/ 173500 | consumed samples: 42498560 | consumed tokens: 87037050880 | elapsed time per iteration (s): 0.09 | learning rate: 2.084E-05 | global batch size: 256 | lm loss: 4.493317E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2714.437 | TFLOPs: 10.10 | 7: iteration 166020/ 173500 | consumed samples: 42501120 | consumed tokens: 87042293760 | elapsed time per iteration (s): 0.08 | learning rate: 2.084E-05 | global batch size: 256 | lm loss: 4.496296E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.139 | TFLOPs: 11.92 | 7: iteration 166030/ 173500 | consumed samples: 42503680 | consumed tokens: 87047536640 | elapsed time per iteration (s): 0.08 | learning rate: 2.084E-05 | global batch size: 256 | lm loss: 4.480836E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.756 | TFLOPs: 11.96 | 7: iteration 166040/ 173500 | consumed samples: 42506240 | consumed tokens: 87052779520 | elapsed time per iteration (s): 0.08 | learning rate: 2.084E-05 | global batch size: 256 | lm loss: 4.500283E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.395 | TFLOPs: 11.95 | 7: iteration 166050/ 173500 | consumed samples: 42508800 | consumed tokens: 87058022400 | elapsed time per iteration (s): 0.08 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 4.512146E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.222 | TFLOPs: 11.98 | 7: iteration 166060/ 173500 | consumed samples: 42511360 | consumed tokens: 87063265280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 4.514942E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.797 | TFLOPs: 11.92 | 7: iteration 166070/ 173500 | consumed samples: 42513920 | consumed tokens: 87068508160 | elapsed time per iteration (s): 0.08 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 4.503865E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.670 | TFLOPs: 11.87 | 7: iteration 166080/ 173500 | consumed samples: 42516480 | consumed tokens: 87073751040 | elapsed time per iteration (s): 0.08 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 4.505717E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3139.114 | TFLOPs: 11.68 | 7: iteration 166090/ 173500 | consumed samples: 42519040 | consumed tokens: 87078993920 | elapsed time per iteration (s): 0.08 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 4.506137E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.590 | TFLOPs: 11.95 | 7: iteration 166100/ 173500 | consumed samples: 42521600 | consumed tokens: 87084236800 | elapsed time per iteration (s): 0.08 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 4.498475E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.565 | TFLOPs: 11.95 | 7: iteration 166110/ 173500 | consumed samples: 42524160 | consumed tokens: 87089479680 | elapsed time per iteration (s): 0.08 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 4.498206E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.293 | TFLOPs: 11.86 | 7: 
iteration 166120/ 173500 | consumed samples: 42526720 | consumed tokens: 87094722560 | elapsed time per iteration (s): 0.08 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 4.492878E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.777 | TFLOPs: 11.94 | 7: iteration 166130/ 173500 | consumed samples: 42529280 | consumed tokens: 87099965440 | elapsed time per iteration (s): 0.08 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 4.517867E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.896 | TFLOPs: 11.98 | 7: iteration 166140/ 173500 | consumed samples: 42531840 | consumed tokens: 87105208320 | elapsed time per iteration (s): 0.08 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 4.497214E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.517 | TFLOPs: 11.86 | 7: iteration 166150/ 173500 | consumed samples: 42534400 | consumed tokens: 87110451200 | elapsed time per iteration (s): 0.08 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 4.494385E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.115 | TFLOPs: 11.96 | 7: iteration 166160/ 173500 | consumed samples: 42536960 | consumed tokens: 87115694080 | elapsed time per iteration (s): 0.10 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 4.508891E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2575.484 | TFLOPs: 9.58 | 7: iteration 166170/ 173500 | consumed samples: 42539520 | consumed tokens: 87120936960 | elapsed time per iteration (s): 0.08 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 4.504222E+00 | grad norm: 0.370 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.673 | TFLOPs: 11.93 | 7: iteration 166180/ 173500 | consumed samples: 42542080 | consumed tokens: 87126179840 | elapsed time per iteration (s): 0.08 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 4.497523E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.895 | TFLOPs: 11.97 | 7: iteration 166190/ 173500 | consumed samples: 42544640 | consumed tokens: 87131422720 | elapsed time per iteration (s): 0.08 | learning rate: 2.080E-05 | global batch size: 256 | lm loss: 4.513725E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.121 | TFLOPs: 11.92 | 7: iteration 166200/ 173500 | consumed samples: 42547200 | consumed tokens: 87136665600 | elapsed time per iteration (s): 0.08 | learning rate: 2.080E-05 | global batch size: 256 | lm loss: 4.498526E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.775 | TFLOPs: 11.93 | 7: iteration 166210/ 173500 | consumed samples: 42549760 | consumed tokens: 87141908480 | elapsed time per iteration (s): 0.08 | learning rate: 2.080E-05 | global batch size: 256 | lm loss: 4.485977E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.917 | TFLOPs: 11.96 | 7: iteration 166220/ 173500 | consumed samples: 42552320 | consumed tokens: 87147151360 | elapsed time per iteration (s): 0.08 | learning rate: 2.080E-05 | global batch size: 256 | lm loss: 4.505397E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.935 | TFLOPs: 11.97 | 7: iteration 166230/ 173500 | consumed samples: 42554880 | consumed tokens: 87152394240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.079E-05 | global batch size: 256 | lm loss: 4.503629E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.959 | TFLOPs: 11.93 | 7: iteration 166240/ 173500 | consumed samples: 42557440 | consumed tokens: 87157637120 | elapsed time per iteration (s): 0.08 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 4.500486E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.349 | TFLOPs: 11.68 | 7: iteration 166250/ 173500 | consumed samples: 42560000 | consumed tokens: 87162880000 | elapsed time per iteration (s): 0.08 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 4.501877E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.156 | TFLOPs: 11.96 | 7: iteration 166260/ 173500 | consumed samples: 42562560 | consumed tokens: 87168122880 | elapsed time per iteration (s): 0.08 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 4.503046E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.124 | TFLOPs: 11.94 | 7: iteration 166270/ 173500 | consumed samples: 42565120 | consumed tokens: 87173365760 | elapsed time per iteration (s): 0.08 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 4.511373E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.535 | TFLOPs: 11.95 | 7: iteration 166280/ 173500 | consumed samples: 42567680 | consumed tokens: 87178608640 | elapsed time per iteration (s): 0.08 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 4.503694E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.689 | TFLOPs: 11.95 | 7: iteration 
166290/ 173500 | consumed samples: 42570240 | consumed tokens: 87183851520 | elapsed time per iteration (s): 0.08 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 4.492184E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3187.574 | TFLOPs: 11.86 | 7: iteration 166300/ 173500 | consumed samples: 42572800 | consumed tokens: 87189094400 | elapsed time per iteration (s): 0.08 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 4.502976E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.050 | TFLOPs: 11.95 | 7: iteration 166310/ 173500 | consumed samples: 42575360 | consumed tokens: 87194337280 | elapsed time per iteration (s): 0.08 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 4.500776E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.205 | TFLOPs: 11.94 | 7: iteration 166320/ 173500 | consumed samples: 42577920 | consumed tokens: 87199580160 | elapsed time per iteration (s): 0.08 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 4.507740E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.627 | TFLOPs: 11.92 | 7: iteration 166330/ 173500 | consumed samples: 42580480 | consumed tokens: 87204823040 | elapsed time per iteration (s): 0.08 | learning rate: 2.077E-05 | global batch size: 256 | lm loss: 4.515796E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.384 | TFLOPs: 11.93 | 7: iteration 166340/ 173500 | consumed samples: 42583040 | consumed tokens: 87210065920 | elapsed time per iteration (s): 0.08 | learning rate: 2.077E-05 | global batch size: 256 | lm loss: 4.504511E+00 | grad norm: 0.358 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.125 | TFLOPs: 11.94 | 7: iteration 166350/ 173500 | consumed samples: 42585600 | consumed tokens: 87215308800 | elapsed time per iteration (s): 0.08 | learning rate: 2.077E-05 | global batch size: 256 | lm loss: 4.496363E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.504 | TFLOPs: 11.96 | 7: iteration 166360/ 173500 | consumed samples: 42588160 | consumed tokens: 87220551680 | elapsed time per iteration (s): 0.08 | learning rate: 2.077E-05 | global batch size: 256 | lm loss: 4.501120E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.585 | TFLOPs: 11.87 | 7: iteration 166370/ 173500 | consumed samples: 42590720 | consumed tokens: 87225794560 | elapsed time per iteration (s): 0.08 | learning rate: 2.076E-05 | global batch size: 256 | lm loss: 4.513425E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.370 | TFLOPs: 11.94 | 7: iteration 166380/ 173500 | consumed samples: 42593280 | consumed tokens: 87231037440 | elapsed time per iteration (s): 0.09 | learning rate: 2.076E-05 | global batch size: 256 | lm loss: 4.513676E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2989.106 | TFLOPs: 11.12 | 7: iteration 166390/ 173500 | consumed samples: 42595840 | consumed tokens: 87236280320 | elapsed time per iteration (s): 0.09 | learning rate: 2.076E-05 | global batch size: 256 | lm loss: 4.507583E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2754.682 | TFLOPs: 10.25 | 7: iteration 166400/ 173500 | consumed samples: 42598400 | consumed tokens: 87241523200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.076E-05 | global batch size: 256 | lm loss: 4.515913E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.345 | TFLOPs: 11.86 | 7: iteration 166410/ 173500 | consumed samples: 42600960 | consumed tokens: 87246766080 | elapsed time per iteration (s): 0.08 | learning rate: 2.076E-05 | global batch size: 256 | lm loss: 4.501501E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.930 | TFLOPs: 11.89 | 7: iteration 166420/ 173500 | consumed samples: 42603520 | consumed tokens: 87252008960 | elapsed time per iteration (s): 0.08 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 4.502291E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.278 | TFLOPs: 11.87 | 7: iteration 166430/ 173500 | consumed samples: 42606080 | consumed tokens: 87257251840 | elapsed time per iteration (s): 0.08 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 4.509354E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.155 | TFLOPs: 11.87 | 7: iteration 166440/ 173500 | consumed samples: 42608640 | consumed tokens: 87262494720 | elapsed time per iteration (s): 0.08 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 4.502836E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.166 | TFLOPs: 11.85 | 7: iteration 166450/ 173500 | consumed samples: 42611200 | consumed tokens: 87267737600 | elapsed time per iteration (s): 0.08 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 4.508460E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.643 | TFLOPs: 11.93 | 7: iteration 166460/ 
173500 | consumed samples: 42613760 | consumed tokens: 87272980480 | elapsed time per iteration (s): 0.08 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 4.492091E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.724 | TFLOPs: 11.87 | 7: iteration 166470/ 173500 | consumed samples: 42616320 | consumed tokens: 87278223360 | elapsed time per iteration (s): 0.08 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 4.503348E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.048 | TFLOPs: 11.85 | 7: iteration 166480/ 173500 | consumed samples: 42618880 | consumed tokens: 87283466240 | elapsed time per iteration (s): 0.08 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 4.496925E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3172.909 | TFLOPs: 11.80 | 7: iteration 166490/ 173500 | consumed samples: 42621440 | consumed tokens: 87288709120 | elapsed time per iteration (s): 0.08 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 4.507387E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.134 | TFLOPs: 11.78 | 7: iteration 166500/ 173500 | consumed samples: 42624000 | consumed tokens: 87293952000 | elapsed time per iteration (s): 0.08 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 4.512672E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3164.079 | TFLOPs: 11.77 | 7: iteration 166510/ 173500 | consumed samples: 42626560 | consumed tokens: 87299194880 | elapsed time per iteration (s): 0.09 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 4.490516E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2988.627 | TFLOPs: 11.12 | 7: iteration 166520/ 173500 | consumed samples: 42629120 | consumed tokens: 87304437760 | elapsed time per iteration (s): 0.13 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 4.502787E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1982.403 | TFLOPs: 7.37 | 7: iteration 166530/ 173500 | consumed samples: 42631680 | consumed tokens: 87309680640 | elapsed time per iteration (s): 0.11 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 4.514424E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2328.679 | TFLOPs: 8.66 | 7: iteration 166540/ 173500 | consumed samples: 42634240 | consumed tokens: 87314923520 | elapsed time per iteration (s): 0.08 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 4.510723E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.467 | TFLOPs: 11.94 | 7: iteration 166550/ 173500 | consumed samples: 42636800 | consumed tokens: 87320166400 | elapsed time per iteration (s): 0.08 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 4.497493E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.040 | TFLOPs: 11.95 | 7: iteration 166560/ 173500 | consumed samples: 42639360 | consumed tokens: 87325409280 | elapsed time per iteration (s): 0.08 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 4.486193E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.345 | TFLOPs: 11.93 | 7: iteration 166570/ 173500 | consumed samples: 42641920 | consumed tokens: 87330652160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.072E-05 | global batch size: 256 | lm loss: 4.504708E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.559 | TFLOPs: 11.94 | 7: iteration 166580/ 173500 | consumed samples: 42644480 | consumed tokens: 87335895040 | elapsed time per iteration (s): 0.08 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 4.509547E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.089 | TFLOPs: 11.84 | 7: iteration 166590/ 173500 | consumed samples: 42647040 | consumed tokens: 87341137920 | elapsed time per iteration (s): 0.08 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 4.505875E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.294 | TFLOPs: 11.86 | 7: iteration 166600/ 173500 | consumed samples: 42649600 | consumed tokens: 87346380800 | elapsed time per iteration (s): 0.08 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 4.504194E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.671 | TFLOPs: 11.76 | 7: iteration 166610/ 173500 | consumed samples: 42652160 | consumed tokens: 87351623680 | elapsed time per iteration (s): 0.08 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 4.500374E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.857 | TFLOPs: 11.82 | 7: iteration 166620/ 173500 | consumed samples: 42654720 | consumed tokens: 87356866560 | elapsed time per iteration (s): 0.08 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 4.502969E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.537 | TFLOPs: 11.83 | 7: iteration 166630/ 173500 | 
consumed samples: 42657280 | consumed tokens: 87362109440 | elapsed time per iteration (s): 0.08 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 4.500602E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.434 | TFLOPs: 11.87 | 7: iteration 166640/ 173500 | consumed samples: 42659840 | consumed tokens: 87367352320 | elapsed time per iteration (s): 0.08 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 4.506120E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.217 | TFLOPs: 11.84 | 7: iteration 166650/ 173500 | consumed samples: 42662400 | consumed tokens: 87372595200 | elapsed time per iteration (s): 0.08 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 4.516889E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.857 | TFLOPs: 11.85 | 7: iteration 166660/ 173500 | consumed samples: 42664960 | consumed tokens: 87377838080 | elapsed time per iteration (s): 0.08 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 4.512576E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.580 | TFLOPs: 11.82 | 7: iteration 166670/ 173500 | consumed samples: 42667520 | consumed tokens: 87383080960 | elapsed time per iteration (s): 0.08 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 4.508984E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.549 | TFLOPs: 11.54 | 7: iteration 166680/ 173500 | consumed samples: 42670080 | consumed tokens: 87388323840 | elapsed time per iteration (s): 0.08 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 4.519546E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3211.846 | TFLOPs: 11.95 | 7: iteration 166690/ 173500 | consumed samples: 42672640 | consumed tokens: 87393566720 | elapsed time per iteration (s): 0.08 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 4.524187E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.370 | TFLOPs: 11.95 | 7: iteration 166700/ 173500 | consumed samples: 42675200 | consumed tokens: 87398809600 | elapsed time per iteration (s): 0.08 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 4.507236E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.587 | TFLOPs: 11.95 | 7: iteration 166710/ 173500 | consumed samples: 42677760 | consumed tokens: 87404052480 | elapsed time per iteration (s): 0.08 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 4.504554E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.439 | TFLOPs: 11.91 | 7: iteration 166720/ 173500 | consumed samples: 42680320 | consumed tokens: 87409295360 | elapsed time per iteration (s): 0.08 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 4.512538E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.961 | TFLOPs: 11.95 | 7: iteration 166730/ 173500 | consumed samples: 42682880 | consumed tokens: 87414538240 | elapsed time per iteration (s): 0.08 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 4.505736E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.822 | TFLOPs: 11.95 | 7: iteration 166740/ 173500 | consumed samples: 42685440 | consumed tokens: 87419781120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.069E-05 | global batch size: 256 | lm loss: 4.510588E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.459 | TFLOPs: 11.95 | 7: iteration 166750/ 173500 | consumed samples: 42688000 | consumed tokens: 87425024000 | elapsed time per iteration (s): 0.08 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 4.511819E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.341 | TFLOPs: 11.91 | 7: iteration 166760/ 173500 | consumed samples: 42690560 | consumed tokens: 87430266880 | elapsed time per iteration (s): 0.08 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 4.522761E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.881 | TFLOPs: 11.91 | 7: iteration 166770/ 173500 | consumed samples: 42693120 | consumed tokens: 87435509760 | elapsed time per iteration (s): 0.08 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 4.500571E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.200 | TFLOPs: 11.90 | 7: iteration 166780/ 173500 | consumed samples: 42695680 | consumed tokens: 87440752640 | elapsed time per iteration (s): 0.08 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 4.501297E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.601 | TFLOPs: 11.82 | 7: iteration 166790/ 173500 | consumed samples: 42698240 | consumed tokens: 87445995520 | elapsed time per iteration (s): 0.08 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 4.507009E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.804 | TFLOPs: 11.83 | 7: iteration 166800/ 173500 | 
consumed samples: 42700800 | consumed tokens: 87451238400 | elapsed time per iteration (s): 0.08 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 4.503799E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.327 | TFLOPs: 11.90 | 7: iteration 166810/ 173500 | consumed samples: 42703360 | consumed tokens: 87456481280 | elapsed time per iteration (s): 0.08 | learning rate: 2.067E-05 | global batch size: 256 | lm loss: 4.509537E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.046 | TFLOPs: 11.94 | 7: iteration 166820/ 173500 | consumed samples: 42705920 | consumed tokens: 87461724160 | elapsed time per iteration (s): 0.08 | learning rate: 2.067E-05 | global batch size: 256 | lm loss: 4.494342E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.478 | TFLOPs: 11.89 | 7: iteration 166830/ 173500 | consumed samples: 42708480 | consumed tokens: 87466967040 | elapsed time per iteration (s): 0.08 | learning rate: 2.067E-05 | global batch size: 256 | lm loss: 4.490941E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.546 | TFLOPs: 11.94 | 7: iteration 166840/ 173500 | consumed samples: 42711040 | consumed tokens: 87472209920 | elapsed time per iteration (s): 0.08 | learning rate: 2.067E-05 | global batch size: 256 | lm loss: 4.500125E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.769 | TFLOPs: 11.94 | 7: iteration 166850/ 173500 | consumed samples: 42713600 | consumed tokens: 87477452800 | elapsed time per iteration (s): 0.08 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 4.503573E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3209.895 | TFLOPs: 11.94 | 7: iteration 166860/ 173500 | consumed samples: 42716160 | consumed tokens: 87482695680 | elapsed time per iteration (s): 0.08 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 4.504366E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.168 | TFLOPs: 11.94 | 7: iteration 166870/ 173500 | consumed samples: 42718720 | consumed tokens: 87487938560 | elapsed time per iteration (s): 0.08 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 4.509101E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.628 | TFLOPs: 12.06 | 7: iteration 166880/ 173500 | consumed samples: 42721280 | consumed tokens: 87493181440 | elapsed time per iteration (s): 0.08 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 4.504657E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.201 | TFLOPs: 12.01 | 7: iteration 166890/ 173500 | consumed samples: 42723840 | consumed tokens: 87498424320 | elapsed time per iteration (s): 0.08 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 4.506327E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.473 | TFLOPs: 11.93 | 7: iteration 166900/ 173500 | consumed samples: 42726400 | consumed tokens: 87503667200 | elapsed time per iteration (s): 0.08 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 4.501334E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.142 | TFLOPs: 12.04 | 7: iteration 166910/ 173500 | consumed samples: 42728960 | consumed tokens: 87508910080 | elapsed time per iteration (s): 0.08 | learning rate: 
2.065E-05 | global batch size: 256 | lm loss: 4.503014E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.330 | TFLOPs: 12.02 | 7: iteration 166920/ 173500 | consumed samples: 42731520 | consumed tokens: 87514152960 | elapsed time per iteration (s): 0.08 | learning rate: 2.065E-05 | global batch size: 256 | lm loss: 4.500818E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.787 | TFLOPs: 11.99 | 7: iteration 166930/ 173500 | consumed samples: 42734080 | consumed tokens: 87519395840 | elapsed time per iteration (s): 0.08 | learning rate: 2.065E-05 | global batch size: 256 | lm loss: 4.500215E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.269 | TFLOPs: 11.95 | 7: iteration 166940/ 173500 | consumed samples: 42736640 | consumed tokens: 87524638720 | elapsed time per iteration (s): 0.08 | learning rate: 2.065E-05 | global batch size: 256 | lm loss: 4.504530E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3167.830 | TFLOPs: 11.78 | 7: iteration 166950/ 173500 | consumed samples: 42739200 | consumed tokens: 87529881600 | elapsed time per iteration (s): 0.08 | learning rate: 2.065E-05 | global batch size: 256 | lm loss: 4.499522E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.699 | TFLOPs: 11.88 | 7: iteration 166960/ 173500 | consumed samples: 42741760 | consumed tokens: 87535124480 | elapsed time per iteration (s): 0.08 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 4.512137E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.147 | TFLOPs: 11.90 | 7: iteration 166970/ 173500 | 
consumed samples: 42744320 | consumed tokens: 87540367360 | elapsed time per iteration (s): 0.08 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 4.507929E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.543 | TFLOPs: 11.87 | 7: iteration 166980/ 173500 | consumed samples: 42746880 | consumed tokens: 87545610240 | elapsed time per iteration (s): 0.08 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 4.498401E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3151.146 | TFLOPs: 11.72 | 7: iteration 166990/ 173500 | consumed samples: 42749440 | consumed tokens: 87550853120 | elapsed time per iteration (s): 0.08 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 4.510350E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.426 | TFLOPs: 11.90 | 7: iteration 167000/ 173500 | consumed samples: 42752000 | consumed tokens: 87556096000 | elapsed time per iteration (s): 0.08 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 4.510621E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.829 | TFLOPs: 11.69 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 167000 | lm loss value: 4.373934E+00 | lm loss PPL: 7.935522E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 167000 to checkpoints_14m91b100m 0: [2023-03-17 04:19:49,079] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step167000 is begin to save! 
0: [2023-03-17 04:19:49,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:19:49,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:19:49,114] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:19:49,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:19:49,117] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:19:49,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:19:49,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:19:49,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:19:49,124] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:19:49,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:19:49,127] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:19:49,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 04:19:49,128] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step167000/mp_rank_00_model_states.pt 0: [2023-03-17 04:19:49,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:19:49,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:19:49,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:19:49,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:19:49,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 0: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:19:49,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 0: [2023-03-17 04:19:49,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 7: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:19:49,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:19:49,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 6: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:19:49,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:19:49,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 5: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:19:49,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 5: [2023-03-17 04:19:49,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 0: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:19:49,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:19:49,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:19:49,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 6: [2023-03-17 04:19:49,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 6: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 0: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:19:49,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 7: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:19:49,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 7: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:19:49,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 6: [2023-03-17 04:19:49,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 5: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 6: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 6: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 6: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 1: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:19:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 1: [2023-03-17 04:19:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 5: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 1: [2023-03-17 04:19:49,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 5: [2023-03-17 04:19:49,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:19:49,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 
7: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:19:49,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 0: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:19:49,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 1: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:19:49,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 6: [2023-03-17 04:19:49,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:19:49,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:19:49,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 6: [2023-03-17 04:19:49,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:19:49,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 7: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:19:49,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2023-03-17 04:19:49,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 
1: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 0: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:19:49,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 3: [2023-03-17 04:19:49,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:19:49,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 6: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:19:49,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:19:49,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 5: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 1: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 0: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 5: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 4: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 6: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 3: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 7: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:19:49,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 7: [2023-03-17 04:19:49,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:19:49,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:19:49,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 2: [2023-03-17 04:19:49,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:19:49,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:19:49,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:19:49,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 
2: [2023-03-17 04:19:49,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:19:49,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 2: [2023-03-17 04:19:49,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step167000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2023-03-17 04:19:49,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step167000 is ready now! 0: successfully saved checkpoint at iteration 167000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 117.58 7: iteration 167010/ 173500 | consumed samples: 42754560 | consumed tokens: 87561338880 | elapsed time per iteration (s): 0.10 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 4.493033E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2673.314 | TFLOPs: 9.94 | 7: iteration 167020/ 173500 | consumed samples: 42757120 | consumed tokens: 87566581760 | elapsed time per iteration (s): 0.08 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 4.494909E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.684 | TFLOPs: 11.83 | 7: iteration 167030/ 173500 | consumed samples: 42759680 | consumed tokens: 87571824640 | elapsed time per iteration (s): 0.08 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 4.501926E+00 | grad norm: 0.350 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.269 | TFLOPs: 11.92 | 7: iteration 167040/ 173500 | consumed samples: 42762240 | consumed tokens: 87577067520 | elapsed time per iteration (s): 0.08 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 4.516216E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.074 | TFLOPs: 11.96 | 7: iteration 167050/ 173500 | consumed samples: 42764800 | consumed tokens: 87582310400 | elapsed time per iteration (s): 0.08 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 4.503162E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.808 | TFLOPs: 11.94 | 7: iteration 167060/ 173500 | consumed samples: 42767360 | consumed tokens: 87587553280 | elapsed time per iteration (s): 0.08 | learning rate: 2.062E-05 | global batch size: 256 | lm loss: 4.502847E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.557 | TFLOPs: 11.97 | 7: iteration 167070/ 173500 | consumed samples: 42769920 | consumed tokens: 87592796160 | elapsed time per iteration (s): 0.08 | learning rate: 2.062E-05 | global batch size: 256 | lm loss: 4.507159E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.152 | TFLOPs: 11.93 | 7: iteration 167080/ 173500 | consumed samples: 42772480 | consumed tokens: 87598039040 | elapsed time per iteration (s): 0.08 | learning rate: 2.062E-05 | global batch size: 256 | lm loss: 4.495185E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.934 | TFLOPs: 11.94 | 7: iteration 167090/ 173500 | consumed samples: 42775040 | consumed tokens: 87603281920 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.062E-05 | global batch size: 256 | lm loss: 4.498912E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.721 | TFLOPs: 11.88 | 7: iteration 167100/ 173500 | consumed samples: 42777600 | consumed tokens: 87608524800 | elapsed time per iteration (s): 0.08 | learning rate: 2.062E-05 | global batch size: 256 | lm loss: 4.498693E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.061 | TFLOPs: 11.87 | 7: iteration 167110/ 173500 | consumed samples: 42780160 | consumed tokens: 87613767680 | elapsed time per iteration (s): 0.08 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 4.497862E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.736 | TFLOPs: 11.86 | 7: iteration 167120/ 173500 | consumed samples: 42782720 | consumed tokens: 87619010560 | elapsed time per iteration (s): 0.08 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 4.514523E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.022 | TFLOPs: 11.91 | 7: iteration 167130/ 173500 | consumed samples: 42785280 | consumed tokens: 87624253440 | elapsed time per iteration (s): 0.08 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 4.499405E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.255 | TFLOPs: 11.86 | 7: iteration 167140/ 173500 | consumed samples: 42787840 | consumed tokens: 87629496320 | elapsed time per iteration (s): 0.08 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 4.508862E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.669 | TFLOPs: 11.93 | 7: iteration 
167150/ 173500 | consumed samples: 42790400 | consumed tokens: 87634739200 | elapsed time per iteration (s): 0.08 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 4.502793E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.602 | TFLOPs: 11.95 | 7: iteration 167160/ 173500 | consumed samples: 42792960 | consumed tokens: 87639982080 | elapsed time per iteration (s): 0.08 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 4.504093E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.531 | TFLOPs: 11.89 | 7: iteration 167170/ 173500 | consumed samples: 42795520 | consumed tokens: 87645224960 | elapsed time per iteration (s): 0.08 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 4.493308E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.362 | TFLOPs: 11.95 | 7: iteration 167180/ 173500 | consumed samples: 42798080 | consumed tokens: 87650467840 | elapsed time per iteration (s): 0.08 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 4.512581E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.270 | TFLOPs: 11.97 | 7: iteration 167190/ 173500 | consumed samples: 42800640 | consumed tokens: 87655710720 | elapsed time per iteration (s): 0.08 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 4.510640E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.513 | TFLOPs: 11.98 | 7: iteration 167200/ 173500 | consumed samples: 42803200 | consumed tokens: 87660953600 | elapsed time per iteration (s): 0.08 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 4.506828E+00 | grad norm: 0.381 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.511 | TFLOPs: 11.92 | 7: iteration 167210/ 173500 | consumed samples: 42805760 | consumed tokens: 87666196480 | elapsed time per iteration (s): 0.08 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 4.496621E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3158.256 | TFLOPs: 11.75 | 7: iteration 167220/ 173500 | consumed samples: 42808320 | consumed tokens: 87671439360 | elapsed time per iteration (s): 0.08 | learning rate: 2.059E-05 | global batch size: 256 | lm loss: 4.514897E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.986 | TFLOPs: 11.98 | 7: iteration 167230/ 173500 | consumed samples: 42810880 | consumed tokens: 87676682240 | elapsed time per iteration (s): 0.08 | learning rate: 2.059E-05 | global batch size: 256 | lm loss: 4.512767E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.114 | TFLOPs: 11.90 | 7: iteration 167240/ 173500 | consumed samples: 42813440 | consumed tokens: 87681925120 | elapsed time per iteration (s): 0.08 | learning rate: 2.059E-05 | global batch size: 256 | lm loss: 4.499571E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.711 | TFLOPs: 11.95 | 7: iteration 167250/ 173500 | consumed samples: 42816000 | consumed tokens: 87687168000 | elapsed time per iteration (s): 0.08 | learning rate: 2.059E-05 | global batch size: 256 | lm loss: 4.500285E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.227 | TFLOPs: 11.99 | 7: iteration 167260/ 173500 | consumed samples: 42818560 | consumed tokens: 87692410880 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.059E-05 | global batch size: 256 | lm loss: 4.516034E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.907 | TFLOPs: 11.96 | 7: iteration 167270/ 173500 | consumed samples: 42821120 | consumed tokens: 87697653760 | elapsed time per iteration (s): 0.08 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 4.496216E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.614 | TFLOPs: 11.96 | 7: iteration 167280/ 173500 | consumed samples: 42823680 | consumed tokens: 87702896640 | elapsed time per iteration (s): 0.08 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 4.509634E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.131 | TFLOPs: 11.96 | 7: iteration 167290/ 173500 | consumed samples: 42826240 | consumed tokens: 87708139520 | elapsed time per iteration (s): 0.08 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 4.493424E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.580 | TFLOPs: 11.96 | 7: iteration 167300/ 173500 | consumed samples: 42828800 | consumed tokens: 87713382400 | elapsed time per iteration (s): 0.08 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 4.490871E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.812 | TFLOPs: 11.99 | 7: iteration 167310/ 173500 | consumed samples: 42831360 | consumed tokens: 87718625280 | elapsed time per iteration (s): 0.08 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 4.494827E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.778 | TFLOPs: 11.99 | 7: iteration 167320/ 
173500 | consumed samples: 42833920 | consumed tokens: 87723868160 | elapsed time per iteration (s): 0.08 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 4.507439E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.414 | TFLOPs: 11.84 | 7: iteration 167330/ 173500 | consumed samples: 42836480 | consumed tokens: 87729111040 | elapsed time per iteration (s): 0.08 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 4.505569E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.587 | TFLOPs: 11.94 | 7: iteration 167340/ 173500 | consumed samples: 42839040 | consumed tokens: 87734353920 | elapsed time per iteration (s): 0.08 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 4.496909E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.688 | TFLOPs: 11.95 | 7: iteration 167350/ 173500 | consumed samples: 42841600 | consumed tokens: 87739596800 | elapsed time per iteration (s): 0.08 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 4.499509E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.124 | TFLOPs: 11.97 | 7: iteration 167360/ 173500 | consumed samples: 42844160 | consumed tokens: 87744839680 | elapsed time per iteration (s): 0.11 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 4.507927E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2305.152 | TFLOPs: 8.57 | 7: iteration 167370/ 173500 | consumed samples: 42846720 | consumed tokens: 87750082560 | elapsed time per iteration (s): 0.08 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 4.498234E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3193.063 | TFLOPs: 11.88 | 7: iteration 167380/ 173500 | consumed samples: 42849280 | consumed tokens: 87755325440 | elapsed time per iteration (s): 0.08 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 4.505929E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.878 | TFLOPs: 11.82 | 7: iteration 167390/ 173500 | consumed samples: 42851840 | consumed tokens: 87760568320 | elapsed time per iteration (s): 0.08 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 4.511317E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.291 | TFLOPs: 11.91 | 7: iteration 167400/ 173500 | consumed samples: 42854400 | consumed tokens: 87765811200 | elapsed time per iteration (s): 0.08 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 4.520728E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.186 | TFLOPs: 11.96 | 7: iteration 167410/ 173500 | consumed samples: 42856960 | consumed tokens: 87771054080 | elapsed time per iteration (s): 0.08 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 4.510683E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.933 | TFLOPs: 11.97 | 7: iteration 167420/ 173500 | consumed samples: 42859520 | consumed tokens: 87776296960 | elapsed time per iteration (s): 0.08 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 4.497563E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.742 | TFLOPs: 11.95 | 7: iteration 167430/ 173500 | consumed samples: 42862080 | consumed tokens: 87781539840 | elapsed time per iteration (s): 0.08 | learning rate: 
2.055E-05 | global batch size: 256 | lm loss: 4.490651E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.670 | TFLOPs: 11.88 | 7: iteration 167440/ 173500 | consumed samples: 42864640 | consumed tokens: 87786782720 | elapsed time per iteration (s): 0.08 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 4.506474E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.567 | TFLOPs: 11.95 | 7: iteration 167450/ 173500 | consumed samples: 42867200 | consumed tokens: 87792025600 | elapsed time per iteration (s): 0.08 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 4.506969E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.107 | TFLOPs: 11.98 | 7: iteration 167460/ 173500 | consumed samples: 42869760 | consumed tokens: 87797268480 | elapsed time per iteration (s): 0.08 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 4.504763E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.233 | TFLOPs: 11.92 | 7: iteration 167470/ 173500 | consumed samples: 42872320 | consumed tokens: 87802511360 | elapsed time per iteration (s): 0.08 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 4.518316E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.973 | TFLOPs: 11.98 | 7: iteration 167480/ 173500 | consumed samples: 42874880 | consumed tokens: 87807754240 | elapsed time per iteration (s): 0.08 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 4.498611E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.618 | TFLOPs: 11.97 | 7: iteration 167490/ 173500 | 
consumed samples: 42877440 | consumed tokens: 87812997120 | elapsed time per iteration (s): 0.08 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 4.496226E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.420 | TFLOPs: 11.97 | 7: iteration 167500/ 173500 | consumed samples: 42880000 | consumed tokens: 87818240000 | elapsed time per iteration (s): 0.08 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 4.511393E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.046 | TFLOPs: 11.95 | 7: iteration 167510/ 173500 | consumed samples: 42882560 | consumed tokens: 87823482880 | elapsed time per iteration (s): 0.08 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 4.498605E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.993 | TFLOPs: 11.95 | 7: iteration 167520/ 173500 | consumed samples: 42885120 | consumed tokens: 87828725760 | elapsed time per iteration (s): 0.08 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 4.499411E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.961 | TFLOPs: 11.95 | 7: iteration 167530/ 173500 | consumed samples: 42887680 | consumed tokens: 87833968640 | elapsed time per iteration (s): 0.08 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 4.495759E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.990 | TFLOPs: 11.98 | 7: iteration 167540/ 173500 | consumed samples: 42890240 | consumed tokens: 87839211520 | elapsed time per iteration (s): 0.08 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 4.513389E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3218.274 | TFLOPs: 11.97 | 7: iteration 167550/ 173500 | consumed samples: 42892800 | consumed tokens: 87844454400 | elapsed time per iteration (s): 0.08 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 4.498003E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.661 | TFLOPs: 11.96 | 7: iteration 167560/ 173500 | consumed samples: 42895360 | consumed tokens: 87849697280 | elapsed time per iteration (s): 0.08 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 4.501996E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.266 | TFLOPs: 11.95 | 7: iteration 167570/ 173500 | consumed samples: 42897920 | consumed tokens: 87854940160 | elapsed time per iteration (s): 0.08 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 4.507304E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.998 | TFLOPs: 11.87 | 7: iteration 167580/ 173500 | consumed samples: 42900480 | consumed tokens: 87860183040 | elapsed time per iteration (s): 0.08 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 4.494783E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.285 | TFLOPs: 11.99 | 7: iteration 167590/ 173500 | consumed samples: 42903040 | consumed tokens: 87865425920 | elapsed time per iteration (s): 0.08 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 4.493951E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.929 | TFLOPs: 11.94 | 7: iteration 167600/ 173500 | consumed samples: 42905600 | consumed tokens: 87870668800 | elapsed time per iteration (s): 0.08 | learning rate: 
2.052E-05 | global batch size: 256 | lm loss: 4.493472E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.782 | TFLOPs: 11.98 | 7: iteration 167610/ 173500 | consumed samples: 42908160 | consumed tokens: 87875911680 | elapsed time per iteration (s): 0.08 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 4.507861E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.101 | TFLOPs: 11.95 | 7: iteration 167620/ 173500 | consumed samples: 42910720 | consumed tokens: 87881154560 | elapsed time per iteration (s): 0.08 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 4.501979E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.348 | TFLOPs: 11.99 | 7: iteration 167630/ 173500 | consumed samples: 42913280 | consumed tokens: 87886397440 | elapsed time per iteration (s): 0.08 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 4.522019E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.228 | TFLOPs: 12.00 | 7: iteration 167640/ 173500 | consumed samples: 42915840 | consumed tokens: 87891640320 | elapsed time per iteration (s): 0.08 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 4.499631E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.837 | TFLOPs: 11.99 | 7: iteration 167650/ 173500 | consumed samples: 42918400 | consumed tokens: 87896883200 | elapsed time per iteration (s): 0.08 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 4.509505E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.358 | TFLOPs: 11.99 | 7: iteration 167660/ 173500 | 
consumed samples: 42920960 | consumed tokens: 87902126080 | elapsed time per iteration (s): 0.08 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 4.502617E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.215 | TFLOPs: 11.93 | 7: iteration 167670/ 173500 | consumed samples: 42923520 | consumed tokens: 87907368960 | elapsed time per iteration (s): 0.08 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 4.488825E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.286 | TFLOPs: 11.97 | 7: iteration 167680/ 173500 | consumed samples: 42926080 | consumed tokens: 87912611840 | elapsed time per iteration (s): 0.08 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 4.499642E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.652 | TFLOPs: 11.99 | 7: iteration 167690/ 173500 | consumed samples: 42928640 | consumed tokens: 87917854720 | elapsed time per iteration (s): 0.08 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 4.503871E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.469 | TFLOPs: 11.99 | 7: iteration 167700/ 173500 | consumed samples: 42931200 | consumed tokens: 87923097600 | elapsed time per iteration (s): 0.08 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 4.505844E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.311 | TFLOPs: 11.91 | 7: iteration 167710/ 173500 | consumed samples: 42933760 | consumed tokens: 87928340480 | elapsed time per iteration (s): 0.08 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 4.503272E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3224.652 | TFLOPs: 11.99 | 7: iteration 167720/ 173500 | consumed samples: 42936320 | consumed tokens: 87933583360 | elapsed time per iteration (s): 0.08 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 4.501376E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.825 | TFLOPs: 11.93 | 7: iteration 167730/ 173500 | consumed samples: 42938880 | consumed tokens: 87938826240 | elapsed time per iteration (s): 0.08 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 4.509179E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.600 | TFLOPs: 11.97 | 7: iteration 167740/ 173500 | consumed samples: 42941440 | consumed tokens: 87944069120 | elapsed time per iteration (s): 0.08 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 4.521416E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.816 | TFLOPs: 11.90 | 7: iteration 167750/ 173500 | consumed samples: 42944000 | consumed tokens: 87949312000 | elapsed time per iteration (s): 0.08 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 4.507546E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.836 | TFLOPs: 11.98 | 7: iteration 167760/ 173500 | consumed samples: 42946560 | consumed tokens: 87954554880 | elapsed time per iteration (s): 0.08 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 4.508091E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.976 | TFLOPs: 11.97 | 7: iteration 167770/ 173500 | consumed samples: 42949120 | consumed tokens: 87959797760 | elapsed time per iteration (s): 0.08 | learning rate: 
2.049E-05 | global batch size: 256 | lm loss: 4.492111E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.589 | TFLOPs: 11.93 | 7: iteration 167780/ 173500 | consumed samples: 42951680 | consumed tokens: 87965040640 | elapsed time per iteration (s): 0.08 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 4.507230E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.018 | TFLOPs: 11.96 | 7: iteration 167790/ 173500 | consumed samples: 42954240 | consumed tokens: 87970283520 | elapsed time per iteration (s): 0.08 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 4.515082E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.373 | TFLOPs: 11.94 | 7: iteration 167800/ 173500 | consumed samples: 42956800 | consumed tokens: 87975526400 | elapsed time per iteration (s): 0.08 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 4.506247E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.256 | TFLOPs: 11.93 | 7: iteration 167810/ 173500 | consumed samples: 42959360 | consumed tokens: 87980769280 | elapsed time per iteration (s): 0.08 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 4.513246E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.394 | TFLOPs: 12.01 | 7: iteration 167820/ 173500 | consumed samples: 42961920 | consumed tokens: 87986012160 | elapsed time per iteration (s): 0.08 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 4.495908E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3231.625 | TFLOPs: 12.02 | 7: iteration 167830/ 173500 | 
consumed samples: 42964480 | consumed tokens: 87991255040 | elapsed time per iteration (s): 0.08 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 4.511190E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.673 | TFLOPs: 12.00 | 7: iteration 167840/ 173500 | consumed samples: 42967040 | consumed tokens: 87996497920 | elapsed time per iteration (s): 0.08 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 4.506418E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.000 | TFLOPs: 11.94 | 7: iteration 167850/ 173500 | consumed samples: 42969600 | consumed tokens: 88001740800 | elapsed time per iteration (s): 0.08 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 4.504729E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.052 | TFLOPs: 11.97 | 7: iteration 167860/ 173500 | consumed samples: 42972160 | consumed tokens: 88006983680 | elapsed time per iteration (s): 0.08 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 4.514022E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.551 | TFLOPs: 11.85 | 7: iteration 167870/ 173500 | consumed samples: 42974720 | consumed tokens: 88012226560 | elapsed time per iteration (s): 0.08 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 4.495488E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.266 | TFLOPs: 11.92 | 7: iteration 167880/ 173500 | consumed samples: 42977280 | consumed tokens: 88017469440 | elapsed time per iteration (s): 0.08 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 4.519697E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3203.571 | TFLOPs: 11.92 | 7: iteration 167890/ 173500 | consumed samples: 42979840 | consumed tokens: 88022712320 | elapsed time per iteration (s): 0.08 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 4.509801E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.903 | TFLOPs: 11.95 | 7: iteration 167900/ 173500 | consumed samples: 42982400 | consumed tokens: 88027955200 | elapsed time per iteration (s): 0.08 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 4.515564E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.615 | TFLOPs: 11.97 | 7: iteration 167910/ 173500 | consumed samples: 42984960 | consumed tokens: 88033198080 | elapsed time per iteration (s): 0.08 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 4.500369E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.137 | TFLOPs: 11.81 | 7: iteration 167920/ 173500 | consumed samples: 42987520 | consumed tokens: 88038440960 | elapsed time per iteration (s): 0.08 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 4.504384E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.487 | TFLOPs: 11.97 | 7: iteration 167930/ 173500 | consumed samples: 42990080 | consumed tokens: 88043683840 | elapsed time per iteration (s): 0.08 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 4.508725E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.542 | TFLOPs: 11.96 | 7: iteration 167940/ 173500 | consumed samples: 42992640 | consumed tokens: 88048926720 | elapsed time per iteration (s): 0.08 | learning rate: 
2.047E-05 | global batch size: 256 | lm loss: 4.481242E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.416 | TFLOPs: 11.92 | 7: iteration 167950/ 173500 | consumed samples: 42995200 | consumed tokens: 88054169600 | elapsed time per iteration (s): 0.08 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 4.515395E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.089 | TFLOPs: 11.96 | 7: iteration 167960/ 173500 | consumed samples: 42997760 | consumed tokens: 88059412480 | elapsed time per iteration (s): 0.08 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 4.513572E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.801 | TFLOPs: 11.94 | 7: iteration 167970/ 173500 | consumed samples: 43000320 | consumed tokens: 88064655360 | elapsed time per iteration (s): 0.08 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 4.507050E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.228 | TFLOPs: 11.67 | 7: iteration 167980/ 173500 | consumed samples: 43002880 | consumed tokens: 88069898240 | elapsed time per iteration (s): 0.08 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 4.501747E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.307 | TFLOPs: 11.97 | 7: iteration 167990/ 173500 | consumed samples: 43005440 | consumed tokens: 88075141120 | elapsed time per iteration (s): 0.08 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 4.499238E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.171 | TFLOPs: 11.96 | 0: [2023-03-17 04:21:09,261] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=168000, skipped=0, lr=[2.0455079592202583e-05, 2.0455079592202583e-05, 2.0455079592202583e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 168000/ 173500 | consumed samples: 43008000 | consumed tokens: 88080384000 | elapsed time per iteration (s): 0.08 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 4.497399E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.584 | TFLOPs: 11.92 | 0: steps: 168000 loss: 4.5205 iter time (s): 0.080 samples/sec: 3197.568 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 168000 | lm loss value: 4.414765E+00 | lm loss PPL: 8.266244E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 168000 to checkpoints_14m91b100m 0: [2023-03-17 04:21:09,325] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step168000 is begin to save! 0: [2023-03-17 04:21:09,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:21:09,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:21:09,353] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:21:09,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:21:09,358] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/layer_04-model_00-model_states.pt... 
0: [2023-03-17 04:21:09,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:21:09,361] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:21:09,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:21:09,364] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:21:09,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:21:09,367] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:21:09,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:21:09,368] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step168000/mp_rank_00_model_states.pt 0: [2023-03-17 04:21:09,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:21:09,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:21:09,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:21:09,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:21:09,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:21:09,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:21:09,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 04:21:09,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 6: [2023-03-17 04:21:09,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 0: [2023-03-17 04:21:09,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 0: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:21:09,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:21:09,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 
4: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:21:09,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2023-03-17 04:21:09,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 4: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 6: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 5: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:21:09,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:21:09,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:21:09,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2023-03-17 04:21:09,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 1: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:21:09,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 1: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 4: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:21:09,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 6: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:21:09,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:21:09,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 6: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 
5: [2023-03-17 04:21:09,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2023-03-17 04:21:09,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 0: [2023-03-17 04:21:09,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 5: [2023-03-17 04:21:09,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:21:09,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:21:09,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 6: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:21:09,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 0: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:21:09,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 4: [2023-03-17 04:21:09,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:21:09,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 5: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:21:09,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 1: [2023-03-17 04:21:09,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:21:09,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 1: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:21:09,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:21:09,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 4: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:21:09,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 6: [2023-03-17 04:21:09,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 
0: [2023-03-17 04:21:09,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2023-03-17 04:21:09,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:21:09,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:21:09,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:21:09,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2023-03-17 04:21:09,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 
5: [2023-03-17 04:21:09,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 6: [2023-03-17 04:21:09,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:21:09,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:21:09,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 6: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 4: [2023-03-17 04:21:09,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 4: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:21:09,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:21:09,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 6: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 
1: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 4: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 0: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 5: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 1: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:21:09,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 6: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 2: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 2: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 0: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 3: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 
3: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 3: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 4: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 7: [2023-03-17 04:21:09,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:21:09,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 
1: [2023-03-17 04:21:09,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:21:09,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:21:09,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 5: [2023-03-17 04:21:09,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:21:09,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step168000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:21:09,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step168000 is ready now! 0: successfully saved checkpoint at iteration 168000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 78.59 7: iteration 168010/ 173500 | consumed samples: 43010560 | consumed tokens: 88085626880 | elapsed time per iteration (s): 0.09 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 4.508477E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2740.317 | TFLOPs: 10.19 | 7: iteration 168020/ 173500 | consumed samples: 43013120 | consumed tokens: 88090869760 | elapsed time per iteration (s): 0.08 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 4.507641E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.603 | TFLOPs: 11.98 | 7: iteration 168030/ 173500 | consumed samples: 43015680 | consumed tokens: 88096112640 | elapsed time per iteration (s): 0.08 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 
4.504416E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.869 | TFLOPs: 11.96 | 7: iteration 168040/ 173500 | consumed samples: 43018240 | consumed tokens: 88101355520 | elapsed time per iteration (s): 0.08 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 4.497001E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.436 | TFLOPs: 11.99 | 7: iteration 168050/ 173500 | consumed samples: 43020800 | consumed tokens: 88106598400 | elapsed time per iteration (s): 0.08 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 4.497925E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.259 | TFLOPs: 11.93 | 7: iteration 168060/ 173500 | consumed samples: 43023360 | consumed tokens: 88111841280 | elapsed time per iteration (s): 0.08 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 4.484576E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.541 | TFLOPs: 11.98 | 7: iteration 168070/ 173500 | consumed samples: 43025920 | consumed tokens: 88117084160 | elapsed time per iteration (s): 0.08 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 4.506612E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.969 | TFLOPs: 11.97 | 7: iteration 168080/ 173500 | consumed samples: 43028480 | consumed tokens: 88122327040 | elapsed time per iteration (s): 0.08 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 4.495373E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.964 | TFLOPs: 11.84 | 7: iteration 168090/ 173500 | consumed samples: 43031040 | consumed tokens: 
88127569920 | elapsed time per iteration (s): 0.08 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 4.490167E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.275 | TFLOPs: 11.98 | 7: iteration 168100/ 173500 | consumed samples: 43033600 | consumed tokens: 88132812800 | elapsed time per iteration (s): 0.08 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 4.507980E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.107 | TFLOPs: 11.96 | 7: iteration 168110/ 173500 | consumed samples: 43036160 | consumed tokens: 88138055680 | elapsed time per iteration (s): 0.08 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 4.502044E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.479 | TFLOPs: 11.96 | 7: iteration 168120/ 173500 | consumed samples: 43038720 | consumed tokens: 88143298560 | elapsed time per iteration (s): 0.08 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 4.504474E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.154 | TFLOPs: 11.90 | 7: iteration 168130/ 173500 | consumed samples: 43041280 | consumed tokens: 88148541440 | elapsed time per iteration (s): 0.08 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 4.502505E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.986 | TFLOPs: 11.44 | 7: iteration 168140/ 173500 | consumed samples: 43043840 | consumed tokens: 88153784320 | elapsed time per iteration (s): 0.08 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 4.500677E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 3202.827 | TFLOPs: 11.91 | 7: iteration 168150/ 173500 | consumed samples: 43046400 | consumed tokens: 88159027200 | elapsed time per iteration (s): 0.08 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 4.496295E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.646 | TFLOPs: 11.94 | 7: iteration 168160/ 173500 | consumed samples: 43048960 | consumed tokens: 88164270080 | elapsed time per iteration (s): 0.08 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 4.501065E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.279 | TFLOPs: 11.89 | 7: iteration 168170/ 173500 | consumed samples: 43051520 | consumed tokens: 88169512960 | elapsed time per iteration (s): 0.08 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 4.517776E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.106 | TFLOPs: 11.90 | 7: iteration 168180/ 173500 | consumed samples: 43054080 | consumed tokens: 88174755840 | elapsed time per iteration (s): 0.08 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 4.504550E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.687 | TFLOPs: 11.95 | 7: iteration 168190/ 173500 | consumed samples: 43056640 | consumed tokens: 88179998720 | elapsed time per iteration (s): 0.08 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 4.499640E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.558 | TFLOPs: 11.93 | 7: iteration 168200/ 173500 | consumed samples: 43059200 | consumed tokens: 88185241600 | elapsed time per iteration (s): 0.08 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 
4.507001E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.065 | TFLOPs: 11.95 | 7: iteration 168210/ 173500 | consumed samples: 43061760 | consumed tokens: 88190484480 | elapsed time per iteration (s): 0.08 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 4.497142E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.291 | TFLOPs: 11.95 | 7: iteration 168220/ 173500 | consumed samples: 43064320 | consumed tokens: 88195727360 | elapsed time per iteration (s): 0.08 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 4.507178E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.836 | TFLOPs: 11.96 | 7: iteration 168230/ 173500 | consumed samples: 43066880 | consumed tokens: 88200970240 | elapsed time per iteration (s): 0.10 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 4.501546E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2562.498 | TFLOPs: 9.53 | 7: iteration 168240/ 173500 | consumed samples: 43069440 | consumed tokens: 88206213120 | elapsed time per iteration (s): 0.12 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 4.491747E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2055.846 | TFLOPs: 7.65 | 7: iteration 168250/ 173500 | consumed samples: 43072000 | consumed tokens: 88211456000 | elapsed time per iteration (s): 0.13 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 4.508856E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1976.643 | TFLOPs: 7.35 | 7: iteration 168260/ 173500 | consumed samples: 43074560 | consumed tokens: 
88216698880 | elapsed time per iteration (s): 0.13 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 4.502751E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1975.362 | TFLOPs: 7.35 | 7: iteration 168270/ 173500 | consumed samples: 43077120 | consumed tokens: 88221941760 | elapsed time per iteration (s): 0.09 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 4.497738E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.570 | TFLOPs: 10.54 | 7: iteration 168280/ 173500 | consumed samples: 43079680 | consumed tokens: 88227184640 | elapsed time per iteration (s): 0.08 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 4.515767E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.445 | TFLOPs: 11.76 | 7: iteration 168290/ 173500 | consumed samples: 43082240 | consumed tokens: 88232427520 | elapsed time per iteration (s): 0.08 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 4.502584E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.355 | TFLOPs: 11.93 | 7: iteration 168300/ 173500 | consumed samples: 43084800 | consumed tokens: 88237670400 | elapsed time per iteration (s): 0.08 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 4.497042E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.722 | TFLOPs: 11.89 | 7: iteration 168310/ 173500 | consumed samples: 43087360 | consumed tokens: 88242913280 | elapsed time per iteration (s): 0.08 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 4.516440E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 3193.276 | TFLOPs: 11.88 | 7: iteration 168320/ 173500 | consumed samples: 43089920 | consumed tokens: 88248156160 | elapsed time per iteration (s): 0.08 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 4.498788E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.724 | TFLOPs: 11.88 | 7: iteration 168330/ 173500 | consumed samples: 43092480 | consumed tokens: 88253399040 | elapsed time per iteration (s): 0.08 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 4.498659E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.403 | TFLOPs: 11.73 | 7: iteration 168340/ 173500 | consumed samples: 43095040 | consumed tokens: 88258641920 | elapsed time per iteration (s): 0.08 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 4.497549E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.956 | TFLOPs: 11.89 | 7: iteration 168350/ 173500 | consumed samples: 43097600 | consumed tokens: 88263884800 | elapsed time per iteration (s): 0.08 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 4.496548E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.470 | TFLOPs: 12.04 | 7: iteration 168360/ 173500 | consumed samples: 43100160 | consumed tokens: 88269127680 | elapsed time per iteration (s): 0.08 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 4.503291E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.610 | TFLOPs: 12.02 | 7: iteration 168370/ 173500 | consumed samples: 43102720 | consumed tokens: 88274370560 | elapsed time per iteration (s): 0.08 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 4.500366E+00 | 
grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.307 | TFLOPs: 11.87 | 7: iteration 168380/ 173500 | consumed samples: 43105280 | consumed tokens: 88279613440 | elapsed time per iteration (s): 0.08 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 4.510767E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.054 | TFLOPs: 12.05 | 7: iteration 168390/ 173500 | consumed samples: 43107840 | consumed tokens: 88284856320 | elapsed time per iteration (s): 0.08 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 4.499125E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.048 | TFLOPs: 12.03 | 7: iteration 168400/ 173500 | consumed samples: 43110400 | consumed tokens: 88290099200 | elapsed time per iteration (s): 0.08 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 4.502438E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.811 | TFLOPs: 11.92 | 7: iteration 168410/ 173500 | consumed samples: 43112960 | consumed tokens: 88295342080 | elapsed time per iteration (s): 0.08 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 4.517286E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.323 | TFLOPs: 12.06 | 7: iteration 168420/ 173500 | consumed samples: 43115520 | consumed tokens: 88300584960 | elapsed time per iteration (s): 0.08 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 4.498044E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.567 | TFLOPs: 11.89 | 7: iteration 168430/ 173500 | consumed samples: 43118080 | consumed tokens: 88305827840 | 
elapsed time per iteration (s): 0.08 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 4.494590E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.156 | TFLOPs: 12.01 | 7: iteration 168440/ 173500 | consumed samples: 43120640 | consumed tokens: 88311070720 | elapsed time per iteration (s): 0.08 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 4.519508E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3239.570 | TFLOPs: 12.05 | 7: iteration 168450/ 173500 | consumed samples: 43123200 | consumed tokens: 88316313600 | elapsed time per iteration (s): 0.08 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 4.519339E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.726 | TFLOPs: 12.00 | 7: iteration 168460/ 173500 | consumed samples: 43125760 | consumed tokens: 88321556480 | elapsed time per iteration (s): 0.08 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 4.501797E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3222.955 | TFLOPs: 11.99 | 7: iteration 168470/ 173500 | consumed samples: 43128320 | consumed tokens: 88326799360 | elapsed time per iteration (s): 0.08 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 4.500689E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.007 | TFLOPs: 12.06 | 7: iteration 168480/ 173500 | consumed samples: 43130880 | consumed tokens: 88332042240 | elapsed time per iteration (s): 0.08 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 4.512973E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
3242.317 | TFLOPs: 12.06 | 7: iteration 168490/ 173500 | consumed samples: 43133440 | consumed tokens: 88337285120 | elapsed time per iteration (s): 0.08 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 4.506427E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3250.464 | TFLOPs: 12.09 | 7: iteration 168500/ 173500 | consumed samples: 43136000 | consumed tokens: 88342528000 | elapsed time per iteration (s): 0.08 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 4.505038E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.392 | TFLOPs: 12.03 | 7: iteration 168510/ 173500 | consumed samples: 43138560 | consumed tokens: 88347770880 | elapsed time per iteration (s): 0.08 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 4.500947E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3168.445 | TFLOPs: 11.79 | 7: iteration 168520/ 173500 | consumed samples: 43141120 | consumed tokens: 88353013760 | elapsed time per iteration (s): 0.08 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 4.507494E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3103.270 | TFLOPs: 11.54 | 7: iteration 168530/ 173500 | consumed samples: 43143680 | consumed tokens: 88358256640 | elapsed time per iteration (s): 0.08 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 4.498318E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3123.675 | TFLOPs: 11.62 | 7: iteration 168540/ 173500 | consumed samples: 43146240 | consumed tokens: 88363499520 | elapsed time per iteration (s): 0.08 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 4.510424E+00 | grad 
norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3074.336 | TFLOPs: 11.44 | 7: iteration 168550/ 173500 | consumed samples: 43148800 | consumed tokens: 88368742400 | elapsed time per iteration (s): 0.08 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 4.503312E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.497 | TFLOPs: 12.00 | 7: iteration 168560/ 173500 | consumed samples: 43151360 | consumed tokens: 88373985280 | elapsed time per iteration (s): 0.08 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 4.510602E+00 | grad norm: 0.423 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.602 | TFLOPs: 11.71 | 7: iteration 168570/ 173500 | consumed samples: 43153920 | consumed tokens: 88379228160 | elapsed time per iteration (s): 0.08 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 4.516787E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.602 | TFLOPs: 12.02 | 7: iteration 168580/ 173500 | consumed samples: 43156480 | consumed tokens: 88384471040 | elapsed time per iteration (s): 0.08 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 4.500030E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3235.592 | TFLOPs: 12.03 | 7: iteration 168590/ 173500 | consumed samples: 43159040 | consumed tokens: 88389713920 | elapsed time per iteration (s): 0.08 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 4.510948E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.841 | TFLOPs: 12.05 | 7: iteration 168600/ 173500 | consumed samples: 43161600 | consumed tokens: 88394956800 | elapsed 
time per iteration (s): 0.24 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 4.500681E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1067.383 | TFLOPs: 3.97 | 7: iteration 168610/ 173500 | consumed samples: 43164160 | consumed tokens: 88400199680 | elapsed time per iteration (s): 0.08 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 4.495837E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.297 | TFLOPs: 11.86 | 7: iteration 168620/ 173500 | consumed samples: 43166720 | consumed tokens: 88405442560 | elapsed time per iteration (s): 0.08 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 4.498360E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.884 | TFLOPs: 11.91 | 7: iteration 168630/ 173500 | consumed samples: 43169280 | consumed tokens: 88410685440 | elapsed time per iteration (s): 0.08 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 4.502268E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.950 | TFLOPs: 11.87 | 7: iteration 168640/ 173500 | consumed samples: 43171840 | consumed tokens: 88415928320 | elapsed time per iteration (s): 0.08 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 4.499514E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.472 | TFLOPs: 11.92 | 7: iteration 168650/ 173500 | consumed samples: 43174400 | consumed tokens: 88421171200 | elapsed time per iteration (s): 0.08 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 4.502817E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.257 | 
TFLOPs: 11.97 | 7: iteration 168660/ 173500 | consumed samples: 43176960 | consumed tokens: 88426414080 | elapsed time per iteration (s): 0.09 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 4.512045E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2932.551 | TFLOPs: 10.91 | 7: iteration 168670/ 173500 | consumed samples: 43179520 | consumed tokens: 88431656960 | elapsed time per iteration (s): 0.08 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 4.497453E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.557 | TFLOPs: 11.60 | 7: iteration 168680/ 173500 | consumed samples: 43182080 | consumed tokens: 88436899840 | elapsed time per iteration (s): 0.08 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 4.483865E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.035 | TFLOPs: 11.91 | 7: iteration 168690/ 173500 | consumed samples: 43184640 | consumed tokens: 88442142720 | elapsed time per iteration (s): 0.09 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 4.496782E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2973.818 | TFLOPs: 11.06 | 7: iteration 168700/ 173500 | consumed samples: 43187200 | consumed tokens: 88447385600 | elapsed time per iteration (s): 0.08 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 4.505363E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3069.731 | TFLOPs: 11.42 | 7: iteration 168710/ 173500 | consumed samples: 43189760 | consumed tokens: 88452628480 | elapsed time per iteration (s): 0.08 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 4.498365E+00 | grad norm: 0.363 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.747 | TFLOPs: 11.92 | 7: iteration 168720/ 173500 | consumed samples: 43192320 | consumed tokens: 88457871360 | elapsed time per iteration (s): 0.08 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 4.497706E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.074 | TFLOPs: 11.95 | 7: iteration 168730/ 173500 | consumed samples: 43194880 | consumed tokens: 88463114240 | elapsed time per iteration (s): 0.08 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 4.494792E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.728 | TFLOPs: 11.38 | 7: iteration 168740/ 173500 | consumed samples: 43197440 | consumed tokens: 88468357120 | elapsed time per iteration (s): 0.09 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 4.510740E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2852.214 | TFLOPs: 10.61 | 7: iteration 168750/ 173500 | consumed samples: 43200000 | consumed tokens: 88473600000 | elapsed time per iteration (s): 0.08 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 4.504498E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.152 | TFLOPs: 11.85 | 7: iteration 168760/ 173500 | consumed samples: 43202560 | consumed tokens: 88478842880 | elapsed time per iteration (s): 0.08 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 4.506223E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.534 | TFLOPs: 11.55 | 7: iteration 168770/ 173500 | consumed samples: 43205120 | consumed tokens: 88484085760 | elapsed time per 
iteration (s): 0.08 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 4.500307E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.507 | TFLOPs: 11.91 | 7: iteration 168780/ 173500 | consumed samples: 43207680 | consumed tokens: 88489328640 | elapsed time per iteration (s): 0.08 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 4.505239E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3219.732 | TFLOPs: 11.98 | 7: iteration 168790/ 173500 | consumed samples: 43210240 | consumed tokens: 88494571520 | elapsed time per iteration (s): 0.09 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 4.503238E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2702.496 | TFLOPs: 10.05 | 7: iteration 168800/ 173500 | consumed samples: 43212800 | consumed tokens: 88499814400 | elapsed time per iteration (s): 0.08 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 4.502908E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.865 | TFLOPs: 11.93 | 7: iteration 168810/ 173500 | consumed samples: 43215360 | consumed tokens: 88505057280 | elapsed time per iteration (s): 0.08 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 4.506805E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.055 | TFLOPs: 11.96 | 7: iteration 168820/ 173500 | consumed samples: 43217920 | consumed tokens: 88510300160 | elapsed time per iteration (s): 0.08 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 4.512189E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.606 | TFLOPs: 
11.84 | 7: iteration 168830/ 173500 | consumed samples: 43220480 | consumed tokens: 88515543040 | elapsed time per iteration (s): 0.09 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 4.515355E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2948.331 | TFLOPs: 10.97 | 7: iteration 168840/ 173500 | consumed samples: 43223040 | consumed tokens: 88520785920 | elapsed time per iteration (s): 0.08 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 4.515232E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.711 | TFLOPs: 11.86 | 7: iteration 168850/ 173500 | consumed samples: 43225600 | consumed tokens: 88526028800 | elapsed time per iteration (s): 0.08 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 4.503281E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.551 | TFLOPs: 11.92 | 7: iteration 168860/ 173500 | consumed samples: 43228160 | consumed tokens: 88531271680 | elapsed time per iteration (s): 0.09 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 4.502279E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2796.369 | TFLOPs: 10.40 | 7: iteration 168870/ 173500 | consumed samples: 43230720 | consumed tokens: 88536514560 | elapsed time per iteration (s): 0.08 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 4.503744E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.397 | TFLOPs: 11.96 | 7: iteration 168880/ 173500 | consumed samples: 43233280 | consumed tokens: 88541757440 | elapsed time per iteration (s): 0.09 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 4.499950E+00 | grad norm: 0.375 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2776.709 | TFLOPs: 10.33 | 7: iteration 168890/ 173500 | consumed samples: 43235840 | consumed tokens: 88547000320 | elapsed time per iteration (s): 0.09 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 4.511282E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2847.714 | TFLOPs: 10.59 | 7: iteration 168900/ 173500 | consumed samples: 43238400 | consumed tokens: 88552243200 | elapsed time per iteration (s): 0.08 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 4.505939E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.918 | TFLOPs: 11.88 | 7: iteration 168910/ 173500 | consumed samples: 43240960 | consumed tokens: 88557486080 | elapsed time per iteration (s): 0.08 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 4.503245E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.595 | TFLOPs: 11.83 | 7: iteration 168920/ 173500 | consumed samples: 43243520 | consumed tokens: 88562728960 | elapsed time per iteration (s): 0.08 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 4.505222E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.279 | TFLOPs: 11.83 | 7: iteration 168930/ 173500 | consumed samples: 43246080 | consumed tokens: 88567971840 | elapsed time per iteration (s): 0.08 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 4.502110E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.489 | TFLOPs: 11.83 | 7: iteration 168940/ 173500 | consumed samples: 43248640 | consumed tokens: 88573214720 | elapsed time per 
iteration (s): 0.08 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 4.507370E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.266 | TFLOPs: 11.93 | 7: iteration 168950/ 173500 | consumed samples: 43251200 | consumed tokens: 88578457600 | elapsed time per iteration (s): 0.09 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 4.510620E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2935.205 | TFLOPs: 10.92 | 7: iteration 168960/ 173500 | consumed samples: 43253760 | consumed tokens: 88583700480 | elapsed time per iteration (s): 0.12 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 4.488486E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2162.036 | TFLOPs: 8.04 | 7: iteration 168970/ 173500 | consumed samples: 43256320 | consumed tokens: 88588943360 | elapsed time per iteration (s): 0.09 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 4.500728E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2923.480 | TFLOPs: 10.87 | 7: iteration 168980/ 173500 | consumed samples: 43258880 | consumed tokens: 88594186240 | elapsed time per iteration (s): 0.08 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 4.496624E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.999 | TFLOPs: 11.94 | 7: iteration 168990/ 173500 | consumed samples: 43261440 | consumed tokens: 88599429120 | elapsed time per iteration (s): 0.08 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 4.513814E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.815 | TFLOPs: 
11.60 | 7: iteration 169000/ 173500 | consumed samples: 43264000 | consumed tokens: 88604672000 | elapsed time per iteration (s): 0.08 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 4.511367E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.521 | TFLOPs: 11.87 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 169000 | lm loss value: 4.348076E+00 | lm loss PPL: 7.732956E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 169000 to checkpoints_14m91b100m 0: [2023-03-17 04:22:34,183] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step169000 is begin to save! 0: [2023-03-17 04:22:34,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:22:34,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:22:34,212] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:22:34,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:22:34,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:22:34,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/layer_04-model_00-model_states.pt. 
0: [2023-03-17 04:22:34,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:22:34,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:22:34,224] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:22:34,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:22:34,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:22:34,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:22:34,228] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step169000/mp_rank_00_model_states.pt 0: [2023-03-17 04:22:34,228] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:22:34,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:22:34,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:22:34,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 
2: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 1: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 1: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 
4: [2023-03-17 04:22:34,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 4: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 2: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 
6: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 3: [2023-03-17 04:22:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 0: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 4: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:22:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:34,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 1: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 6: [2023-03-17 04:22:34,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 6: [2023-03-17 04:22:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 7: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:22:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 3: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 2: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 4: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:22:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 0: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:34,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 1: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:34,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 7: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:34,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 3: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:34,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:34,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 0: [2023-03-17 04:22:34,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:34,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 4: [2023-03-17 04:22:34,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:22:34,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 6: [2023-03-17 04:22:34,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:34,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:34,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 1: [2023-03-17 04:22:34,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 7: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 3: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 4: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 2: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 6: [2023-03-17 04:22:34,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:34,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 
1: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:34,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 7: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:34,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:34,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 2: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 3: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:22:34,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 0: [2023-03-17 04:22:34,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 4: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:22:34,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 6: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:34,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 1: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:22:34,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 7: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:34,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 3: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:34,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 0: [2023-03-17 04:22:34,259] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 2: [2023-03-17 04:22:34,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 4: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 5: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 
7: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 7: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 0: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 3: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 0: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 4: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 3: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 6: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 6: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 6: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 
1: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:22:34,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:22:34,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 2: [2023-03-17 04:22:34,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:22:34,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step169000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:22:34,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step169000 is ready now! 0: successfully saved checkpoint at iteration 169000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 82.88 7: iteration 169010/ 173500 | consumed samples: 43266560 | consumed tokens: 88609914880 | elapsed time per iteration (s): 0.09 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 4.511914E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2773.476 | TFLOPs: 10.32 | 7: iteration 169020/ 173500 | consumed samples: 43269120 | consumed tokens: 88615157760 | elapsed time per iteration (s): 0.08 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 4.508760E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.460 | TFLOPs: 11.83 | 7: iteration 169030/ 173500 | consumed samples: 43271680 | consumed tokens: 88620400640 | elapsed time per iteration (s): 0.08 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 
4.503318E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.098 | TFLOPs: 11.83 | 7: iteration 169040/ 173500 | consumed samples: 43274240 | consumed tokens: 88625643520 | elapsed time per iteration (s): 0.08 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 4.512278E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.741 | TFLOPs: 11.92 | 7: iteration 169050/ 173500 | consumed samples: 43276800 | consumed tokens: 88630886400 | elapsed time per iteration (s): 0.08 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 4.498457E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3176.948 | TFLOPs: 11.82 | 7: iteration 169060/ 173500 | consumed samples: 43279360 | consumed tokens: 88636129280 | elapsed time per iteration (s): 0.08 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 4.505183E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.689 | TFLOPs: 11.93 | 7: iteration 169070/ 173500 | consumed samples: 43281920 | consumed tokens: 88641372160 | elapsed time per iteration (s): 0.08 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 4.510506E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3031.735 | TFLOPs: 11.28 | 7: iteration 169080/ 173500 | consumed samples: 43284480 | consumed tokens: 88646615040 | elapsed time per iteration (s): 0.08 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 4.505331E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.003 | TFLOPs: 11.51 | 7: iteration 169090/ 173500 | consumed samples: 43287040 | consumed tokens: 
88651857920 | elapsed time per iteration (s): 0.12 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 4.496677E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2052.826 | TFLOPs: 7.64 | 7: iteration 169100/ 173500 | consumed samples: 43289600 | consumed tokens: 88657100800 | elapsed time per iteration (s): 0.13 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 4.506408E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1910.087 | TFLOPs: 7.10 | 7: iteration 169110/ 173500 | consumed samples: 43292160 | consumed tokens: 88662343680 | elapsed time per iteration (s): 0.13 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 4.496245E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1989.810 | TFLOPs: 7.40 | 7: iteration 169120/ 173500 | consumed samples: 43294720 | consumed tokens: 88667586560 | elapsed time per iteration (s): 0.13 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 4.506535E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1949.789 | TFLOPs: 7.25 | 7: iteration 169130/ 173500 | consumed samples: 43297280 | consumed tokens: 88672829440 | elapsed time per iteration (s): 0.13 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 4.511814E+00 | grad norm: 0.388 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.677 | TFLOPs: 7.42 | 7: iteration 169140/ 173500 | consumed samples: 43299840 | consumed tokens: 88678072320 | elapsed time per iteration (s): 0.13 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 4.494861E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 2044.948 | TFLOPs: 7.61 | 7: iteration 169150/ 173500 | consumed samples: 43302400 | consumed tokens: 88683315200 | elapsed time per iteration (s): 0.13 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 4.505675E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2018.762 | TFLOPs: 7.51 | 7: iteration 169160/ 173500 | consumed samples: 43304960 | consumed tokens: 88688558080 | elapsed time per iteration (s): 0.12 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 4.504136E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2158.447 | TFLOPs: 8.03 | 7: iteration 169170/ 173500 | consumed samples: 43307520 | consumed tokens: 88693800960 | elapsed time per iteration (s): 0.08 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 4.514156E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.888 | TFLOPs: 11.97 | 7: iteration 169180/ 173500 | consumed samples: 43310080 | consumed tokens: 88699043840 | elapsed time per iteration (s): 0.08 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 4.508530E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3220.311 | TFLOPs: 11.98 | 7: iteration 169190/ 173500 | consumed samples: 43312640 | consumed tokens: 88704286720 | elapsed time per iteration (s): 0.08 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 4.501140E+00 | grad norm: 0.655 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.713 | TFLOPs: 12.00 | 7: iteration 169200/ 173500 | consumed samples: 43315200 | consumed tokens: 88709529600 | elapsed time per iteration (s): 0.08 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 4.500861E+00 | 
grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.874 | TFLOPs: 11.96 | 7: iteration 169210/ 173500 | consumed samples: 43317760 | consumed tokens: 88714772480 | elapsed time per iteration (s): 0.08 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 4.497316E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.823 | TFLOPs: 11.99 | 7: iteration 169220/ 173500 | consumed samples: 43320320 | consumed tokens: 88720015360 | elapsed time per iteration (s): 0.08 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 4.502357E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.123 | TFLOPs: 11.84 | 7: iteration 169230/ 173500 | consumed samples: 43322880 | consumed tokens: 88725258240 | elapsed time per iteration (s): 0.08 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 4.508028E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.432 | TFLOPs: 11.80 | 7: iteration 169240/ 173500 | consumed samples: 43325440 | consumed tokens: 88730501120 | elapsed time per iteration (s): 0.09 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 4.505750E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3002.524 | TFLOPs: 11.17 | 7: iteration 169250/ 173500 | consumed samples: 43328000 | consumed tokens: 88735744000 | elapsed time per iteration (s): 0.08 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 4.503893E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3182.577 | TFLOPs: 11.84 | 7: iteration 169260/ 173500 | consumed samples: 43330560 | consumed tokens: 88740986880 | 
elapsed time per iteration (s): 0.08 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 4.505491E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3140.171 | TFLOPs: 11.68 | 7: iteration 169270/ 173500 | consumed samples: 43333120 | consumed tokens: 88746229760 | elapsed time per iteration (s): 0.08 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 4.493824E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.468 | TFLOPs: 11.86 | 7: iteration 169280/ 173500 | consumed samples: 43335680 | consumed tokens: 88751472640 | elapsed time per iteration (s): 0.08 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 4.509457E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.686 | TFLOPs: 11.61 | 7: iteration 169290/ 173500 | consumed samples: 43338240 | consumed tokens: 88756715520 | elapsed time per iteration (s): 0.08 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 4.500491E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.394 | TFLOPs: 12.01 | 7: iteration 169300/ 173500 | consumed samples: 43340800 | consumed tokens: 88761958400 | elapsed time per iteration (s): 0.08 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 4.518830E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.523 | TFLOPs: 11.99 | 7: iteration 169310/ 173500 | consumed samples: 43343360 | consumed tokens: 88767201280 | elapsed time per iteration (s): 0.09 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 4.513566E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
2752.627 | TFLOPs: 10.24 | 7: iteration 169320/ 173500 | consumed samples: 43345920 | consumed tokens: 88772444160 | elapsed time per iteration (s): 0.08 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 4.512325E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.550 | TFLOPs: 11.92 | 7: iteration 169330/ 173500 | consumed samples: 43348480 | consumed tokens: 88777687040 | elapsed time per iteration (s): 0.08 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 4.505157E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3218.321 | TFLOPs: 11.97 | 7: iteration 169340/ 173500 | consumed samples: 43351040 | consumed tokens: 88782929920 | elapsed time per iteration (s): 0.08 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 4.502224E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.677 | TFLOPs: 11.96 | 7: iteration 169350/ 173500 | consumed samples: 43353600 | consumed tokens: 88788172800 | elapsed time per iteration (s): 0.08 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 4.485021E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.514 | TFLOPs: 11.92 | 7: iteration 169360/ 173500 | consumed samples: 43356160 | consumed tokens: 88793415680 | elapsed time per iteration (s): 0.08 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 4.506191E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.823 | TFLOPs: 11.66 | 7: iteration 169370/ 173500 | consumed samples: 43358720 | consumed tokens: 88798658560 | elapsed time per iteration (s): 0.08 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 4.497849E+00 | grad 
norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.926 | TFLOPs: 11.80 | 7: iteration 169380/ 173500 | consumed samples: 43361280 | consumed tokens: 88803901440 | elapsed time per iteration (s): 0.08 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 4.500470E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.309 | TFLOPs: 11.75 | 7: iteration 169390/ 173500 | consumed samples: 43363840 | consumed tokens: 88809144320 | elapsed time per iteration (s): 0.09 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 4.518946E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2911.874 | TFLOPs: 10.83 | 7: iteration 169400/ 173500 | consumed samples: 43366400 | consumed tokens: 88814387200 | elapsed time per iteration (s): 0.08 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 4.494628E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.983 | TFLOPs: 11.93 | 7: iteration 169410/ 173500 | consumed samples: 43368960 | consumed tokens: 88819630080 | elapsed time per iteration (s): 0.08 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 4.503155E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3161.049 | TFLOPs: 11.76 | 7: iteration 169420/ 173500 | consumed samples: 43371520 | consumed tokens: 88824872960 | elapsed time per iteration (s): 0.08 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 4.516726E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.649 | TFLOPs: 11.81 | 7: iteration 169430/ 173500 | consumed samples: 43374080 | consumed tokens: 88830115840 | elapsed 
time per iteration (s): 0.09 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 4.492102E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2997.730 | TFLOPs: 11.15 | 7: iteration 169440/ 173500 | consumed samples: 43376640 | consumed tokens: 88835358720 | elapsed time per iteration (s): 0.08 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 4.494493E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3209.632 | TFLOPs: 11.94 | 7: iteration 169450/ 173500 | consumed samples: 43379200 | consumed tokens: 88840601600 | elapsed time per iteration (s): 0.09 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 4.500968E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2876.603 | TFLOPs: 10.70 | 7: iteration 169460/ 173500 | consumed samples: 43381760 | consumed tokens: 88845844480 | elapsed time per iteration (s): 0.08 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 4.512640E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.139 | TFLOPs: 11.85 | 7: iteration 169470/ 173500 | consumed samples: 43384320 | consumed tokens: 88851087360 | elapsed time per iteration (s): 0.09 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 4.498019E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.587 | TFLOPs: 10.60 | 7: iteration 169480/ 173500 | consumed samples: 43386880 | consumed tokens: 88856330240 | elapsed time per iteration (s): 0.09 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 4.492567E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2887.268 | 
TFLOPs: 10.74 | 7: iteration 169490/ 173500 | consumed samples: 43389440 | consumed tokens: 88861573120 | elapsed time per iteration (s): 0.09 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 4.496177E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2868.020 | TFLOPs: 10.67 | 7: iteration 169500/ 173500 | consumed samples: 43392000 | consumed tokens: 88866816000 | elapsed time per iteration (s): 0.08 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 4.501027E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3159.836 | TFLOPs: 11.75 | 7: iteration 169510/ 173500 | consumed samples: 43394560 | consumed tokens: 88872058880 | elapsed time per iteration (s): 0.09 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 4.493513E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2821.921 | TFLOPs: 10.50 | 7: iteration 169520/ 173500 | consumed samples: 43397120 | consumed tokens: 88877301760 | elapsed time per iteration (s): 0.08 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 4.516779E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.509 | TFLOPs: 11.94 | 7: iteration 169530/ 173500 | consumed samples: 43399680 | consumed tokens: 88882544640 | elapsed time per iteration (s): 0.08 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 4.509137E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.555 | TFLOPs: 11.90 | 7: iteration 169540/ 173500 | consumed samples: 43402240 | consumed tokens: 88887787520 | elapsed time per iteration (s): 0.08 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 4.503551E+00 | grad norm: 0.371 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.156 | TFLOPs: 11.89 | 7: iteration 169550/ 173500 | consumed samples: 43404800 | consumed tokens: 88893030400 | elapsed time per iteration (s): 0.08 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.504887E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.548 | TFLOPs: 11.91 | 7: iteration 169560/ 173500 | consumed samples: 43407360 | consumed tokens: 88898273280 | elapsed time per iteration (s): 0.10 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.493181E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2596.049 | TFLOPs: 9.66 | 7: iteration 169570/ 173500 | consumed samples: 43409920 | consumed tokens: 88903516160 | elapsed time per iteration (s): 0.08 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.502398E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3121.268 | TFLOPs: 11.61 | 7: iteration 169580/ 173500 | consumed samples: 43412480 | consumed tokens: 88908759040 | elapsed time per iteration (s): 0.08 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.509883E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.170 | TFLOPs: 11.88 | 7: iteration 169590/ 173500 | consumed samples: 43415040 | consumed tokens: 88914001920 | elapsed time per iteration (s): 0.08 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.500944E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.903 | TFLOPs: 11.56 | 7: iteration 169600/ 173500 | consumed samples: 43417600 | consumed tokens: 88919244800 | elapsed time per 
iteration (s): 0.09 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.518340E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2698.710 | TFLOPs: 10.04 | 7: iteration 169610/ 173500 | consumed samples: 43420160 | consumed tokens: 88924487680 | elapsed time per iteration (s): 0.08 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.502036E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3096.774 | TFLOPs: 11.52 | 7: iteration 169620/ 173500 | consumed samples: 43422720 | consumed tokens: 88929730560 | elapsed time per iteration (s): 0.10 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.495275E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2641.222 | TFLOPs: 9.82 | 7: iteration 169630/ 173500 | consumed samples: 43425280 | consumed tokens: 88934973440 | elapsed time per iteration (s): 0.09 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 4.503382E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2750.568 | TFLOPs: 10.23 | 7: iteration 169640/ 173500 | consumed samples: 43427840 | consumed tokens: 88940216320 | elapsed time per iteration (s): 0.09 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.513525E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.817 | TFLOPs: 11.11 | 7: iteration 169650/ 173500 | consumed samples: 43430400 | consumed tokens: 88945459200 | elapsed time per iteration (s): 0.13 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.499630E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1993.904 | TFLOPs: 
7.42 | 7: iteration 169660/ 173500 | consumed samples: 43432960 | consumed tokens: 88950702080 | elapsed time per iteration (s): 0.08 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.502335E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.612 | TFLOPs: 11.99 | 7: iteration 169670/ 173500 | consumed samples: 43435520 | consumed tokens: 88955944960 | elapsed time per iteration (s): 0.08 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.503054E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.319 | TFLOPs: 12.05 | 7: iteration 169680/ 173500 | consumed samples: 43438080 | consumed tokens: 88961187840 | elapsed time per iteration (s): 0.09 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.501222E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.261 | TFLOPs: 11.15 | 7: iteration 169690/ 173500 | consumed samples: 43440640 | consumed tokens: 88966430720 | elapsed time per iteration (s): 0.08 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.492362E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.123 | TFLOPs: 11.96 | 7: iteration 169700/ 173500 | consumed samples: 43443200 | consumed tokens: 88971673600 | elapsed time per iteration (s): 0.08 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.507723E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3039.560 | TFLOPs: 11.31 | 7: iteration 169710/ 173500 | consumed samples: 43445760 | consumed tokens: 88976916480 | elapsed time per iteration (s): 0.08 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.506924E+00 | grad norm: 0.359 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3244.453 | TFLOPs: 12.07 | 7: iteration 169720/ 173500 | consumed samples: 43448320 | consumed tokens: 88982159360 | elapsed time per iteration (s): 0.08 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 4.494898E+00 | grad norm: 0.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.729 | TFLOPs: 11.83 | 7: iteration 169730/ 173500 | consumed samples: 43450880 | consumed tokens: 88987402240 | elapsed time per iteration (s): 0.08 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 4.503647E+00 | grad norm: 0.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3108.549 | TFLOPs: 11.56 | 7: iteration 169740/ 173500 | consumed samples: 43453440 | consumed tokens: 88992645120 | elapsed time per iteration (s): 0.09 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 4.507468E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2999.718 | TFLOPs: 11.16 | 7: iteration 169750/ 173500 | consumed samples: 43456000 | consumed tokens: 88997888000 | elapsed time per iteration (s): 0.09 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 4.507704E+00 | grad norm: 0.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2994.627 | TFLOPs: 11.14 | 7: iteration 169760/ 173500 | consumed samples: 43458560 | consumed tokens: 89003130880 | elapsed time per iteration (s): 0.10 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 4.506030E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2610.550 | TFLOPs: 9.71 | 7: iteration 169770/ 173500 | consumed samples: 43461120 | consumed tokens: 89008373760 | elapsed time per iteration 
(s): 0.08 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 4.492765E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.567 | TFLOPs: 12.01 | 7: iteration 169780/ 173500 | consumed samples: 43463680 | consumed tokens: 89013616640 | elapsed time per iteration (s): 0.10 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 4.491339E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2481.451 | TFLOPs: 9.23 | 7: iteration 169790/ 173500 | consumed samples: 43466240 | consumed tokens: 89018859520 | elapsed time per iteration (s): 0.08 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 4.498334E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.415 | TFLOPs: 11.29 | 7: iteration 169800/ 173500 | consumed samples: 43468800 | consumed tokens: 89024102400 | elapsed time per iteration (s): 0.08 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 4.501778E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.064 | TFLOPs: 11.86 | 7: iteration 169810/ 173500 | consumed samples: 43471360 | consumed tokens: 89029345280 | elapsed time per iteration (s): 0.08 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.495279E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.382 | TFLOPs: 11.82 | 7: iteration 169820/ 173500 | consumed samples: 43473920 | consumed tokens: 89034588160 | elapsed time per iteration (s): 0.10 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.516775E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2617.253 | TFLOPs: 9.74 | 7: 
iteration 169830/ 173500 | consumed samples: 43476480 | consumed tokens: 89039831040 | elapsed time per iteration (s): 0.08 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.502781E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.104 | TFLOPs: 11.88 | 7: iteration 169840/ 173500 | consumed samples: 43479040 | consumed tokens: 89045073920 | elapsed time per iteration (s): 0.09 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.500961E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2993.048 | TFLOPs: 11.13 | 7: iteration 169850/ 173500 | consumed samples: 43481600 | consumed tokens: 89050316800 | elapsed time per iteration (s): 0.09 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.502512E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3000.793 | TFLOPs: 11.16 | 7: iteration 169860/ 173500 | consumed samples: 43484160 | consumed tokens: 89055559680 | elapsed time per iteration (s): 0.08 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.501941E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3146.769 | TFLOPs: 11.70 | 7: iteration 169870/ 173500 | consumed samples: 43486720 | consumed tokens: 89060802560 | elapsed time per iteration (s): 0.13 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.513212E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 1948.914 | TFLOPs: 7.25 | 7: iteration 169880/ 173500 | consumed samples: 43489280 | consumed tokens: 89066045440 | elapsed time per iteration (s): 0.11 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.506300E+00 | grad norm: 0.377 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2295.363 | TFLOPs: 8.54 | 7: iteration 169890/ 173500 | consumed samples: 43491840 | consumed tokens: 89071288320 | elapsed time per iteration (s): 0.08 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.501443E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.571 | TFLOPs: 12.03 | 7: iteration 169900/ 173500 | consumed samples: 43494400 | consumed tokens: 89076531200 | elapsed time per iteration (s): 0.08 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 4.509104E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3036.884 | TFLOPs: 11.30 | 7: iteration 169910/ 173500 | consumed samples: 43496960 | consumed tokens: 89081774080 | elapsed time per iteration (s): 0.09 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.507301E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2908.890 | TFLOPs: 10.82 | 7: iteration 169920/ 173500 | consumed samples: 43499520 | consumed tokens: 89087016960 | elapsed time per iteration (s): 0.08 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.521144E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.883 | TFLOPs: 12.01 | 7: iteration 169930/ 173500 | consumed samples: 43502080 | consumed tokens: 89092259840 | elapsed time per iteration (s): 0.08 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.514016E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.439 | TFLOPs: 12.03 | 7: iteration 169940/ 173500 | consumed samples: 43504640 | consumed tokens: 89097502720 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.512562E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3232.232 | TFLOPs: 12.02 | 7: iteration 169950/ 173500 | consumed samples: 43507200 | consumed tokens: 89102745600 | elapsed time per iteration (s): 0.08 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.499689E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.289 | TFLOPs: 12.00 | 7: iteration 169960/ 173500 | consumed samples: 43509760 | consumed tokens: 89107988480 | elapsed time per iteration (s): 0.09 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.502020E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2910.050 | TFLOPs: 10.82 | 7: iteration 169970/ 173500 | consumed samples: 43512320 | consumed tokens: 89113231360 | elapsed time per iteration (s): 0.08 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.495274E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.592 | TFLOPs: 11.28 | 7: iteration 169980/ 173500 | consumed samples: 43514880 | consumed tokens: 89118474240 | elapsed time per iteration (s): 0.08 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.495052E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.406 | TFLOPs: 12.00 | 7: iteration 169990/ 173500 | consumed samples: 43517440 | consumed tokens: 89123717120 | elapsed time per iteration (s): 0.09 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 4.503640E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2960.663 | TFLOPs: 11.01 | 0: [2023-03-17 
04:24:02,277] [INFO] [logging.py:68:log_dist] [Rank 0] step=170000, skipped=0, lr=[2.0184402348785326e-05, 2.0184402348785326e-05, 2.0184402348785326e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 170000/ 173500 | consumed samples: 43520000 | consumed tokens: 89128960000 | elapsed time per iteration (s): 0.08 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.501927E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.639 | TFLOPs: 12.03 | 0: steps: 170000 loss: 4.5336 iter time (s): 0.086 samples/sec: 2983.948 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 170000 | lm loss value: 4.387980E+00 | lm loss PPL: 8.047765E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 170000 to checkpoints_14m91b100m 0: [2023-03-17 04:24:02,335] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step170000 is begin to save! 0: [2023-03-17 04:24:02,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:24:02,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:24:02,369] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:24:02,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:24:02,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/layer_04-model_00-model_states.pt... 
0: [2023-03-17 04:24:02,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:24:02,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:24:02,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:24:02,378] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:24:02,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:24:02,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:24:02,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:24:02,382] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step170000/mp_rank_00_model_states.pt 0: [2023-03-17 04:24:02,382] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:24:02,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:24:02,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:24:02,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:24:02,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:24:02,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:24:02,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 
6: [2023-03-17 04:24:02,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:24:02,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:24:02,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:24:02,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 0: [2023-03-17 04:24:02,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 2: [2023-03-17 04:24:02,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:24:02,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:24:02,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 0: [2023-03-17 04:24:02,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:24:02,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 7: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:24:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 3: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 1: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:24:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 6: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:24:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2023-03-17 04:24:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 6: [2023-03-17 04:24:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:24:02,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 4: [2023-03-17 04:24:02,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 2: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:24:02,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 0: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:24:02,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 7: [2023-03-17 04:24:02,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 3: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 04:24:02,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 
1: [2023-03-17 04:24:02,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 6: [2023-03-17 04:24:02,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:24:02,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 0: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:24:02,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 2: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:24:02,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 7: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:24:02,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 4: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:24:02,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 04:24:02,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 3: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 1: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:24:02,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:24:02,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 6: [2023-03-17 04:24:02,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 6: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:24:02,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 2: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:24:02,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 04:24:02,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 7: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 3: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 0: [2023-03-17 04:24:02,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 4: [2023-03-17 04:24:02,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:24:02,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:24:02,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 6: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:24:02,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 04:24:02,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 7: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:24:02,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 
0: [2023-03-17 04:24:02,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 2: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:24:02,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 4: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:24:02,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 3: [2023-03-17 04:24:02,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:24:02,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 1: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:24:02,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 2: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:24:02,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:24:02,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 7: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:24:02,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:24:02,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 0: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:24:02,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 3: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 4: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:24:02,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 1: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:24:02,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 6: [2023-03-17 04:24:02,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 2: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 7: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 7: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 
2: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 0: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 3: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 0: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 2: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 3: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 4: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 
1: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 5: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 1: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 1: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 4: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:24:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step170000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:24:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step170000 is ready now! 
0: successfully saved checkpoint at iteration 170000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 87.50 7: iteration 170010/ 173500 | consumed samples: 43522560 | consumed tokens: 89134202880 | elapsed time per iteration (s): 0.09 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.509333E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2731.850 | TFLOPs: 10.16 | 7: iteration 170020/ 173500 | consumed samples: 43525120 | consumed tokens: 89139445760 | elapsed time per iteration (s): 0.08 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.495550E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.929 | TFLOPs: 11.93 | 7: iteration 170030/ 173500 | consumed samples: 43527680 | consumed tokens: 89144688640 | elapsed time per iteration (s): 0.09 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.505669E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2861.406 | TFLOPs: 10.64 | 7: iteration 170040/ 173500 | consumed samples: 43530240 | consumed tokens: 89149931520 | elapsed time per iteration (s): 0.09 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.502540E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2965.699 | TFLOPs: 11.03 | 7: iteration 170050/ 173500 | consumed samples: 43532800 | consumed tokens: 89155174400 | elapsed time per iteration (s): 0.08 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.504263E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.207 | TFLOPs: 11.89 | 7: iteration 170060/ 173500 | consumed samples: 43535360 | consumed tokens: 89160417280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.506300E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.833 | TFLOPs: 11.94 | 7: iteration 170070/ 173500 | consumed samples: 43537920 | consumed tokens: 89165660160 | elapsed time per iteration (s): 0.08 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.512332E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3128.940 | TFLOPs: 11.64 | 7: iteration 170080/ 173500 | consumed samples: 43540480 | consumed tokens: 89170903040 | elapsed time per iteration (s): 0.08 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.498533E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.675 | TFLOPs: 11.95 | 7: iteration 170090/ 173500 | consumed samples: 43543040 | consumed tokens: 89176145920 | elapsed time per iteration (s): 0.08 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 4.497335E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.654 | TFLOPs: 11.90 | 7: iteration 170100/ 173500 | consumed samples: 43545600 | consumed tokens: 89181388800 | elapsed time per iteration (s): 0.08 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.507390E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.897 | TFLOPs: 11.87 | 7: iteration 170110/ 173500 | consumed samples: 43548160 | consumed tokens: 89186631680 | elapsed time per iteration (s): 0.08 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.504600E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3210.675 | TFLOPs: 11.94 | 7: 
iteration 170120/ 173500 | consumed samples: 43550720 | consumed tokens: 89191874560 | elapsed time per iteration (s): 0.08 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.504624E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.337 | TFLOPs: 11.92 | 7: iteration 170130/ 173500 | consumed samples: 43553280 | consumed tokens: 89197117440 | elapsed time per iteration (s): 0.11 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.499321E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2298.460 | TFLOPs: 8.55 | 7: iteration 170140/ 173500 | consumed samples: 43555840 | consumed tokens: 89202360320 | elapsed time per iteration (s): 0.08 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.507283E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.032 | TFLOPs: 11.86 | 7: iteration 170150/ 173500 | consumed samples: 43558400 | consumed tokens: 89207603200 | elapsed time per iteration (s): 0.08 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.516190E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.985 | TFLOPs: 11.87 | 7: iteration 170160/ 173500 | consumed samples: 43560960 | consumed tokens: 89212846080 | elapsed time per iteration (s): 0.08 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.500481E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.947 | TFLOPs: 11.90 | 7: iteration 170170/ 173500 | consumed samples: 43563520 | consumed tokens: 89218088960 | elapsed time per iteration (s): 0.11 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.498892E+00 | grad norm: 0.366 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2392.891 | TFLOPs: 8.90 | 7: iteration 170180/ 173500 | consumed samples: 43566080 | consumed tokens: 89223331840 | elapsed time per iteration (s): 0.08 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 4.507563E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3153.154 | TFLOPs: 11.73 | 7: iteration 170190/ 173500 | consumed samples: 43568640 | consumed tokens: 89228574720 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.507681E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.228 | TFLOPs: 11.85 | 7: iteration 170200/ 173500 | consumed samples: 43571200 | consumed tokens: 89233817600 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.493520E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.681 | TFLOPs: 11.87 | 7: iteration 170210/ 173500 | consumed samples: 43573760 | consumed tokens: 89239060480 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.499129E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.197 | TFLOPs: 11.82 | 7: iteration 170220/ 173500 | consumed samples: 43576320 | consumed tokens: 89244303360 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.512395E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.246 | TFLOPs: 11.86 | 7: iteration 170230/ 173500 | consumed samples: 43578880 | consumed tokens: 89249546240 | elapsed time per iteration (s): 0.12 | 
learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.512678E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2180.101 | TFLOPs: 8.11 | 7: iteration 170240/ 173500 | consumed samples: 43581440 | consumed tokens: 89254789120 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.491348E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.600 | TFLOPs: 11.88 | 7: iteration 170250/ 173500 | consumed samples: 43584000 | consumed tokens: 89260032000 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.503038E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.238 | TFLOPs: 11.88 | 7: iteration 170260/ 173500 | consumed samples: 43586560 | consumed tokens: 89265274880 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.511163E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3086.031 | TFLOPs: 11.48 | 7: iteration 170270/ 173500 | consumed samples: 43589120 | consumed tokens: 89270517760 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.510629E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3016.496 | TFLOPs: 11.22 | 7: iteration 170280/ 173500 | consumed samples: 43591680 | consumed tokens: 89275760640 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.496543E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3177.069 | TFLOPs: 11.82 | 7: iteration 
170290/ 173500 | consumed samples: 43594240 | consumed tokens: 89281003520 | elapsed time per iteration (s): 0.08 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 4.497084E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.139 | TFLOPs: 11.90 | 7: iteration 170300/ 173500 | consumed samples: 43596800 | consumed tokens: 89286246400 | elapsed time per iteration (s): 0.09 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.495560E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2923.755 | TFLOPs: 10.88 | 7: iteration 170310/ 173500 | consumed samples: 43599360 | consumed tokens: 89291489280 | elapsed time per iteration (s): 0.10 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.503443E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2520.291 | TFLOPs: 9.37 | 7: iteration 170320/ 173500 | consumed samples: 43601920 | consumed tokens: 89296732160 | elapsed time per iteration (s): 0.08 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.512584E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3024.443 | TFLOPs: 11.25 | 7: iteration 170330/ 173500 | consumed samples: 43604480 | consumed tokens: 89301975040 | elapsed time per iteration (s): 0.09 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.502835E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2832.856 | TFLOPs: 10.54 | 7: iteration 170340/ 173500 | consumed samples: 43607040 | consumed tokens: 89307217920 | elapsed time per iteration (s): 0.08 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.514071E+00 | grad norm: 0.377 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3133.095 | TFLOPs: 11.65 | 7: iteration 170350/ 173500 | consumed samples: 43609600 | consumed tokens: 89312460800 | elapsed time per iteration (s): 0.09 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.503123E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.891 | TFLOPs: 10.84 | 7: iteration 170360/ 173500 | consumed samples: 43612160 | consumed tokens: 89317703680 | elapsed time per iteration (s): 0.08 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.509834E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.447 | TFLOPs: 11.89 | 7: iteration 170370/ 173500 | consumed samples: 43614720 | consumed tokens: 89322946560 | elapsed time per iteration (s): 0.08 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.504038E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.717 | TFLOPs: 11.72 | 7: iteration 170380/ 173500 | consumed samples: 43617280 | consumed tokens: 89328189440 | elapsed time per iteration (s): 0.08 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.507923E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.554 | TFLOPs: 11.85 | 7: iteration 170390/ 173500 | consumed samples: 43619840 | consumed tokens: 89333432320 | elapsed time per iteration (s): 0.09 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 4.503818E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2864.422 | TFLOPs: 10.65 | 7: iteration 170400/ 173500 | consumed samples: 43622400 | consumed tokens: 89338675200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.014E-05 | global batch size: 256 | lm loss: 4.498367E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.511 | TFLOPs: 11.80 | 7: iteration 170410/ 173500 | consumed samples: 43624960 | consumed tokens: 89343918080 | elapsed time per iteration (s): 0.08 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.513990E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.859 | TFLOPs: 11.85 | 7: iteration 170420/ 173500 | consumed samples: 43627520 | consumed tokens: 89349160960 | elapsed time per iteration (s): 0.08 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.504269E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3028.634 | TFLOPs: 11.27 | 7: iteration 170430/ 173500 | consumed samples: 43630080 | consumed tokens: 89354403840 | elapsed time per iteration (s): 0.08 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.498544E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.233 | TFLOPs: 11.29 | 7: iteration 170440/ 173500 | consumed samples: 43632640 | consumed tokens: 89359646720 | elapsed time per iteration (s): 0.08 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.505420E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3156.699 | TFLOPs: 11.74 | 7: iteration 170450/ 173500 | consumed samples: 43635200 | consumed tokens: 89364889600 | elapsed time per iteration (s): 0.08 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.505509E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3162.593 | TFLOPs: 11.76 | 7: iteration 170460/ 
173500 | consumed samples: 43637760 | consumed tokens: 89370132480 | elapsed time per iteration (s): 0.09 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.519465E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.334 | TFLOPs: 11.15 | 7: iteration 170470/ 173500 | consumed samples: 43640320 | consumed tokens: 89375375360 | elapsed time per iteration (s): 0.09 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.501205E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2992.611 | TFLOPs: 11.13 | 7: iteration 170480/ 173500 | consumed samples: 43642880 | consumed tokens: 89380618240 | elapsed time per iteration (s): 0.11 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.511377E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2318.850 | TFLOPs: 8.63 | 7: iteration 170490/ 173500 | consumed samples: 43645440 | consumed tokens: 89385861120 | elapsed time per iteration (s): 0.08 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.488447E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3119.714 | TFLOPs: 11.60 | 7: iteration 170500/ 173500 | consumed samples: 43648000 | consumed tokens: 89391104000 | elapsed time per iteration (s): 0.09 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 4.501790E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2858.057 | TFLOPs: 10.63 | 7: iteration 170510/ 173500 | consumed samples: 43650560 | consumed tokens: 89396346880 | elapsed time per iteration (s): 0.11 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.508888E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 2270.127 | TFLOPs: 8.44 | 7: iteration 170520/ 173500 | consumed samples: 43653120 | consumed tokens: 89401589760 | elapsed time per iteration (s): 0.09 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.499428E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.004 | TFLOPs: 11.14 | 7: iteration 170530/ 173500 | consumed samples: 43655680 | consumed tokens: 89406832640 | elapsed time per iteration (s): 0.08 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.490465E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.037 | TFLOPs: 11.87 | 7: iteration 170540/ 173500 | consumed samples: 43658240 | consumed tokens: 89412075520 | elapsed time per iteration (s): 0.08 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.512939E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.359 | TFLOPs: 11.51 | 7: iteration 170550/ 173500 | consumed samples: 43660800 | consumed tokens: 89417318400 | elapsed time per iteration (s): 0.08 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.507217E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.524 | TFLOPs: 11.87 | 7: iteration 170560/ 173500 | consumed samples: 43663360 | consumed tokens: 89422561280 | elapsed time per iteration (s): 0.09 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.479712E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2830.335 | TFLOPs: 10.53 | 7: iteration 170570/ 173500 | consumed samples: 43665920 | consumed tokens: 89427804160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.013E-05 | global batch size: 256 | lm loss: 4.497702E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3174.244 | TFLOPs: 11.81 | 7: iteration 170580/ 173500 | consumed samples: 43668480 | consumed tokens: 89433047040 | elapsed time per iteration (s): 0.08 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.497956E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3035.594 | TFLOPs: 11.29 | 7: iteration 170590/ 173500 | consumed samples: 43671040 | consumed tokens: 89438289920 | elapsed time per iteration (s): 0.08 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.516727E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.596 | TFLOPs: 11.88 | 7: iteration 170600/ 173500 | consumed samples: 43673600 | consumed tokens: 89443532800 | elapsed time per iteration (s): 0.09 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.494499E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2720.642 | TFLOPs: 10.12 | 7: iteration 170610/ 173500 | consumed samples: 43676160 | consumed tokens: 89448775680 | elapsed time per iteration (s): 0.08 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 4.498180E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.572 | TFLOPs: 11.86 | 7: iteration 170620/ 173500 | consumed samples: 43678720 | consumed tokens: 89454018560 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.495767E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.551 | TFLOPs: 11.89 | 7: iteration 170630/ 173500 | 
consumed samples: 43681280 | consumed tokens: 89459261440 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.500913E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.388 | TFLOPs: 11.42 | 7: iteration 170640/ 173500 | consumed samples: 43683840 | consumed tokens: 89464504320 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.488314E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.199 | TFLOPs: 11.85 | 7: iteration 170650/ 173500 | consumed samples: 43686400 | consumed tokens: 89469747200 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.494009E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.537 | TFLOPs: 11.87 | 7: iteration 170660/ 173500 | consumed samples: 43688960 | consumed tokens: 89474990080 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.498437E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3152.790 | TFLOPs: 11.73 | 7: iteration 170670/ 173500 | consumed samples: 43691520 | consumed tokens: 89480232960 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.499634E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.186 | TFLOPs: 11.91 | 7: iteration 170680/ 173500 | consumed samples: 43694080 | consumed tokens: 89485475840 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.513967E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3198.828 | TFLOPs: 11.90 | 7: iteration 170690/ 173500 | consumed samples: 43696640 | consumed tokens: 89490718720 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.501395E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3149.632 | TFLOPs: 11.72 | 7: iteration 170700/ 173500 | consumed samples: 43699200 | consumed tokens: 89495961600 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.505521E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3085.388 | TFLOPs: 11.48 | 7: iteration 170710/ 173500 | consumed samples: 43701760 | consumed tokens: 89501204480 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.509613E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.832 | TFLOPs: 11.89 | 7: iteration 170720/ 173500 | consumed samples: 43704320 | consumed tokens: 89506447360 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.500285E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.927 | TFLOPs: 11.95 | 7: iteration 170730/ 173500 | consumed samples: 43706880 | consumed tokens: 89511690240 | elapsed time per iteration (s): 0.08 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 4.508077E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.454 | TFLOPs: 11.90 | 7: iteration 170740/ 173500 | consumed samples: 43709440 | consumed tokens: 89516933120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.011E-05 | global batch size: 256 | lm loss: 4.504572E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.501 | TFLOPs: 11.89 | 7: iteration 170750/ 173500 | consumed samples: 43712000 | consumed tokens: 89522176000 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.522156E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.834 | TFLOPs: 11.90 | 7: iteration 170760/ 173500 | consumed samples: 43714560 | consumed tokens: 89527418880 | elapsed time per iteration (s): 0.11 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.504195E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2334.300 | TFLOPs: 8.68 | 7: iteration 170770/ 173500 | consumed samples: 43717120 | consumed tokens: 89532661760 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.501218E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3136.767 | TFLOPs: 11.67 | 7: iteration 170780/ 173500 | consumed samples: 43719680 | consumed tokens: 89537904640 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.510027E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.785 | TFLOPs: 11.85 | 7: iteration 170790/ 173500 | consumed samples: 43722240 | consumed tokens: 89543147520 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.507323E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.452 | TFLOPs: 11.85 | 7: iteration 170800/ 173500 | 
consumed samples: 43724800 | consumed tokens: 89548390400 | elapsed time per iteration (s): 0.09 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.515351E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2998.368 | TFLOPs: 11.15 | 7: iteration 170810/ 173500 | consumed samples: 43727360 | consumed tokens: 89553633280 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.522060E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.484 | TFLOPs: 11.89 | 7: iteration 170820/ 173500 | consumed samples: 43729920 | consumed tokens: 89558876160 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.511627E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.102 | TFLOPs: 11.90 | 7: iteration 170830/ 173500 | consumed samples: 43732480 | consumed tokens: 89564119040 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.501671E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.257 | TFLOPs: 11.84 | 7: iteration 170840/ 173500 | consumed samples: 43735040 | consumed tokens: 89569361920 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.498296E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3117.411 | TFLOPs: 11.60 | 7: iteration 170850/ 173500 | consumed samples: 43737600 | consumed tokens: 89574604800 | elapsed time per iteration (s): 0.08 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 4.504312E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3182.389 | TFLOPs: 11.84 | 7: iteration 170860/ 173500 | consumed samples: 43740160 | consumed tokens: 89579847680 | elapsed time per iteration (s): 0.08 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.497328E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3049.963 | TFLOPs: 11.34 | 7: iteration 170870/ 173500 | consumed samples: 43742720 | consumed tokens: 89585090560 | elapsed time per iteration (s): 0.09 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.499842E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2841.023 | TFLOPs: 10.57 | 7: iteration 170880/ 173500 | consumed samples: 43745280 | consumed tokens: 89590333440 | elapsed time per iteration (s): 0.08 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.513129E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.071 | TFLOPs: 11.91 | 7: iteration 170890/ 173500 | consumed samples: 43747840 | consumed tokens: 89595576320 | elapsed time per iteration (s): 0.08 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.504502E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.155 | TFLOPs: 11.67 | 7: iteration 170900/ 173500 | consumed samples: 43750400 | consumed tokens: 89600819200 | elapsed time per iteration (s): 0.08 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.503901E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3137.904 | TFLOPs: 11.67 | 7: iteration 170910/ 173500 | consumed samples: 43752960 | consumed tokens: 89606062080 | elapsed time per iteration (s): 0.10 | learning rate: 
2.010E-05 | global batch size: 256 | lm loss: 4.507521E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2580.962 | TFLOPs: 9.60 | 7: iteration 170920/ 173500 | consumed samples: 43755520 | consumed tokens: 89611304960 | elapsed time per iteration (s): 0.08 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.507689E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.191 | TFLOPs: 11.98 | 7: iteration 170930/ 173500 | consumed samples: 43758080 | consumed tokens: 89616547840 | elapsed time per iteration (s): 0.08 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.517979E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.427 | TFLOPs: 11.96 | 7: iteration 170940/ 173500 | consumed samples: 43760640 | consumed tokens: 89621790720 | elapsed time per iteration (s): 0.08 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.508434E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.418 | TFLOPs: 11.98 | 7: iteration 170950/ 173500 | consumed samples: 43763200 | consumed tokens: 89627033600 | elapsed time per iteration (s): 0.09 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.507270E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2843.004 | TFLOPs: 10.57 | 7: iteration 170960/ 173500 | consumed samples: 43765760 | consumed tokens: 89632276480 | elapsed time per iteration (s): 0.10 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.510109E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2442.824 | TFLOPs: 9.09 | 7: iteration 170970/ 173500 | 
consumed samples: 43768320 | consumed tokens: 89637519360 | elapsed time per iteration (s): 0.10 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.500247E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2683.933 | TFLOPs: 9.98 |
7: iteration 170980/ 173500 | consumed samples: 43770880 | consumed tokens: 89642762240 | elapsed time per iteration (s): 0.09 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 4.507614E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3008.309 | TFLOPs: 11.19 |
7: iteration 170990/ 173500 | consumed samples: 43773440 | consumed tokens: 89648005120 | elapsed time per iteration (s): 0.08 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.503788E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.550 | TFLOPs: 11.87 |
7: iteration 171000/ 173500 | consumed samples: 43776000 | consumed tokens: 89653248000 | elapsed time per iteration (s): 0.08 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.507404E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.205 | TFLOPs: 11.99 |
7: -------------------------------------------------------------------------------------------------
7: validation loss at iteration 171000 | lm loss value: 4.407505E+00 | lm loss PPL: 8.206446E+01 |
7: -------------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 171000 to checkpoints_14m91b100m
0: [2023-03-17 04:25:27,106] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step171000 is begin to save!
0: [2023-03-17 04:25:27,109] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:25:27,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:25:27,133] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:25:27,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:25:27,138] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:25:27,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:25:27,141] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:25:27,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:25:27,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:25:27,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:25:27,148] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:25:27,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 04:25:27,149] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step171000/mp_rank_00_model_states.pt 0: [2023-03-17 04:25:27,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:25:27,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:25:27,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:25:27,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,171] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:25:27,171] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 2: [2023-03-17 04:25:27,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:25:27,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:25:27,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 1: [2023-03-17 04:25:27,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:25:27,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:25:27,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 7: [2023-03-17 04:25:27,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 4: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 7: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 3: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:25:27,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 5: [2023-03-17 04:25:27,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:25:27,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 4: [2023-03-17 04:25:27,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:25:27,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:25:27,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 3: [2023-03-17 04:25:27,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 2: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:25:27,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 7: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:25:27,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 04:25:27,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 5: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 1: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:25:27,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 4: [2023-03-17 04:25:27,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:25:27,176] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,176] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:25:27,176] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:25:27,176] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 2: [2023-03-17 04:25:27,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:25:27,176] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:25:27,176] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 1: [2023-03-17 04:25:27,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:25:27,176] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:25:27,176] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 3: [2023-03-17 04:25:27,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:25:27,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 7: [2023-03-17 04:25:27,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:25:27,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:25:27,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 5: [2023-03-17 04:25:27,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:25:27,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:25:27,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 4: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:25:27,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 3: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:25:27,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 2: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:25:27,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 
1: [2023-03-17 04:25:27,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 5: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 7: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:25:27,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:25:27,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:25:27,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 4: [2023-03-17 04:25:27,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 2: [2023-03-17 04:25:27,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:25:27,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:25:27,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 1: [2023-03-17 04:25:27,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:25:27,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:25:27,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 3: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 7: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:25:27,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 7: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:25:27,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:25:27,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:25:27,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2023-03-17 04:25:27,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 2: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 2: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 
4: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 5: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 3: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 3: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 0: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 1: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:25:27,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 6: [2023-03-17 04:25:27,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 04:25:27,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 2: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 5: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 
4: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:25:27,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 4: [2023-03-17 04:25:27,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 4: [2023-03-17 04:25:27,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step171000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 4: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 4: [2023-03-17 04:25:27,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step171000 is ready now! 
0: successfully saved checkpoint at iteration 171000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 80.81 7: iteration 171010/ 173500 | consumed samples: 43778560 | consumed tokens: 89658490880 | elapsed time per iteration (s): 0.09 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.509525E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2807.203 | TFLOPs: 10.44 | 7: iteration 171020/ 173500 | consumed samples: 43781120 | consumed tokens: 89663733760 | elapsed time per iteration (s): 0.09 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.500657E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2849.368 | TFLOPs: 10.60 | 7: iteration 171030/ 173500 | consumed samples: 43783680 | consumed tokens: 89668976640 | elapsed time per iteration (s): 0.08 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.499254E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.695 | TFLOPs: 11.91 | 7: iteration 171040/ 173500 | consumed samples: 43786240 | consumed tokens: 89674219520 | elapsed time per iteration (s): 0.10 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.498837E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2607.225 | TFLOPs: 9.70 | 7: iteration 171050/ 173500 | consumed samples: 43788800 | consumed tokens: 89679462400 | elapsed time per iteration (s): 0.08 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.507693E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3198.972 | TFLOPs: 11.90 | 7: iteration 171060/ 173500 | consumed samples: 43791360 | consumed tokens: 89684705280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.489506E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3134.040 | TFLOPs: 11.66 | 7: iteration 171070/ 173500 | consumed samples: 43793920 | consumed tokens: 89689948160 | elapsed time per iteration (s): 0.08 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.508037E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.070 | TFLOPs: 11.94 | 7: iteration 171080/ 173500 | consumed samples: 43796480 | consumed tokens: 89695191040 | elapsed time per iteration (s): 0.09 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.500729E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2847.406 | TFLOPs: 10.59 | 7: iteration 171090/ 173500 | consumed samples: 43799040 | consumed tokens: 89700433920 | elapsed time per iteration (s): 0.08 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.502539E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3075.177 | TFLOPs: 11.44 | 7: iteration 171100/ 173500 | consumed samples: 43801600 | consumed tokens: 89705676800 | elapsed time per iteration (s): 0.09 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.515586E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2840.210 | TFLOPs: 10.56 | 7: iteration 171110/ 173500 | consumed samples: 43804160 | consumed tokens: 89710919680 | elapsed time per iteration (s): 0.08 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.501262E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3204.557 | TFLOPs: 11.92 | 7: 
iteration 171120/ 173500 | consumed samples: 43806720 | consumed tokens: 89716162560 | elapsed time per iteration (s): 0.09 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 4.507811E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2924.079 | TFLOPs: 10.88 | 7: iteration 171130/ 173500 | consumed samples: 43809280 | consumed tokens: 89721405440 | elapsed time per iteration (s): 0.09 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.502625E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2874.615 | TFLOPs: 10.69 | 7: iteration 171140/ 173500 | consumed samples: 43811840 | consumed tokens: 89726648320 | elapsed time per iteration (s): 0.09 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.497142E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2987.336 | TFLOPs: 11.11 | 7: iteration 171150/ 173500 | consumed samples: 43814400 | consumed tokens: 89731891200 | elapsed time per iteration (s): 0.09 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.502453E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2996.671 | TFLOPs: 11.15 | 7: iteration 171160/ 173500 | consumed samples: 43816960 | consumed tokens: 89737134080 | elapsed time per iteration (s): 0.09 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.501587E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.290 | TFLOPs: 10.04 | 7: iteration 171170/ 173500 | consumed samples: 43819520 | consumed tokens: 89742376960 | elapsed time per iteration (s): 0.09 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.510504E+00 | grad norm: 0.369 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2863.270 | TFLOPs: 10.65 | 7: iteration 171180/ 173500 | consumed samples: 43822080 | consumed tokens: 89747619840 | elapsed time per iteration (s): 0.09 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.513160E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2736.437 | TFLOPs: 10.18 | 7: iteration 171190/ 173500 | consumed samples: 43824640 | consumed tokens: 89752862720 | elapsed time per iteration (s): 0.11 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.492826E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2348.396 | TFLOPs: 8.74 | 7: iteration 171200/ 173500 | consumed samples: 43827200 | consumed tokens: 89758105600 | elapsed time per iteration (s): 0.08 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.511072E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3066.386 | TFLOPs: 11.41 | 7: iteration 171210/ 173500 | consumed samples: 43829760 | consumed tokens: 89763348480 | elapsed time per iteration (s): 0.08 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.506624E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.824 | TFLOPs: 11.70 | 7: iteration 171220/ 173500 | consumed samples: 43832320 | consumed tokens: 89768591360 | elapsed time per iteration (s): 0.08 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.503228E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3037.085 | TFLOPs: 11.30 | 7: iteration 171230/ 173500 | consumed samples: 43834880 | consumed tokens: 89773834240 | elapsed time per iteration (s): 0.08 | 
learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.498086E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3122.728 | TFLOPs: 11.62 | 7: iteration 171240/ 173500 | consumed samples: 43837440 | consumed tokens: 89779077120 | elapsed time per iteration (s): 0.11 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.500074E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2417.044 | TFLOPs: 8.99 | 7: iteration 171250/ 173500 | consumed samples: 43840000 | consumed tokens: 89784320000 | elapsed time per iteration (s): 0.08 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.499426E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.112 | TFLOPs: 11.44 | 7: iteration 171260/ 173500 | consumed samples: 43842560 | consumed tokens: 89789562880 | elapsed time per iteration (s): 0.08 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 4.502259E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.562 | TFLOPs: 11.87 | 7: iteration 171270/ 173500 | consumed samples: 43845120 | consumed tokens: 89794805760 | elapsed time per iteration (s): 0.12 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.500832E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2079.482 | TFLOPs: 7.73 | 7: iteration 171280/ 173500 | consumed samples: 43847680 | consumed tokens: 89800048640 | elapsed time per iteration (s): 0.11 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.507517E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2303.364 | TFLOPs: 8.57 | 7: iteration 
171290/ 173500 | consumed samples: 43850240 | consumed tokens: 89805291520 | elapsed time per iteration (s): 0.08 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.516093E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.278 | TFLOPs: 11.51 | 7: iteration 171300/ 173500 | consumed samples: 43852800 | consumed tokens: 89810534400 | elapsed time per iteration (s): 0.09 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.504410E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3005.578 | TFLOPs: 11.18 | 7: iteration 171310/ 173500 | consumed samples: 43855360 | consumed tokens: 89815777280 | elapsed time per iteration (s): 0.10 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.506225E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2587.562 | TFLOPs: 9.62 | 7: iteration 171320/ 173500 | consumed samples: 43857920 | consumed tokens: 89821020160 | elapsed time per iteration (s): 0.10 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.501079E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.546 | TFLOPs: 9.33 | 7: iteration 171330/ 173500 | consumed samples: 43860480 | consumed tokens: 89826263040 | elapsed time per iteration (s): 0.08 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.518072E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3032.756 | TFLOPs: 11.28 | 7: iteration 171340/ 173500 | consumed samples: 43863040 | consumed tokens: 89831505920 | elapsed time per iteration (s): 0.08 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.509010E+00 | grad norm: 0.373 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3090.742 | TFLOPs: 11.50 | 7: iteration 171350/ 173500 | consumed samples: 43865600 | consumed tokens: 89836748800 | elapsed time per iteration (s): 0.08 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.506961E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3112.128 | TFLOPs: 11.58 | 7: iteration 171360/ 173500 | consumed samples: 43868160 | consumed tokens: 89841991680 | elapsed time per iteration (s): 0.09 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.501769E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2800.864 | TFLOPs: 10.42 | 7: iteration 171370/ 173500 | consumed samples: 43870720 | consumed tokens: 89847234560 | elapsed time per iteration (s): 0.08 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.491214E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3126.086 | TFLOPs: 11.63 | 7: iteration 171380/ 173500 | consumed samples: 43873280 | consumed tokens: 89852477440 | elapsed time per iteration (s): 0.09 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.500888E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2856.816 | TFLOPs: 10.63 | 7: iteration 171390/ 173500 | consumed samples: 43875840 | consumed tokens: 89857720320 | elapsed time per iteration (s): 0.08 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.517919E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3175.233 | TFLOPs: 11.81 | 7: iteration 171400/ 173500 | consumed samples: 43878400 | consumed tokens: 89862963200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.007E-05 | global batch size: 256 | lm loss: 4.506921E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.441 | TFLOPs: 11.83 | 7: iteration 171410/ 173500 | consumed samples: 43880960 | consumed tokens: 89868206080 | elapsed time per iteration (s): 0.08 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.520776E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3093.478 | TFLOPs: 11.51 | 7: iteration 171420/ 173500 | consumed samples: 43883520 | consumed tokens: 89873448960 | elapsed time per iteration (s): 0.08 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 4.515632E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.983 | TFLOPs: 11.89 | 7: iteration 171430/ 173500 | consumed samples: 43886080 | consumed tokens: 89878691840 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.500986E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.773 | TFLOPs: 11.88 | 7: iteration 171440/ 173500 | consumed samples: 43888640 | consumed tokens: 89883934720 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.497845E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3070.377 | TFLOPs: 11.42 | 7: iteration 171450/ 173500 | consumed samples: 43891200 | consumed tokens: 89889177600 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.506852E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3071.317 | TFLOPs: 11.42 | 7: iteration 171460/ 
173500 | consumed samples: 43893760 | consumed tokens: 89894420480 | elapsed time per iteration (s): 0.10 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.505645E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2541.424 | TFLOPs: 9.45 | 7: iteration 171470/ 173500 | consumed samples: 43896320 | consumed tokens: 89899663360 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.498405E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.458 | TFLOPs: 11.90 | 7: iteration 171480/ 173500 | consumed samples: 43898880 | consumed tokens: 89904906240 | elapsed time per iteration (s): 0.09 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.499945E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2936.645 | TFLOPs: 10.92 | 7: iteration 171490/ 173500 | consumed samples: 43901440 | consumed tokens: 89910149120 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.504237E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.368 | TFLOPs: 11.91 | 7: iteration 171500/ 173500 | consumed samples: 43904000 | consumed tokens: 89915392000 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.517170E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.076 | TFLOPs: 11.90 | 7: iteration 171510/ 173500 | consumed samples: 43906560 | consumed tokens: 89920634880 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.512112E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3099.569 | TFLOPs: 11.53 | 7: iteration 171520/ 173500 | consumed samples: 43909120 | consumed tokens: 89925877760 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.515945E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.266 | TFLOPs: 11.86 | 7: iteration 171530/ 173500 | consumed samples: 43911680 | consumed tokens: 89931120640 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.503141E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.117 | TFLOPs: 11.88 | 7: iteration 171540/ 173500 | consumed samples: 43914240 | consumed tokens: 89936363520 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.520280E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.604 | TFLOPs: 11.74 | 7: iteration 171550/ 173500 | consumed samples: 43916800 | consumed tokens: 89941606400 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.506746E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3157.540 | TFLOPs: 11.74 | 7: iteration 171560/ 173500 | consumed samples: 43919360 | consumed tokens: 89946849280 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.496886E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.721 | TFLOPs: 11.85 | 7: iteration 171570/ 173500 | consumed samples: 43921920 | consumed tokens: 89952092160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.006E-05 | global batch size: 256 | lm loss: 4.511353E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3142.026 | TFLOPs: 11.69 | 7: iteration 171580/ 173500 | consumed samples: 43924480 | consumed tokens: 89957335040 | elapsed time per iteration (s): 0.08 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 4.516208E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3141.504 | TFLOPs: 11.69 | 7: iteration 171590/ 173500 | consumed samples: 43927040 | consumed tokens: 89962577920 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.512667E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3180.497 | TFLOPs: 11.83 | 7: iteration 171600/ 173500 | consumed samples: 43929600 | consumed tokens: 89967820800 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.509997E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3148.436 | TFLOPs: 11.71 | 7: iteration 171610/ 173500 | consumed samples: 43932160 | consumed tokens: 89973063680 | elapsed time per iteration (s): 0.09 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.504287E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2986.600 | TFLOPs: 11.11 | 7: iteration 171620/ 173500 | consumed samples: 43934720 | consumed tokens: 89978306560 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.500081E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3189.645 | TFLOPs: 11.86 | 7: iteration 171630/ 173500 | 
consumed samples: 43937280 | consumed tokens: 89983549440 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.510464E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.227 | TFLOPs: 11.95 | 7: iteration 171640/ 173500 | consumed samples: 43939840 | consumed tokens: 89988792320 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.490829E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3088.042 | TFLOPs: 11.49 | 7: iteration 171650/ 173500 | consumed samples: 43942400 | consumed tokens: 89994035200 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.505566E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3227.939 | TFLOPs: 12.01 | 7: iteration 171660/ 173500 | consumed samples: 43944960 | consumed tokens: 89999278080 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.494401E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.244 | TFLOPs: 12.01 | 7: iteration 171670/ 173500 | consumed samples: 43947520 | consumed tokens: 90004520960 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.510006E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3044.331 | TFLOPs: 11.32 | 7: iteration 171680/ 173500 | consumed samples: 43950080 | consumed tokens: 90009763840 | elapsed time per iteration (s): 0.09 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.508529E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3006.065 | TFLOPs: 11.18 | 7: iteration 171690/ 173500 | consumed samples: 43952640 | consumed tokens: 90015006720 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.514968E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3192.037 | TFLOPs: 11.87 | 7: iteration 171700/ 173500 | consumed samples: 43955200 | consumed tokens: 90020249600 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.500451E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3205.440 | TFLOPs: 11.92 | 7: iteration 171710/ 173500 | consumed samples: 43957760 | consumed tokens: 90025492480 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.505413E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.992 | TFLOPs: 11.92 | 7: iteration 171720/ 173500 | consumed samples: 43960320 | consumed tokens: 90030735360 | elapsed time per iteration (s): 0.08 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.508109E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.436 | TFLOPs: 11.86 | 7: iteration 171730/ 173500 | consumed samples: 43962880 | consumed tokens: 90035978240 | elapsed time per iteration (s): 0.12 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.492283E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2165.301 | TFLOPs: 8.05 | 7: iteration 171740/ 173500 | consumed samples: 43965440 | consumed tokens: 90041221120 | elapsed time per iteration (s): 0.12 | learning rate: 
2.005E-05 | global batch size: 256 | lm loss: 4.515095E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2177.070 | TFLOPs: 8.10 | 7: iteration 171750/ 173500 | consumed samples: 43968000 | consumed tokens: 90046464000 | elapsed time per iteration (s): 0.10 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.494726E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2555.705 | TFLOPs: 9.51 | 7: iteration 171760/ 173500 | consumed samples: 43970560 | consumed tokens: 90051706880 | elapsed time per iteration (s): 0.11 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.500064E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2272.566 | TFLOPs: 8.45 | 7: iteration 171770/ 173500 | consumed samples: 43973120 | consumed tokens: 90056949760 | elapsed time per iteration (s): 0.09 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 4.489516E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2830.776 | TFLOPs: 10.53 | 7: iteration 171780/ 173500 | consumed samples: 43975680 | consumed tokens: 90062192640 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.499354E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3078.746 | TFLOPs: 11.45 | 7: iteration 171790/ 173500 | consumed samples: 43978240 | consumed tokens: 90067435520 | elapsed time per iteration (s): 0.09 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.508956E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2962.225 | TFLOPs: 11.02 | 7: iteration 171800/ 173500 | 
consumed samples: 43980800 | consumed tokens: 90072678400 | elapsed time per iteration (s): 0.13 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.509138E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2023.119 | TFLOPs: 7.53 | 7: iteration 171810/ 173500 | consumed samples: 43983360 | consumed tokens: 90077921280 | elapsed time per iteration (s): 0.10 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.500014E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2509.974 | TFLOPs: 9.34 | 7: iteration 171820/ 173500 | consumed samples: 43985920 | consumed tokens: 90083164160 | elapsed time per iteration (s): 0.10 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.514754E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2461.192 | TFLOPs: 9.15 | 7: iteration 171830/ 173500 | consumed samples: 43988480 | consumed tokens: 90088407040 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.513284E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.391 | TFLOPs: 11.93 | 7: iteration 171840/ 173500 | consumed samples: 43991040 | consumed tokens: 90093649920 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.512288E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3145.563 | TFLOPs: 11.70 | 7: iteration 171850/ 173500 | consumed samples: 43993600 | consumed tokens: 90098892800 | elapsed time per iteration (s): 0.09 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.509124E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 2731.104 | TFLOPs: 10.16 | 7: iteration 171860/ 173500 | consumed samples: 43996160 | consumed tokens: 90104135680 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.502699E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3045.185 | TFLOPs: 11.33 | 7: iteration 171870/ 173500 | consumed samples: 43998720 | consumed tokens: 90109378560 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.512016E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.447 | TFLOPs: 11.86 | 7: iteration 171880/ 173500 | consumed samples: 44001280 | consumed tokens: 90114621440 | elapsed time per iteration (s): 0.09 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.510260E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2797.159 | TFLOPs: 10.40 | 7: iteration 171890/ 173500 | consumed samples: 44003840 | consumed tokens: 90119864320 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.498333E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3178.237 | TFLOPs: 11.82 | 7: iteration 171900/ 173500 | consumed samples: 44006400 | consumed tokens: 90125107200 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.515556E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3104.210 | TFLOPs: 11.55 | 7: iteration 171910/ 173500 | consumed samples: 44008960 | consumed tokens: 90130350080 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | 
global batch size: 256 | lm loss: 4.502224E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3124.959 | TFLOPs: 11.62 | 7: iteration 171920/ 173500 | consumed samples: 44011520 | consumed tokens: 90135592960 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.509116E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.741 | TFLOPs: 11.91 | 7: iteration 171930/ 173500 | consumed samples: 44014080 | consumed tokens: 90140835840 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.494180E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.811 | TFLOPs: 11.83 | 7: iteration 171940/ 173500 | consumed samples: 44016640 | consumed tokens: 90146078720 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.504306E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.935 | TFLOPs: 11.84 | 7: iteration 171950/ 173500 | consumed samples: 44019200 | consumed tokens: 90151321600 | elapsed time per iteration (s): 0.10 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.510549E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2615.631 | TFLOPs: 9.73 | 7: iteration 171960/ 173500 | consumed samples: 44021760 | consumed tokens: 90156564480 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.501164E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.274 | TFLOPs: 11.83 | 7: iteration 171970/ 173500 | consumed 
samples: 44024320 | consumed tokens: 90161807360 | elapsed time per iteration (s): 0.08 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 4.503056E+00 | grad norm: 0.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3171.636 | TFLOPs: 11.80 | 7: iteration 171980/ 173500 | consumed samples: 44026880 | consumed tokens: 90167050240 | elapsed time per iteration (s): 0.10 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.499417E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2685.050 | TFLOPs: 9.99 | 7: iteration 171990/ 173500 | consumed samples: 44029440 | consumed tokens: 90172293120 | elapsed time per iteration (s): 0.09 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.502135E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2985.624 | TFLOPs: 11.11 | 0: [2023-03-17 04:26:54,469] [INFO] [logging.py:68:log_dist] [Rank 0] step=172000, skipped=0, lr=[2.0033893682955986e-05, 2.0033893682955986e-05, 2.0033893682955986e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 172000/ 173500 | consumed samples: 44032000 | consumed tokens: 90177536000 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.506057E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3025.323 | TFLOPs: 11.25 | 0: steps: 172000 loss: 4.5033 iter time (s): 0.085 samples/sec: 2997.869 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 172000 | lm loss value: 4.371511E+00 | lm loss PPL: 7.916312E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 
172000 to checkpoints_14m91b100m 0: [2023-03-17 04:26:54,551] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step172000 is begin to save! 0: [2023-03-17 04:26:54,554] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:26:54,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:26:54,582] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:26:54,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:26:54,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:26:54,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:26:54,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:26:54,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:26:54,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:26:54,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:26:54,594] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/layer_08-model_00-model_states.pt... 
0: [2023-03-17 04:26:54,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:26:54,595] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step172000/mp_rank_00_model_states.pt 0: [2023-03-17 04:26:54,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:26:54,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:26:54,613] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:26:54,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 3: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:26:54,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:26:54,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:26:54,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:26:54,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 2: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:26:54,624] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 2: [2023-03-17 04:26:54,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:26:54,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 4: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:26:54,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 3: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:26:54,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:26:54,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 1: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:26:54,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 5: [2023-03-17 04:26:54,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:26:54,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:26:54,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 4: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:26:54,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:26:54,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 3: [2023-03-17 04:26:54,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:26:54,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:26:54,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:26:54,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 5: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 1: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:26:54,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,627] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 2: [2023-03-17 04:26:54,627] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 4: [2023-03-17 04:26:54,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:26:54,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2023-03-17 04:26:54,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 3: [2023-03-17 04:26:54,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:26:54,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2023-03-17 04:26:54,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:26:54,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:26:54,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 1: [2023-03-17 04:26:54,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:26:54,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:26:54,629] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:26:54,629] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 5: [2023-03-17 04:26:54,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:26:54,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:26:54,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:26:54,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 2: [2023-03-17 04:26:54,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:26:54,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 3: [2023-03-17 04:26:54,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:26:54,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2023-03-17 04:26:54,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 4: [2023-03-17 04:26:54,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:26:54,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:26:54,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:26:54,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:26:54,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 5: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 
7: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2023-03-17 04:26:54,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 0: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 4: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:26:54,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:26:54,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:26:54,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 04:26:54,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 3: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 1: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:26:54,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:26:54,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 5: [2023-03-17 04:26:54,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 7: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 
5: [2023-03-17 04:26:54,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 2: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:26:54,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:26:54,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2023-03-17 04:26:54,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 4: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:26:54,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2023-03-17 04:26:54,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 1: [2023-03-17 04:26:54,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:26:54,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 0: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 6: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 5: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 7: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 2: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 5: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 5: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 4: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 4: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 1: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 1: [2023-03-17 04:26:54,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:26:54,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 3: [2023-03-17 04:26:54,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:26:54,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step172000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:26:54,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step172000 is ready now! 
0: successfully saved checkpoint at iteration 172000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 88.84 7: iteration 172010/ 173500 | consumed samples: 44034560 | consumed tokens: 90182778880 | elapsed time per iteration (s): 0.12 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.510487E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2078.383 | TFLOPs: 7.73 | 7: iteration 172020/ 173500 | consumed samples: 44037120 | consumed tokens: 90188021760 | elapsed time per iteration (s): 0.10 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.503566E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2519.584 | TFLOPs: 9.37 | 7: iteration 172030/ 173500 | consumed samples: 44039680 | consumed tokens: 90193264640 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.511677E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3166.170 | TFLOPs: 11.78 | 7: iteration 172040/ 173500 | consumed samples: 44042240 | consumed tokens: 90198507520 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.511796E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.635 | TFLOPs: 11.96 | 7: iteration 172050/ 173500 | consumed samples: 44044800 | consumed tokens: 90203750400 | elapsed time per iteration (s): 0.11 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.498504E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2363.383 | TFLOPs: 8.79 | 7: iteration 172060/ 173500 | consumed samples: 44047360 | consumed tokens: 90208993280 | elapsed time per iteration (s): 
0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.513202E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3138.284 | TFLOPs: 11.67 | 7: iteration 172070/ 173500 | consumed samples: 44049920 | consumed tokens: 90214236160 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.492594E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3253.827 | TFLOPs: 12.10 | 7: iteration 172080/ 173500 | consumed samples: 44052480 | consumed tokens: 90219479040 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.502958E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.647 | TFLOPs: 11.95 | 7: iteration 172090/ 173500 | consumed samples: 44055040 | consumed tokens: 90224721920 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.510844E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3035.217 | TFLOPs: 11.29 | 7: iteration 172100/ 173500 | consumed samples: 44057600 | consumed tokens: 90229964800 | elapsed time per iteration (s): 0.10 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.508465E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2546.906 | TFLOPs: 9.47 | 7: iteration 172110/ 173500 | consumed samples: 44060160 | consumed tokens: 90235207680 | elapsed time per iteration (s): 0.09 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.509662E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2873.296 | TFLOPs: 10.69 | 7: 
iteration 172120/ 173500 | consumed samples: 44062720 | consumed tokens: 90240450560 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.510835E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.066 | TFLOPs: 11.91 | 7: iteration 172130/ 173500 | consumed samples: 44065280 | consumed tokens: 90245693440 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.499759E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3238.039 | TFLOPs: 12.04 | 7: iteration 172140/ 173500 | consumed samples: 44067840 | consumed tokens: 90250936320 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.495942E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.560 | TFLOPs: 12.03 | 7: iteration 172150/ 173500 | consumed samples: 44070400 | consumed tokens: 90256179200 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.512362E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3113.928 | TFLOPs: 11.58 | 7: iteration 172160/ 173500 | consumed samples: 44072960 | consumed tokens: 90261422080 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.493004E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.510 | TFLOPs: 11.90 | 7: iteration 172170/ 173500 | consumed samples: 44075520 | consumed tokens: 90266664960 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.508103E+00 | grad norm: 0.390 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3236.326 | TFLOPs: 12.04 | 7: iteration 172180/ 173500 | consumed samples: 44078080 | consumed tokens: 90271907840 | elapsed time per iteration (s): 0.09 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.508492E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2700.368 | TFLOPs: 10.04 | 7: iteration 172190/ 173500 | consumed samples: 44080640 | consumed tokens: 90277150720 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.501026E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.148 | TFLOPs: 11.99 | 7: iteration 172200/ 173500 | consumed samples: 44083200 | consumed tokens: 90282393600 | elapsed time per iteration (s): 0.08 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.511819E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3214.071 | TFLOPs: 11.95 | 7: iteration 172210/ 173500 | consumed samples: 44085760 | consumed tokens: 90287636480 | elapsed time per iteration (s): 0.09 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 4.496807E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2892.980 | TFLOPs: 10.76 | 7: iteration 172220/ 173500 | consumed samples: 44088320 | consumed tokens: 90292879360 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.494303E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3223.804 | TFLOPs: 11.99 | 7: iteration 172230/ 173500 | consumed samples: 44090880 | consumed tokens: 90298122240 | elapsed time per iteration (s): 0.09 | 
learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.494208E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2881.361 | TFLOPs: 10.72 | 7: iteration 172240/ 173500 | consumed samples: 44093440 | consumed tokens: 90303365120 | elapsed time per iteration (s): 0.10 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.489133E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2589.789 | TFLOPs: 9.63 | 7: iteration 172250/ 173500 | consumed samples: 44096000 | consumed tokens: 90308608000 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.502839E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3022.834 | TFLOPs: 11.24 | 7: iteration 172260/ 173500 | consumed samples: 44098560 | consumed tokens: 90313850880 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.485625E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3212.139 | TFLOPs: 11.95 | 7: iteration 172270/ 173500 | consumed samples: 44101120 | consumed tokens: 90319093760 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.501517E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3228.913 | TFLOPs: 12.01 | 7: iteration 172280/ 173500 | consumed samples: 44103680 | consumed tokens: 90324336640 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.495063E+00 | grad norm: 0.392 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.290 | TFLOPs: 12.00 | 7: iteration 
172290/ 173500 | consumed samples: 44106240 | consumed tokens: 90329579520 | elapsed time per iteration (s): 0.09 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.514317E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2850.782 | TFLOPs: 10.60 | 7: iteration 172300/ 173500 | consumed samples: 44108800 | consumed tokens: 90334822400 | elapsed time per iteration (s): 0.09 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.497419E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2770.535 | TFLOPs: 10.31 | 7: iteration 172310/ 173500 | consumed samples: 44111360 | consumed tokens: 90340065280 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.507496E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.340 | TFLOPs: 12.01 | 7: iteration 172320/ 173500 | consumed samples: 44113920 | consumed tokens: 90345308160 | elapsed time per iteration (s): 0.09 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.511378E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2896.295 | TFLOPs: 10.77 | 7: iteration 172330/ 173500 | consumed samples: 44116480 | consumed tokens: 90350551040 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.493483E+00 | grad norm: 0.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.724 | TFLOPs: 11.94 | 7: iteration 172340/ 173500 | consumed samples: 44119040 | consumed tokens: 90355793920 | elapsed time per iteration (s): 0.09 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.485630E+00 | grad norm: 0.356 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2919.301 | TFLOPs: 10.86 | 7: iteration 172350/ 173500 | consumed samples: 44121600 | consumed tokens: 90361036800 | elapsed time per iteration (s): 0.11 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.510791E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2349.025 | TFLOPs: 8.74 | 7: iteration 172360/ 173500 | consumed samples: 44124160 | consumed tokens: 90366279680 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.505295E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3150.362 | TFLOPs: 11.72 | 7: iteration 172370/ 173500 | consumed samples: 44126720 | consumed tokens: 90371522560 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.493102E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3196.762 | TFLOPs: 11.89 | 7: iteration 172380/ 173500 | consumed samples: 44129280 | consumed tokens: 90376765440 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.509377E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.258 | TFLOPs: 11.89 | 7: iteration 172390/ 173500 | consumed samples: 44131840 | consumed tokens: 90382008320 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.508464E+00 | grad norm: 0.390 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3186.366 | TFLOPs: 11.85 | 7: iteration 172400/ 173500 | consumed samples: 44134400 | consumed tokens: 90387251200 | elapsed time per iteration (s): 0.08 | learning 
rate: 2.002E-05 | global batch size: 256 | lm loss: 4.514566E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3194.379 | TFLOPs: 11.88 | 7: iteration 172410/ 173500 | consumed samples: 44136960 | consumed tokens: 90392494080 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.495923E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.842 | TFLOPs: 11.53 | 7: iteration 172420/ 173500 | consumed samples: 44139520 | consumed tokens: 90397736960 | elapsed time per iteration (s): 0.09 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.499852E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2878.635 | TFLOPs: 10.71 | 7: iteration 172430/ 173500 | consumed samples: 44142080 | consumed tokens: 90402979840 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.491759E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.713 | TFLOPs: 11.53 | 7: iteration 172440/ 173500 | consumed samples: 44144640 | consumed tokens: 90408222720 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.507565E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.372 | TFLOPs: 11.91 | 7: iteration 172450/ 173500 | consumed samples: 44147200 | consumed tokens: 90413465600 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.503511E+00 | grad norm: 0.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3059.339 | TFLOPs: 11.38 | 7: iteration 172460/ 
173500 | consumed samples: 44149760 | consumed tokens: 90418708480 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.501868E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.050 | TFLOPs: 11.91 | 7: iteration 172470/ 173500 | consumed samples: 44152320 | consumed tokens: 90423951360 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.508095E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3033.561 | TFLOPs: 11.28 | 7: iteration 172480/ 173500 | consumed samples: 44154880 | consumed tokens: 90429194240 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.505905E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3068.253 | TFLOPs: 11.41 | 7: iteration 172490/ 173500 | consumed samples: 44157440 | consumed tokens: 90434437120 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.508635E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.258 | TFLOPs: 12.00 | 7: iteration 172500/ 173500 | consumed samples: 44160000 | consumed tokens: 90439680000 | elapsed time per iteration (s): 0.08 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 4.517267E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3184.837 | TFLOPs: 11.85 | 7: iteration 172510/ 173500 | consumed samples: 44162560 | consumed tokens: 90444922880 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.495057E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3097.806 | TFLOPs: 11.52 | 7: iteration 172520/ 173500 | consumed samples: 44165120 | consumed tokens: 90450165760 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.516990E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3127.529 | TFLOPs: 11.63 | 7: iteration 172530/ 173500 | consumed samples: 44167680 | consumed tokens: 90455408640 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.501726E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3213.979 | TFLOPs: 11.95 | 7: iteration 172540/ 173500 | consumed samples: 44170240 | consumed tokens: 90460651520 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.490151E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.886 | TFLOPs: 11.87 | 7: iteration 172550/ 173500 | consumed samples: 44172800 | consumed tokens: 90465894400 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.511445E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.067 | TFLOPs: 11.83 | 7: iteration 172560/ 173500 | consumed samples: 44175360 | consumed tokens: 90471137280 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.506038E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3021.417 | TFLOPs: 11.24 | 7: iteration 172570/ 173500 | consumed samples: 44177920 | consumed tokens: 90476380160 | elapsed time per iteration (s): 0.09 | learning rate: 
2.001E-05 | global batch size: 256 | lm loss: 4.515251E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2966.986 | TFLOPs: 11.04 | 7: iteration 172580/ 173500 | consumed samples: 44180480 | consumed tokens: 90481623040 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.513276E+00 | grad norm: 0.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.058 | TFLOPs: 12.03 | 7: iteration 172590/ 173500 | consumed samples: 44183040 | consumed tokens: 90486865920 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.499196E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3154.040 | TFLOPs: 11.73 | 7: iteration 172600/ 173500 | consumed samples: 44185600 | consumed tokens: 90492108800 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.494774E+00 | grad norm: 0.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3087.270 | TFLOPs: 11.48 | 7: iteration 172610/ 173500 | consumed samples: 44188160 | consumed tokens: 90497351680 | elapsed time per iteration (s): 0.09 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.505729E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2913.135 | TFLOPs: 10.84 | 7: iteration 172620/ 173500 | consumed samples: 44190720 | consumed tokens: 90502594560 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.501271E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.151 | TFLOPs: 11.35 | 7: iteration 172630/ 173500 | 
consumed samples: 44193280 | consumed tokens: 90507837440 | elapsed time per iteration (s): 0.10 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.507974E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2565.885 | TFLOPs: 9.54 | 7: iteration 172640/ 173500 | consumed samples: 44195840 | consumed tokens: 90513080320 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.484963E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3076.492 | TFLOPs: 11.44 | 7: iteration 172650/ 173500 | consumed samples: 44198400 | consumed tokens: 90518323200 | elapsed time per iteration (s): 0.09 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.503845E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2771.098 | TFLOPs: 10.31 | 7: iteration 172660/ 173500 | consumed samples: 44200960 | consumed tokens: 90523566080 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.498749E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3181.009 | TFLOPs: 11.83 | 7: iteration 172670/ 173500 | consumed samples: 44203520 | consumed tokens: 90528808960 | elapsed time per iteration (s): 0.09 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.505706E+00 | grad norm: 0.398 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2851.369 | TFLOPs: 10.61 | 7: iteration 172680/ 173500 | consumed samples: 44206080 | consumed tokens: 90534051840 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.497977E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 3201.842 | TFLOPs: 11.91 | 7: iteration 172690/ 173500 | consumed samples: 44208640 | consumed tokens: 90539294720 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.522953E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3241.097 | TFLOPs: 12.06 | 7: iteration 172700/ 173500 | consumed samples: 44211200 | consumed tokens: 90544537600 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.492417E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3217.080 | TFLOPs: 11.97 | 7: iteration 172710/ 173500 | consumed samples: 44213760 | consumed tokens: 90549780480 | elapsed time per iteration (s): 0.11 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.505864E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2283.581 | TFLOPs: 8.49 | 7: iteration 172720/ 173500 | consumed samples: 44216320 | consumed tokens: 90555023360 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.487654E+00 | grad norm: 0.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3129.515 | TFLOPs: 11.64 | 7: iteration 172730/ 173500 | consumed samples: 44218880 | consumed tokens: 90560266240 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.513589E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3225.029 | TFLOPs: 12.00 | 7: iteration 172740/ 173500 | consumed samples: 44221440 | consumed tokens: 90565509120 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | 
global batch size: 256 | lm loss: 4.492188E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.380 | TFLOPs: 11.98 | 7: iteration 172750/ 173500 | consumed samples: 44224000 | consumed tokens: 90570752000 | elapsed time per iteration (s): 0.09 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.505561E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2952.003 | TFLOPs: 10.98 | 7: iteration 172760/ 173500 | consumed samples: 44226560 | consumed tokens: 90575994880 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.510229E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3237.226 | TFLOPs: 12.04 | 7: iteration 172770/ 173500 | consumed samples: 44229120 | consumed tokens: 90581237760 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.511936E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3193.279 | TFLOPs: 11.88 | 7: iteration 172780/ 173500 | consumed samples: 44231680 | consumed tokens: 90586480640 | elapsed time per iteration (s): 0.09 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.504467E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2967.614 | TFLOPs: 11.04 | 7: iteration 172790/ 173500 | consumed samples: 44234240 | consumed tokens: 90591723520 | elapsed time per iteration (s): 0.09 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.491274E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2714.280 | TFLOPs: 10.10 | 7: iteration 172800/ 173500 | consumed 
samples: 44236800 | consumed tokens: 90596966400 | elapsed time per iteration (s): 0.09 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.496013E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2990.943 | TFLOPs: 11.12 | 7: iteration 172810/ 173500 | consumed samples: 44239360 | consumed tokens: 90602209280 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.512027E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3224.755 | TFLOPs: 11.99 | 7: iteration 172820/ 173500 | consumed samples: 44241920 | consumed tokens: 90607452160 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.508311E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3147.845 | TFLOPs: 11.71 | 7: iteration 172830/ 173500 | consumed samples: 44244480 | consumed tokens: 90612695040 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.491182E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3216.631 | TFLOPs: 11.96 | 7: iteration 172840/ 173500 | consumed samples: 44247040 | consumed tokens: 90617937920 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.496930E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3047.267 | TFLOPs: 11.33 | 7: iteration 172850/ 173500 | consumed samples: 44249600 | consumed tokens: 90623180800 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.502060E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 3108.586 | TFLOPs: 11.56 | 7: iteration 172860/ 173500 | consumed samples: 44252160 | consumed tokens: 90628423680 | elapsed time per iteration (s): 0.09 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.511731E+00 | grad norm: 0.360 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2914.506 | TFLOPs: 10.84 | 7: iteration 172870/ 173500 | consumed samples: 44254720 | consumed tokens: 90633666560 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.502614E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3026.674 | TFLOPs: 11.26 | 7: iteration 172880/ 173500 | consumed samples: 44257280 | consumed tokens: 90638909440 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.491372E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3229.997 | TFLOPs: 12.01 | 7: iteration 172890/ 173500 | consumed samples: 44259840 | consumed tokens: 90644152320 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.507524E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3247.648 | TFLOPs: 12.08 | 7: iteration 172900/ 173500 | consumed samples: 44262400 | consumed tokens: 90649395200 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.496814E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3233.121 | TFLOPs: 12.03 | 7: iteration 172910/ 173500 | consumed samples: 44264960 | consumed tokens: 90654638080 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global 
batch size: 256 | lm loss: 4.498377E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.564 | TFLOPs: 12.03 | 7: iteration 172920/ 173500 | consumed samples: 44267520 | consumed tokens: 90659880960 | elapsed time per iteration (s): 0.08 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 4.498346E+00 | grad norm: 0.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3243.936 | TFLOPs: 12.07 | 7: iteration 172930/ 173500 | consumed samples: 44270080 | consumed tokens: 90665123840 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.502570E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2960.620 | TFLOPs: 11.01 | 7: iteration 172940/ 173500 | consumed samples: 44272640 | consumed tokens: 90670366720 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.502433E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3226.727 | TFLOPs: 12.00 | 7: iteration 172950/ 173500 | consumed samples: 44275200 | consumed tokens: 90675609600 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.492509E+00 | grad norm: 0.383 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2309.568 | TFLOPs: 8.59 | 7: iteration 172960/ 173500 | consumed samples: 44277760 | consumed tokens: 90680852480 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.508148E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3173.682 | TFLOPs: 11.80 | 7: iteration 172970/ 173500 | consumed samples: 
44280320 | consumed tokens: 90686095360 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.499481E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2819.577 | TFLOPs: 10.49 | 7: iteration 172980/ 173500 | consumed samples: 44282880 | consumed tokens: 90691338240 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.505424E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3025.604 | TFLOPs: 11.25 | 7: iteration 172990/ 173500 | consumed samples: 44285440 | consumed tokens: 90696581120 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.504491E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.605 | TFLOPs: 11.93 | 7: iteration 173000/ 173500 | consumed samples: 44288000 | consumed tokens: 90701824000 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.505613E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3242.577 | TFLOPs: 12.06 | 7: ------------------------------------------------------------------------------------------------- 7: validation loss at iteration 173000 | lm loss value: 4.440018E+00 | lm loss PPL: 8.477644E+01 | 7: ------------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 173000 to checkpoints_14m91b100m 0: [2023-03-17 04:28:19,320] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step173000 is begin to save! 
0: [2023-03-17 04:28:19,324] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:28:19,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:28:19,350] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:28:19,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:28:19,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/layer_04-model_00-model_states.pt... 0: [2023-03-17 04:28:19,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:28:19,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:28:19,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:28:19,362] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:28:19,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:28:19,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:28:19,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/layer_08-model_00-model_states.pt. 
0: [2023-03-17 04:28:19,366] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step173000/mp_rank_00_model_states.pt 0: [2023-03-17 04:28:19,366] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/mp_rank_00_model_states.pt... 0: [2023-03-17 04:28:19,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/mp_rank_00_model_states.pt. 0: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
1: [2023-03-17 04:28:19,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:28:19,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2023-03-17 04:28:19,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 0: [2023-03-17 04:28:19,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:28:19,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:28:19,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 4: [2023-03-17 04:28:19,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:28:19,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2023-03-17 04:28:19,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 
5: [2023-03-17 04:28:19,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:28:19,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2023-03-17 04:28:19,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 1: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 6: [2023-03-17 04:28:19,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 2: [2023-03-17 04:28:19,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 
0: [2023-03-17 04:28:19,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 4: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:28:19,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 0: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:28:19,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2023-03-17 04:28:19,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 5: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:28:19,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 1: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 1: [2023-03-17 04:28:19,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 6: [2023-03-17 04:28:19,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 2: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:28:19,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 4: [2023-03-17 04:28:19,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:28:19,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 0: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:28:19,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 5: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:28:19,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 1: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 6: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:28:19,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 4: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:28:19,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 0: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:28:19,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 2: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:28:19,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 5: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:28:19,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 1: [2023-03-17 04:28:19,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 6: [2023-03-17 04:28:19,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:28:19,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 3: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 3: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 3: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 3: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 4: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 2: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 0: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 5: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:28:19,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:28:19,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 1: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 6: [2023-03-17 04:28:19,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:28:19,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 4: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:28:19,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 3: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 0: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:28:19,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 
2: [2023-03-17 04:28:19,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 5: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:28:19,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 1: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 6: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:28:19,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 2: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:28:19,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2023-03-17 04:28:19,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 4: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 3: [2023-03-17 04:28:19,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 0: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 6: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 6: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 5: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 2: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
2: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 4: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 3: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 2: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 3: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 2: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
7: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 3: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2023-03-17 04:28:19,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 0: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 7: [2023-03-17 04:28:19,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 5: [2023-03-17 04:28:19,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:28:19,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:28:19,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 1: [2023-03-17 04:28:19,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:28:19,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:28:19,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173000 is ready now! 0: successfully saved checkpoint at iteration 173000 to checkpoints_14m91b100m 7: time (ms) | save-checkpoint: 84.50 7: iteration 173010/ 173500 | consumed samples: 44290560 | consumed tokens: 90707066880 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.516256E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2823.960 | TFLOPs: 10.50 | 7: iteration 173020/ 173500 | consumed samples: 44293120 | consumed tokens: 90712309760 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.508624E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3215.035 | TFLOPs: 11.96 | 7: iteration 173030/ 173500 | consumed samples: 44295680 | consumed tokens: 90717552640 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.494923E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3240.622 | TFLOPs: 12.05 | 7: iteration 
173040/ 173500 | consumed samples: 44298240 | consumed tokens: 90722795520 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.507954E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3234.654 | TFLOPs: 12.03 | 7: iteration 173050/ 173500 | consumed samples: 44300800 | consumed tokens: 90728038400 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.521698E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3230.443 | TFLOPs: 12.02 | 7: iteration 173060/ 173500 | consumed samples: 44303360 | consumed tokens: 90733281280 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.513791E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3221.619 | TFLOPs: 11.98 | 7: iteration 173070/ 173500 | consumed samples: 44305920 | consumed tokens: 90738524160 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.504604E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3094.480 | TFLOPs: 11.51 | 7: iteration 173080/ 173500 | consumed samples: 44308480 | consumed tokens: 90743767040 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.499912E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.793 | TFLOPs: 11.89 | 7: iteration 173090/ 173500 | consumed samples: 44311040 | consumed tokens: 90749009920 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.507771E+00 | grad norm: 0.393 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.216 | TFLOPs: 11.93 | 7: iteration 173100/ 173500 | consumed samples: 44313600 | consumed tokens: 90754252800 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.498156E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3191.721 | TFLOPs: 11.87 | 7: iteration 173110/ 173500 | consumed samples: 44316160 | consumed tokens: 90759495680 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.496057E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3183.910 | TFLOPs: 11.84 | 7: iteration 173120/ 173500 | consumed samples: 44318720 | consumed tokens: 90764738560 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.505738E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3034.085 | TFLOPs: 11.29 | 7: iteration 173130/ 173500 | consumed samples: 44321280 | consumed tokens: 90769981440 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.500039E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3018.361 | TFLOPs: 11.23 | 7: iteration 173140/ 173500 | consumed samples: 44323840 | consumed tokens: 90775224320 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.505931E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2721.336 | TFLOPs: 10.12 | 7: iteration 173150/ 173500 | consumed samples: 44326400 | consumed tokens: 90780467200 | elapsed time per iteration (s): 0.11 | learning 
rate: 2.000E-05 | global batch size: 256 | lm loss: 4.508282E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2313.669 | TFLOPs: 8.61 | 7: iteration 173160/ 173500 | consumed samples: 44328960 | consumed tokens: 90785710080 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.508775E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2801.787 | TFLOPs: 10.42 | 7: iteration 173170/ 173500 | consumed samples: 44331520 | consumed tokens: 90790952960 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.498943E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3203.230 | TFLOPs: 11.91 | 7: iteration 173180/ 173500 | consumed samples: 44334080 | consumed tokens: 90796195840 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.491800E+00 | grad norm: 0.380 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3201.165 | TFLOPs: 11.91 | 7: iteration 173190/ 173500 | consumed samples: 44336640 | consumed tokens: 90801438720 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.489439E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.745 | TFLOPs: 11.87 | 7: iteration 173200/ 173500 | consumed samples: 44339200 | consumed tokens: 90806681600 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.498525E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2976.267 | TFLOPs: 11.07 | 7: iteration 173210/ 
173500 | consumed samples: 44341760 | consumed tokens: 90811924480 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.509760E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3185.562 | TFLOPs: 11.85 | 7: iteration 173220/ 173500 | consumed samples: 44344320 | consumed tokens: 90817167360 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.509129E+00 | grad norm: 0.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.080 | TFLOPs: 11.90 | 7: iteration 173230/ 173500 | consumed samples: 44346880 | consumed tokens: 90822410240 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.502825E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3132.195 | TFLOPs: 11.65 | 7: iteration 173240/ 173500 | consumed samples: 44349440 | consumed tokens: 90827653120 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.500888E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2774.155 | TFLOPs: 10.32 | 7: iteration 173250/ 173500 | consumed samples: 44352000 | consumed tokens: 90832896000 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.497340E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3179.952 | TFLOPs: 11.83 | 7: iteration 173260/ 173500 | consumed samples: 44354560 | consumed tokens: 90838138880 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.505033E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3190.602 | TFLOPs: 11.87 | 7: iteration 173270/ 173500 | consumed samples: 44357120 | consumed tokens: 90843381760 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.506136E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3195.813 | TFLOPs: 11.89 | 7: iteration 173280/ 173500 | consumed samples: 44359680 | consumed tokens: 90848624640 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.494768E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3106.258 | TFLOPs: 11.55 | 7: iteration 173290/ 173500 | consumed samples: 44362240 | consumed tokens: 90853867520 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.492010E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2808.451 | TFLOPs: 10.45 | 7: iteration 173300/ 173500 | consumed samples: 44364800 | consumed tokens: 90859110400 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.488865E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3190.488 | TFLOPs: 11.87 | 7: iteration 173310/ 173500 | consumed samples: 44367360 | consumed tokens: 90864353280 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.505595E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3188.809 | TFLOPs: 11.86 | 7: iteration 173320/ 173500 | consumed samples: 44369920 | consumed tokens: 90869596160 | elapsed time per iteration (s): 0.08 | learning rate: 
2.000E-05 | global batch size: 256 | lm loss: 4.500605E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3202.633 | TFLOPs: 11.91 | 7: iteration 173330/ 173500 | consumed samples: 44372480 | consumed tokens: 90874839040 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.506499E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2252.529 | TFLOPs: 8.38 | 7: iteration 173340/ 173500 | consumed samples: 44375040 | consumed tokens: 90880081920 | elapsed time per iteration (s): 0.12 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.512489E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2071.211 | TFLOPs: 7.70 | 7: iteration 173350/ 173500 | consumed samples: 44377600 | consumed tokens: 90885324800 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.504807E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2925.672 | TFLOPs: 10.88 | 7: iteration 173360/ 173500 | consumed samples: 44380160 | consumed tokens: 90890567680 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.500359E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3098.246 | TFLOPs: 11.52 | 7: iteration 173370/ 173500 | consumed samples: 44382720 | consumed tokens: 90895810560 | elapsed time per iteration (s): 0.09 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.495346E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2915.467 | TFLOPs: 10.84 | 7: iteration 173380/ 173500 | 
consumed samples: 44385280 | consumed tokens: 90901053440 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.496057E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3207.366 | TFLOPs: 11.93 | 7: iteration 173390/ 173500 | consumed samples: 44387840 | consumed tokens: 90906296320 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.495079E+00 | grad norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3144.061 | TFLOPs: 11.69 | 7: iteration 173400/ 173500 | consumed samples: 44390400 | consumed tokens: 90911539200 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.494824E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.471 | TFLOPs: 11.95 | 7: iteration 173410/ 173500 | consumed samples: 44392960 | consumed tokens: 90916782080 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.493816E+00 | grad norm: 0.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3199.762 | TFLOPs: 11.90 | 7: iteration 173420/ 173500 | consumed samples: 44395520 | consumed tokens: 90922024960 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.517920E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3206.589 | TFLOPs: 11.93 | 7: iteration 173430/ 173500 | consumed samples: 44398080 | consumed tokens: 90927267840 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.505107E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 3193.005 | TFLOPs: 11.88 | 7: iteration 173440/ 173500 | consumed samples: 44400640 | consumed tokens: 90932510720 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.506483E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3197.541 | TFLOPs: 11.89 | 7: iteration 173450/ 173500 | consumed samples: 44403200 | consumed tokens: 90937753600 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.500385E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3200.719 | TFLOPs: 11.91 | 7: iteration 173460/ 173500 | consumed samples: 44405760 | consumed tokens: 90942996480 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.506743E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3211.447 | TFLOPs: 11.95 | 7: iteration 173470/ 173500 | consumed samples: 44408320 | consumed tokens: 90948239360 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.487247E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.484 | TFLOPs: 11.93 | 7: iteration 173480/ 173500 | consumed samples: 44410880 | consumed tokens: 90953482240 | elapsed time per iteration (s): 0.08 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.500208E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3208.026 | TFLOPs: 11.93 | 7: iteration 173490/ 173500 | consumed samples: 44413440 | consumed tokens: 90958725120 | elapsed time per iteration (s): 0.08 | learning rate: 
2.000E-05 | global batch size: 256 | lm loss: 4.502982E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 3052.130 | TFLOPs: 11.35 | 7: iteration 173500/ 173500 | consumed samples: 44416000 | consumed tokens: 90963968000 | elapsed time per iteration (s): 0.11 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 4.502843E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 2333.858 | TFLOPs: 8.68 | 0: [after training is done] datetime: 2023-03-17 04:29:01 0: saving checkpoint at iteration 173500 to checkpoints_14m91b100m 7: ----------------------------------------------------------------------------------------------------------------- 7: validation loss at the end of training for val data | lm loss value: 4.383527E+00 | lm loss PPL: 8.012014E+01 | 7: ----------------------------------------------------------------------------------------------------------------- 0: [2023-03-17 04:29:01,723] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step173500 is begin to save! 0: [2023-03-17 04:29:01,726] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/layer_01-model_00-model_states.pt... 0: [2023-03-17 04:29:01,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/layer_01-model_00-model_states.pt. 0: [2023-03-17 04:29:01,752] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/layer_03-model_00-model_states.pt... 0: [2023-03-17 04:29:01,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/layer_03-model_00-model_states.pt. 0: [2023-03-17 04:29:01,758] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/layer_04-model_00-model_states.pt... 
0: [2023-03-17 04:29:01,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/layer_04-model_00-model_states.pt. 0: [2023-03-17 04:29:01,761] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/layer_05-model_00-model_states.pt... 0: [2023-03-17 04:29:01,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/layer_05-model_00-model_states.pt. 0: [2023-03-17 04:29:01,764] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/layer_06-model_00-model_states.pt... 0: [2023-03-17 04:29:01,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/layer_06-model_00-model_states.pt. 0: [2023-03-17 04:29:01,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/layer_08-model_00-model_states.pt... 0: [2023-03-17 04:29:01,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/layer_08-model_00-model_states.pt. 0: [2023-03-17 04:29:01,768] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_14m91b100m/global_step173500/mp_rank_00_model_states.pt 0: [2023-03-17 04:29:01,768] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/mp_rank_00_model_states.pt... 0: [2023-03-17 04:29:01,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/mp_rank_00_model_states.pt. 0: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 7: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2023-03-17 04:29:01,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 0: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2023-03-17 04:29:01,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 4: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 7: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 6: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 6: [2023-03-17 04:29:01,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 5: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:29:01,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2023-03-17 04:29:01,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 5: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 1: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 0: [2023-03-17 04:29:01,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 2: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:29:01,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 0: [2023-03-17 04:29:01,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 
4: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:29:01,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2023-03-17 04:29:01,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 6: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 1: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:29:01,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 7: [2023-03-17 04:29:01,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 
3: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:29:01,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 5: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:29:01,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 2: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:29:01,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 2: [2023-03-17 04:29:01,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 6: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 
0: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:29:01,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 7: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:29:01,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 5: [2023-03-17 04:29:01,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 5: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 4: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:29:01,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2023-03-17 04:29:01,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 
2: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:29:01,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 1: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:29:01,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 1: [2023-03-17 04:29:01,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 3: [2023-03-17 04:29:01,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 
4: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:29:01,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 6: [2023-03-17 04:29:01,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 0: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:29:01,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 7: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 
5: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:29:01,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 3: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:29:01,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 2: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:29:01,797] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2023-03-17 04:29:01,797] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 6: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:29:01,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 0: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 4: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:29:01,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 7: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:29:01,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 5: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2023-03-17 04:29:01,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2023-03-17 04:29:01,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 1: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:29:01,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 2: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:29:01,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 3: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 4: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:29:01,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2023-03-17 04:29:01,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 4: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 0: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 0: [2023-03-17 04:29:01,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 5: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:29:01,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 3: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:29:01,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2023-03-17 04:29:01,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 3: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 1: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 7: [2023-03-17 04:29:01,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2023-03-17 04:29:01,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 2: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 1: [2023-03-17 04:29:01,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2023-03-17 04:29:01,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 0: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2023-03-17 04:29:01,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2023-03-17 04:29:01,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 4: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 7: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:29:01,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 6: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2023-03-17 04:29:01,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 5: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 5: [2023-03-17 04:29:01,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 3: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 2: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 7: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 2: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 2: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 
3: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 1: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 1: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 0: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 4: [2023-03-17 04:29:01,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2023-03-17 04:29:01,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2023-03-17 04:29:01,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 5: [2023-03-17 04:29:01,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2023-03-17 04:29:01,803] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_14m91b100m/global_step173500/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2023-03-17 04:29:01,803] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step173500 is ready now! 
0: successfully saved checkpoint at iteration 173500 to checkpoints_14m91b100m
END 3327073: Fri 17 Mar 2023 04:29:14 AM EET